
Safer handling of aligned buffers? #38

Open
djs55 opened this issue Dec 14, 2014 · 6 comments

Comments

@djs55
Member

djs55 commented Dec 14, 2014

In Mirage's low-level I/O code we often require buffers to be aligned e.g.

  • page-aligned for a Xen inter-domain ring
  • sector-aligned for a Linux O_DIRECT read/write

At the moment we use the [mirage/io-page] library to allocate page-aligned bigarrays (which, being page-aligned, satisfy both requirements above). Other libraries accept ad-hoc mixtures of Io_page.t and Cstruct.t, which leads to inefficiency [mirage/mirage-platform#110], bugs and general confusion. We tend to either

  • use an Io_page.t to spot bugs where unaligned buffers are accidentally produced; but then we end up calling functions like Io_page.to_pages, which calls Bigarray.Array1.sub, which is much slower than a plain Cstruct.sub; or
  • use a Cstruct.t for convenience, but then we lose track of our alignment, accidentally submit a badly-aligned buffer and corrupt an I/O. Sometimes we remember to check the alignment and stop with a fatal error; other times we forget and the wrong buffer fragment is used (!)

Can we make Cstruct.t a bit more alignment-aware? Perhaps if we had a phantom type (see #14) we could make our low-level functions accept something like a `[> Page_aligned ] Cstruct.t`, and perhaps add functions like

  • `to_pages: [> Page_aligned ] Cstruct.t -> [> Page_aligned ] Cstruct.t list`
  • `align: 'a Cstruct.t -> [ Page_aligned ] Cstruct.t` (where the returned buffer would be `Cstruct.sub`-ed from the original to make it aligned)
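The phantom-type proposal above could look roughly like this. This is an illustrative sketch only, using plain Bigarray and a hypothetical three-field record in place of the real Cstruct.t; `page_size`, `align` and `to_pages` here are assumptions, not existing Cstruct API:

```ocaml
(* Hypothetical sketch of an alignment-aware buffer type; not real
   Cstruct API. ['a] is a phantom parameter recording what we have
   proved about the view's alignment. *)
type buffer =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

type 'a t = { buf : buffer; off : int; len : int }

let page_size = 4096

(* Promote any view to a page-aligned one by trimming its front,
   or return None if no aligned byte falls inside it. *)
let align (cs : 'a t) : [ `Page_aligned ] t option =
  let skew = cs.off mod page_size in
  let skip = if skew = 0 then 0 else page_size - skew in
  if cs.len >= skip then
    Some { buf = cs.buf; off = cs.off + skip; len = cs.len - skip }
  else None

(* Split an aligned view into page-sized (or smaller, for the tail),
   still-aligned chunks; no Bigarray.Array1.sub involved. *)
let rec to_pages (cs : [ `Page_aligned ] t) : [ `Page_aligned ] t list =
  if cs.len <= page_size then [ cs ]
  else
    { cs with len = page_size }
    :: to_pages { cs with off = cs.off + page_size; len = cs.len - page_size }
```

Because the phantom parameter is erased at runtime, the `[ `Page_aligned ]` guarantee costs nothing, and a function that requires alignment simply takes a `[ `Page_aligned ] t` instead of checking at every call site.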
@talex5
Contributor

talex5 commented Jan 27, 2015

Why is Array1.sub much slower than Cstruct.sub? According to the docs:

No copying of elements is involved: the sub-array and the original array share the same storage space.

http://caml.inria.fr/pub/docs/manual-ocaml/libref/Bigarray.Genarray.html (sub_left)

@avsm
Member

avsm commented Jan 27, 2015

Array1.sub drops into C code in Bigarray, which increments a reference count on an underlying proxy value. Cstruct.sub is just a fast OCaml allocation that lives in the minor heap. There's no copying of data, but the actual Array1.sub is still slower than a Cstruct.sub (a benchmark confirming this on modern OCaml would be welcome!)
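For illustration, here is a minimal toy sketch (not the real Cstruct implementation) of why a Cstruct-style sub can stay cheap: the result is just a small record pointing into the same bigarray, so the only costs are integer bounds checks and a minor-heap allocation, with no call into C:

```ocaml
(* Toy sketch of a Cstruct-style view; not the real implementation. *)
type buffer =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

type t = { buffer : buffer; off : int; len : int }

let sub t off len =
  (* Plain integer bounds checks... *)
  if off < 0 || len < 0 || off + len > t.len then invalid_arg "sub";
  (* ...and a 3-field record sharing the same storage: allocated on the
     minor heap, no C call, no finalizer, no proxy reference count. *)
  { buffer = t.buffer; off = t.off + off; len }
```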

@pqwy
Contributor

pqwy commented Jan 30, 2015

I was also very curious about this claim, as I believed Array1.sub couldn't be that much slower than Cstruct.sub.

let () = Random.self_init ()

let time ~tag f =
  let t1 = Sys.time () in
  let r  = f () in
  let t2 = Sys.time () in
  Printf.printf "[time] %s %.04f sec\n%!" tag (t2 -. t1) ;
  r

let size = 100000
and n    = 10000000

let rix () = Random.int size

let () =
  time ~tag:"rng" @@ fun () ->
    for i = 1 to n do ignore @@ rix() done

let () =
  let open Bigarray in
  let arr = Array1.create char c_layout size in
  time ~tag:"bigarray" @@ fun () ->
    for i = 1 to n do ignore @@ Array1.sub arr (rix()) 1 done

let () =
  let cs = Cstruct.create size in
  time ~tag:"cstruct" @@ fun () ->
    for i = 1 to n do ignore @@ Cstruct.sub cs (rix()) 1 done

Taking the average of 10 runs and subtracting rng time from the running time:

Array1.sub : 0.680 sec
Cstruct.sub : 0.047 sec

Replacing the random index with 0:

Array1.sub : 0.723 sec
Cstruct.sub : 0.028 sec

It is, apparently. It's around 15-30 times slower.

I suspect this might have something to do with new array wrappers never being allocated on the minor heap, as they have a finalizer (as per caml_alloc_custom), and with proxy creation and all the checking going on.

But lesson learned. Will touch Array1.sub with care.

@avsm
Member

avsm commented Jan 30, 2015

thanks for confirming!


@djs55
Member Author

djs55 commented Mar 12, 2016

An alignment improvement is in #87: namely an assert-style function `check_alignment` which can be used to at least generate a clear run-time error (previously the typical run-time error was something unhelpful like EINVAL from the kernel)
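A simplified sketch of what such an assert-style check can look like, over plain integer offsets rather than the real Cstruct.t (`require_page_aligned` is a hypothetical caller-side helper, not part of #87):

```ocaml
(* Simplified sketch of an assert-style alignment check in the spirit
   of #87: fail early with a clear message instead of letting the
   kernel reject the I/O later with an opaque EINVAL. *)
let check_alignment ~alignment off =
  if alignment <= 0 then
    invalid_arg "check_alignment: alignment must be strictly positive";
  off mod alignment = 0

(* Hypothetical helper: a caller can turn the boolean into a fatal
   error up front, with a message naming the offending offset. *)
let require_page_aligned off =
  if not (check_alignment ~alignment:4096 off) then
    failwith (Printf.sprintf "buffer offset %d is not page-aligned" off)
```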

@DemiMarie

OCaml (either trunk or 4.03, can't remember which) can allocate Custom blocks with finalizers in the minor heap.
