-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a readv benchmark and fix sketch buffer GC #64
Conversation
lib/uring/uring.ml
Outdated
if Uring.sq_ready t.uring = 0 then | ||
Sketch.release t.sketch; | ||
v | ||
if t.dirty then begin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rationale for checking sq_ready
even if dirty
was not set, is that technically you can still allocate things and never have dirty
set, also more as of a safe guard in the future, checking sq_ready
is cheap enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I've put it back how it was before.
The benchmark performs 16 concurrent reads of /dev/zero in a loop. Each time a read completes, it issues a new one. This benchmark unconvered some problems in the current sketch buffer handling: - A typo in the C stubs meant that the sketch buffer was never released. - Even with that fixed it might still never be freed, as the application is not required to call submit explicitly. Instead, we now try to release the buffer when waiting or peeking. - I set the initial sketch buffer size to 4096. Before, it was zero, which lead to quite a lot of allocation as it grows rather slowly. I'm not sure if this really matters, however. On my machine, I get: Read 95.37 MB in 0.25 seconds (388.92 MB/second) # buffer_size=100, polling=false Read 95.37 MB in 0.47 seconds (203.45 MB/second) # buffer_size=100, polling=true Read 95.37 MB in 0.24 seconds (389.73 MB/second) # buffer_size=100, polling=false Read 95.37 MB in 0.48 seconds (200.75 MB/second) # buffer_size=100, polling=true With a larger buffer: Read 124998.12 MB in 5.11 seconds (24452.64 MB/second) # buffer_size=131072, polling=false Read 124999.76 MB in 5.14 seconds (24318.08 MB/second) # buffer_size=131072, polling=true Read 124993.41 MB in 4.95 seconds (25236.34 MB/second) # buffer_size=131072, polling=false Read 124999.46 MB in 5.17 seconds (24157.28 MB/second) # buffer_size=131072, polling=true I'm not sure why polling mode is slower. Possibly because reading from /dev/zero doesn't benefit from happing asynchronously.
This should make it easier to check that the uring is behaving as expected.
Awesome ! thanks for the fixes \o/ |
CHANGES: New features: - Add `Uring.timeout` (@bikallem ocaml-multicore/ocaml-uring#59). - Add `Uring.read` and `Uring.write` (@haesbaert ocaml-multicore/ocaml-uring#62). These are simple wrappers for read(2) and write(2). - Add `Uring.unlink` (@talex5 ocaml-multicore/ocaml-uring#65). This uses unlinkat(2), and so can also be used to remove directories. - Add support for uring probes (@talex5 ocaml-multicore/ocaml-uring#70). Allows checking whether a feature is supported by the kernel at runtime. - Rename `peek` to `get_cqe_nonblocking` (@talex5 ocaml-multicore/ocaml-uring#67). The old name was confusing because it does remove the item from the ring. - Update to liburing 2.2 (@talex5 ocaml-multicore/ocaml-uring#56). - Add `Uring.active_ops` (@talex5 ocaml-multicore/ocaml-uring#68). Avoids needing to track the value returned by `submit`, which is important as it is sometimes called automatically. - Add `Uring.iov_max` constant (@talex5 ocaml-multicore/ocaml-uring#76). - Add `Uring.get_debug_stats` (@talex5 ocaml-multicore/ocaml-uring#64). This should make it easier to check that the uring is behaving as expected. Performance: - Introduce a Sketch buffer per Uring (@haesbaert ocaml-multicore/ocaml-uring#63). The main motivation of this change is to avoid having one malloc per packet in readv(2), writev(2) and friends. - Use `submit_and_wait` where appropriate (@haesbaert ocaml-multicore/ocaml-uring#69). - Add a `readv` benchmark (@talex5 ocaml-multicore/ocaml-uring#64). - Avoid unnecessary use of `CAMLparam` in the C stubs (@haesbaert ocaml-multicore/ocaml-uring#61). Bug fixes: - Prevent ring from being used after exit (@talex5 ocaml-multicore/ocaml-uring#78). Build changes: - Remove use of notty for formatting benchmark results (@talex5 ocaml-multicore/ocaml-uring#71). It prevented uring from being tested on OCaml 5. - Use MDX for README (@talex5 ocaml-multicore/ocaml-uring#57). - Convert tests to MDX (@talex5 ocaml-multicore/ocaml-uring#58 ocaml-multicore/ocaml-uring#73). - Use opam-repository syntax for license (@kit-ty-kate ocaml-multicore/ocaml-uring#72). - Remove internal `is_dirty` flag (@talex5 ocaml-multicore/ocaml-uring#77).
The benchmark performs 16 concurrent reads of /dev/zero in a loop. Each time a read completes, it issues a new one.
This benchmark unconvered some problems in the current sketch buffer handling:
A typo in the C stubs meant that the sketch buffer was never released.
Even with that fixed it might still never be freed, as the application is not required to call submit explicitly. Instead, we now try to release the buffer when waiting or peeking.
I set the initial sketch buffer size to 4096. Before, it was zero, which lead to quite a lot of allocation as it grows rather slowly. I'm not sure if this really matters, however.
On my machine, I get:
With a larger buffer:
I'm not sure why polling mode is slower. Possibly because reading from /dev/zero doesn't benefit from happing asynchronously.