Add a readv benchmark and fix sketch buffer GC #64

talex5 · 2022-08-17T10:32:20Z

The benchmark performs 16 concurrent reads of /dev/zero in a loop. Each time a read completes, it issues a new one.

This benchmark unconvered some problems in the current sketch buffer handling:

A typo in the C stubs meant that the sketch buffer was never released.
Even with that fixed it might still never be freed, as the application is not required to call submit explicitly. Instead, we now try to release the buffer when waiting or peeking.
I set the initial sketch buffer size to 4096. Before, it was zero, which lead to quite a lot of allocation as it grows rather slowly. I'm not sure if this really matters, however.

On my machine, I get:

Read 95.37 MB in 0.25 seconds (388.92 MB/second) # buffer_size=100, polling=false
Read 95.37 MB in 0.47 seconds (203.45 MB/second) # buffer_size=100, polling=true
Read 95.37 MB in 0.24 seconds (389.73 MB/second) # buffer_size=100, polling=false
Read 95.37 MB in 0.48 seconds (200.75 MB/second) # buffer_size=100, polling=true

With a larger buffer:

Read 124998.12 MB in 5.11 seconds (24452.64 MB/second) # buffer_size=131072, polling=false
Read 124999.76 MB in 5.14 seconds (24318.08 MB/second) # buffer_size=131072, polling=true
Read 124993.41 MB in 4.95 seconds (25236.34 MB/second) # buffer_size=131072, polling=false
Read 124999.46 MB in 5.17 seconds (24157.28 MB/second) # buffer_size=131072, polling=true

I'm not sure why polling mode is slower. Possibly because reading from /dev/zero doesn't benefit from happing asynchronously.

lib/uring/uring_stubs.c

haesbaert · 2022-08-17T10:50:23Z

lib/uring/uring.ml

-  if Uring.sq_ready t.uring = 0 then
-    Sketch.release t.sketch;
-  v
+  if t.dirty then begin


The rationale for checking sq_ready even if dirty was not set, is that technically you can still allocate things and never have dirty set, also more as of a safe guard in the future, checking sq_ready is cheap enough.

OK, I've put it back how it was before.

lib/uring/uring.ml

bench/readv.ml

lib/uring/uring.ml

The benchmark performs 16 concurrent reads of /dev/zero in a loop. Each time a read completes, it issues a new one. This benchmark unconvered some problems in the current sketch buffer handling: - A typo in the C stubs meant that the sketch buffer was never released. - Even with that fixed it might still never be freed, as the application is not required to call submit explicitly. Instead, we now try to release the buffer when waiting or peeking. - I set the initial sketch buffer size to 4096. Before, it was zero, which lead to quite a lot of allocation as it grows rather slowly. I'm not sure if this really matters, however. On my machine, I get: Read 95.37 MB in 0.25 seconds (388.92 MB/second) # buffer_size=100, polling=false Read 95.37 MB in 0.47 seconds (203.45 MB/second) # buffer_size=100, polling=true Read 95.37 MB in 0.24 seconds (389.73 MB/second) # buffer_size=100, polling=false Read 95.37 MB in 0.48 seconds (200.75 MB/second) # buffer_size=100, polling=true With a larger buffer: Read 124998.12 MB in 5.11 seconds (24452.64 MB/second) # buffer_size=131072, polling=false Read 124999.76 MB in 5.14 seconds (24318.08 MB/second) # buffer_size=131072, polling=true Read 124993.41 MB in 4.95 seconds (25236.34 MB/second) # buffer_size=131072, polling=false Read 124999.46 MB in 5.17 seconds (24157.28 MB/second) # buffer_size=131072, polling=true I'm not sure why polling mode is slower. Possibly because reading from /dev/zero doesn't benefit from happing asynchronously.

This should make it easier to check that the uring is behaving as expected.

haesbaert · 2022-08-18T09:11:51Z

Awesome ! thanks for the fixes \o/

@bikallem

CHANGES: New features: - Add `Uring.timeout` (@bikallem ocaml-multicore/ocaml-uring#59). - Add `Uring.read` and `Uring.write` (@haesbaert ocaml-multicore/ocaml-uring#62). These are simple wrappers for read(2) and write(2). - Add `Uring.unlink` (@talex5 ocaml-multicore/ocaml-uring#65). This uses unlinkat(2), and so can also be used to remove directories. - Add support for uring probes (@talex5 ocaml-multicore/ocaml-uring#70). Allows checking whether a feature is supported by the kernel at runtime. - Rename `peek` to `get_cqe_nonblocking` (@talex5 ocaml-multicore/ocaml-uring#67). The old name was confusing because it does remove the item from the ring. - Update to liburing 2.2 (@talex5 ocaml-multicore/ocaml-uring#56). - Add `Uring.active_ops` (@talex5 ocaml-multicore/ocaml-uring#68). Avoids needing to track the value returned by `submit`, which is important as it is sometimes called automatically. - Add `Uring.iov_max` constant (@talex5 ocaml-multicore/ocaml-uring#76). - Add `Uring.get_debug_stats` (@talex5 ocaml-multicore/ocaml-uring#64). This should make it easier to check that the uring is behaving as expected. Performance: - Introduce a Sketch buffer per Uring (@haesbaert ocaml-multicore/ocaml-uring#63). The main motivation of this change is to avoid having one malloc per packet in readv(2), writev(2) and friends. - Use `submit_and_wait` where appropriate (@haesbaert ocaml-multicore/ocaml-uring#69). - Add a `readv` benchmark (@talex5 ocaml-multicore/ocaml-uring#64). - Avoid unnecessary use of `CAMLparam` in the C stubs (@haesbaert ocaml-multicore/ocaml-uring#61). Bug fixes: - Prevent ring from being used after exit (@talex5 ocaml-multicore/ocaml-uring#78). Build changes: - Remove use of notty for formatting benchmark results (@talex5 ocaml-multicore/ocaml-uring#71). It prevented uring from being tested on OCaml 5. - Use MDX for README (@talex5 ocaml-multicore/ocaml-uring#57). - Convert tests to MDX (@talex5 ocaml-multicore/ocaml-uring#58 ocaml-multicore/ocaml-uring#73). - Use opam-repository syntax for license (@kit-ty-kate ocaml-multicore/ocaml-uring#72). - Remove internal `is_dirty` flag (@talex5 ocaml-multicore/ocaml-uring#77).

talex5 requested a review from haesbaert August 17, 2022 10:32

haesbaert requested changes Aug 17, 2022

View reviewed changes

talex5 force-pushed the fix-sketch branch from 00f5c8d to f717ca4 Compare August 18, 2022 08:59

talex5 added 3 commits August 18, 2022 10:03

Add get_debug_stats

df1b777

This should make it easier to check that the uring is behaving as expected.

Start the sketch buffer empty, but grow it faster

b7f104b

talex5 force-pushed the fix-sketch branch from f717ca4 to b7f104b Compare August 18, 2022 09:04

haesbaert approved these changes Aug 18, 2022

View reviewed changes

haesbaert merged commit af3811c into ocaml-multicore:main Aug 18, 2022

talex5 mentioned this pull request Aug 26, 2022

[new release] uring (0.4) ocaml/opam-repository#22028

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a readv benchmark and fix sketch buffer GC #64

Add a readv benchmark and fix sketch buffer GC #64

talex5 commented Aug 17, 2022

haesbaert Aug 17, 2022

talex5 Aug 18, 2022

haesbaert commented Aug 18, 2022

Add a readv benchmark and fix sketch buffer GC #64

Add a readv benchmark and fix sketch buffer GC #64

Conversation

talex5 commented Aug 17, 2022

haesbaert Aug 17, 2022

Choose a reason for hiding this comment

talex5 Aug 18, 2022

Choose a reason for hiding this comment

haesbaert commented Aug 18, 2022