Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a readv benchmark and fix sketch buffer GC #64

Merged
merged 3 commits into from
Aug 18, 2022

Conversation

talex5
Copy link
Collaborator

@talex5 talex5 commented Aug 17, 2022

The benchmark performs 16 concurrent reads of /dev/zero in a loop. Each time a read completes, it issues a new one.

This benchmark unconvered some problems in the current sketch buffer handling:

  • A typo in the C stubs meant that the sketch buffer was never released.

  • Even with that fixed it might still never be freed, as the application is not required to call submit explicitly. Instead, we now try to release the buffer when waiting or peeking.

  • I set the initial sketch buffer size to 4096. Before, it was zero, which lead to quite a lot of allocation as it grows rather slowly. I'm not sure if this really matters, however.

On my machine, I get:

Read 95.37 MB in 0.25 seconds (388.92 MB/second) # buffer_size=100, polling=false
Read 95.37 MB in 0.47 seconds (203.45 MB/second) # buffer_size=100, polling=true
Read 95.37 MB in 0.24 seconds (389.73 MB/second) # buffer_size=100, polling=false
Read 95.37 MB in 0.48 seconds (200.75 MB/second) # buffer_size=100, polling=true

With a larger buffer:

Read 124998.12 MB in 5.11 seconds (24452.64 MB/second) # buffer_size=131072, polling=false
Read 124999.76 MB in 5.14 seconds (24318.08 MB/second) # buffer_size=131072, polling=true
Read 124993.41 MB in 4.95 seconds (25236.34 MB/second) # buffer_size=131072, polling=false
Read 124999.46 MB in 5.17 seconds (24157.28 MB/second) # buffer_size=131072, polling=true

I'm not sure why polling mode is slower. Possibly because reading from /dev/zero doesn't benefit from happing asynchronously.

@talex5 talex5 requested a review from haesbaert August 17, 2022 10:32
lib/uring/uring_stubs.c Show resolved Hide resolved
if Uring.sq_ready t.uring = 0 then
Sketch.release t.sketch;
v
if t.dirty then begin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rationale for checking sq_ready even if dirty was not set, is that technically you can still allocate things and never have dirty set, also more as of a safe guard in the future, checking sq_ready is cheap enough.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've put it back how it was before.

lib/uring/uring.ml Outdated Show resolved Hide resolved
bench/readv.ml Show resolved Hide resolved
bench/readv.ml Show resolved Hide resolved
bench/readv.ml Show resolved Hide resolved
lib/uring/uring.ml Outdated Show resolved Hide resolved
talex5 added 3 commits August 18, 2022 10:03
The benchmark performs 16 concurrent reads of /dev/zero in a loop. Each
time a read completes, it issues a new one.

This benchmark unconvered some problems in the current sketch buffer handling:

- A typo in the C stubs meant that the sketch buffer was never released.

- Even with that fixed it might still never be freed, as the application
  is not required to call submit explicitly. Instead, we now try to
  release the buffer when waiting or peeking.

- I set the initial sketch buffer size to 4096. Before, it was zero,
  which lead to quite a lot of allocation as it grows rather slowly.
  I'm not sure if this really matters, however.

On my machine, I get:

    Read 95.37 MB in 0.25 seconds (388.92 MB/second) # buffer_size=100, polling=false
    Read 95.37 MB in 0.47 seconds (203.45 MB/second) # buffer_size=100, polling=true
    Read 95.37 MB in 0.24 seconds (389.73 MB/second) # buffer_size=100, polling=false
    Read 95.37 MB in 0.48 seconds (200.75 MB/second) # buffer_size=100, polling=true

With a larger buffer:

    Read 124998.12 MB in 5.11 seconds (24452.64 MB/second) # buffer_size=131072, polling=false
    Read 124999.76 MB in 5.14 seconds (24318.08 MB/second) # buffer_size=131072, polling=true
    Read 124993.41 MB in 4.95 seconds (25236.34 MB/second) # buffer_size=131072, polling=false
    Read 124999.46 MB in 5.17 seconds (24157.28 MB/second) # buffer_size=131072, polling=true

I'm not sure why polling mode is slower. Possibly because reading from
/dev/zero doesn't benefit from happing asynchronously.
This should make it easier to check that the uring is behaving as
expected.
@haesbaert
Copy link
Contributor

Awesome ! thanks for the fixes \o/

@haesbaert haesbaert merged commit af3811c into ocaml-multicore:main Aug 18, 2022
talex5 added a commit to talex5/opam-repository that referenced this pull request Aug 26, 2022
CHANGES:

New features:

- Add `Uring.timeout` (@bikallem ocaml-multicore/ocaml-uring#59).

- Add `Uring.read` and `Uring.write` (@haesbaert ocaml-multicore/ocaml-uring#62).
  These are simple wrappers for read(2) and write(2).

- Add `Uring.unlink` (@talex5 ocaml-multicore/ocaml-uring#65).
  This uses unlinkat(2), and so can also be used to remove directories.

- Add support for uring probes (@talex5 ocaml-multicore/ocaml-uring#70).
  Allows checking whether a feature is supported by the kernel at runtime.

- Rename `peek` to `get_cqe_nonblocking` (@talex5 ocaml-multicore/ocaml-uring#67).
  The old name was confusing because it does remove the item from the ring.

- Update to liburing 2.2 (@talex5 ocaml-multicore/ocaml-uring#56).

- Add `Uring.active_ops` (@talex5 ocaml-multicore/ocaml-uring#68).
  Avoids needing to track the value returned by `submit`, which is important as it is sometimes called automatically.

- Add `Uring.iov_max` constant (@talex5 ocaml-multicore/ocaml-uring#76).

- Add `Uring.get_debug_stats` (@talex5 ocaml-multicore/ocaml-uring#64).
  This should make it easier to check that the uring is behaving as expected.

Performance:

- Introduce a Sketch buffer per Uring (@haesbaert ocaml-multicore/ocaml-uring#63).
  The main motivation of this change is to avoid having one malloc per packet in readv(2), writev(2) and friends.

- Use `submit_and_wait` where appropriate (@haesbaert ocaml-multicore/ocaml-uring#69).

- Add a `readv` benchmark (@talex5 ocaml-multicore/ocaml-uring#64).

- Avoid unnecessary use of `CAMLparam` in the C stubs (@haesbaert ocaml-multicore/ocaml-uring#61).

Bug fixes:

- Prevent ring from being used after exit (@talex5 ocaml-multicore/ocaml-uring#78).

Build changes:

- Remove use of notty for formatting benchmark results (@talex5 ocaml-multicore/ocaml-uring#71).
  It prevented uring from being tested on OCaml 5.

- Use MDX for README (@talex5 ocaml-multicore/ocaml-uring#57).

- Convert tests to MDX (@talex5 ocaml-multicore/ocaml-uring#58 ocaml-multicore/ocaml-uring#73).

- Use opam-repository syntax for license (@kit-ty-kate ocaml-multicore/ocaml-uring#72).

- Remove internal `is_dirty` flag (@talex5 ocaml-multicore/ocaml-uring#77).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants