Keep pool of systhreads for blocking operations (simplified) #681

Merged on Feb 11, 2024 (2 commits)

@talex5 talex5 commented Feb 3, 2024

This is a simplified version of @SGrondin's #658:

  • Add Run_in_systhread effect. This makes the thread-pool part of the
    scheduler's state, avoiding the need for DLS.

  • Remove max_standby_systhreads_per_domain. Suggested by @polytypic. All threads are returned to the free-list.

  • Simplify termination. Use one atomic instead of two, by using a custom list type.

  • Use the scheduler's timer to drop idle threads. This avoids the need to add timeouts to semaphores, which would add a dependency on ocaml/ocaml#12867 ("Add pthread_cond_timedwait to Condition and Semaphore"). I had to modify Zzz slightly to allow non-fiber timeouts.
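
To illustrate the "one atomic instead of two" point, here is a minimal sketch of a free-list whose type has an extra constructor that doubles as the termination flag. The names (`free_list`, `Finished`, `push`, `pop`, `terminate`) are hypothetical and chosen for illustration; this is not the code from the PR, just one way the idea could look using only the standard `Atomic` module.

```ocaml
(* Hypothetical sketch: the idle workers live in a single atomic.
   The [Finished] constructor records that the pool has shut down,
   so no separate "terminating" atomic is needed. *)
type 'a free_list =
  | Empty
  | Cons of 'a * 'a free_list
  | Finished                      (* pool shut down; pushes are refused *)

type 'a pool = 'a free_list Atomic.t

let create () : 'a pool = Atomic.make Empty

(* Return a worker to the free-list. The result is [false] if the pool
   has already terminated, in which case the worker should exit. *)
let rec push (t : 'a pool) (x : 'a) : bool =
  match Atomic.get t with
  | Finished -> false
  | (Empty | Cons _) as old ->
    if Atomic.compare_and_set t old (Cons (x, old)) then true
    else push t x                 (* lost a race; retry *)

(* Take an idle worker, if there is one. *)
let rec pop (t : 'a pool) : 'a option =
  match Atomic.get t with
  | Empty | Finished -> None
  | Cons (x, rest) as old ->
    if Atomic.compare_and_set t old rest then Some x
    else pop t                    (* lost a race; retry *)

(* Mark the pool as finished and return the workers that were idle,
   so the caller can tell each of them to exit. *)
let rec terminate (t : 'a pool) : 'a list =
  let rec to_list = function
    | Empty | Finished -> []
    | Cons (x, rest) -> x :: to_list rest
  in
  let old = Atomic.get t in
  if Atomic.compare_and_set t old Finished then to_list old
  else terminate t
```

Because a worker that finishes after shutdown sees `push` return `false`, it can exit on its own, which is why a second flag is unnecessary in this sketch.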

The cleanup logic is that when a fiber running a task resumes, it adds a 20ms timer to the scheduler (if we don't already have one). When it fires, it tells all idle threads to exit. More complex schemes are possible, but having to recreate a few threads every 20ms shouldn't matter.
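
The cleanup rule above can be modelled in a few lines. This is a single-threaded toy model under stated assumptions: it uses no real systhreads and no Eio scheduler, worker ids are plain integers, and `task_finished`, `take_idle` and `tick` are hypothetical names standing in for "the fiber resumes", "reuse an idle thread" and "the scheduler checks its timers".

```ocaml
(* Toy model only: integers stand in for idle systhreads, and the caller
   passes the current time explicitly instead of using a real clock. *)
type pool = {
  mutable idle : int list;            (* ids of idle workers *)
  mutable cleanup_at : float option;  (* deadline of the pending timer, if any *)
  mutable exited : int list;          (* workers told to exit, for inspection *)
}

let idle_timeout = 0.020  (* 20ms, as in the description above *)

let create () = { idle = []; cleanup_at = None; exited = [] }

(* The fiber that ran a task has resumed: return the worker to the
   free-list and arm the cleanup timer unless one is already pending. *)
let task_finished t ~now worker =
  t.idle <- worker :: t.idle;
  match t.cleanup_at with
  | Some _ -> ()
  | None -> t.cleanup_at <- Some (now +. idle_timeout)

(* Reuse an idle worker for a new task, if one is available. *)
let take_idle t =
  match t.idle with
  | [] -> None
  | w :: rest -> t.idle <- rest; Some w

(* Called from the scheduler loop: if the timer is due, tell every
   worker that is idle right now to exit and disarm the timer. *)
let tick t ~now =
  match t.cleanup_at with
  | Some deadline when now >= deadline ->
    t.cleanup_at <- None;
    t.exited <- t.idle @ t.exited;
    t.idle <- []
  | _ -> ()
```

Note that in this model a worker that became idle just before the timer fires is dropped along with the rest, so a busy program may recreate a few threads per period, which matches the trade-off described above.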

@talex5 talex5 force-pushed the pool-systhreads branch 5 times, most recently from 98de8cb to cfc2ac1 Compare February 5, 2024 09:59
@talex5 talex5 marked this pull request as ready for review February 5, 2024 09:59
@talex5 talex5 force-pushed the pool-systhreads branch 2 times, most recently from cfb049d to ad183c5 Compare February 9, 2024 11:36
SGrondin and others added 2 commits February 9, 2024 11:36
- Add `Run_in_systhread` effect. This makes the thread-pool part of the
  scheduler's state, avoiding the need for DLS.

- Remove `max_standby_systhreads_per_domain`. Suggested by Vesa Karvonen.

- Simplify termination. Use one atomic instead of two.

- Use the scheduler's timer to drop idle threads.
  Modified Zzz to allow non-fiber timeouts.

@SGrondin SGrondin left a comment

I don't see any issues with it and I think it's a solid first pass. However, I have not tested it yet (unlike #658, which I developed against for weeks), so I'd hold off on releasing it right away. There are also some optimizations I'd like to make and benchmark once it's merged.

@talex5 talex5 merged commit ca4767b into ocaml-multicore:main Feb 11, 2024
5 checks passed
@talex5 talex5 deleted the pool-systhreads branch February 11, 2024 10:04

talex5 commented Feb 11, 2024

Thanks - I'm very interested in the results of the testing, and especially in whether you still see the latency problem with this version.

talex5 added a commit to talex5/opam-repository that referenced this pull request Feb 22, 2024
CHANGES:

New features:

- eio_posix: use directory FDs instead of realpath (@talex5 ocaml-multicore/eio#694 ocaml-multicore/eio#696, reviewed by @SGrondin).
  Using realpath was an old hack from the libuv days and subject to races. It was also slow.

- Keep pool of systhreads for blocking operations (@SGrondin @talex5 ocaml-multicore/eio#681).
  This is much faster than creating a new thread for each operation.
  It mainly benefits the eio_posix backend, as that uses lots of systhreads.

- Make `Switch.on_release` thread-safe (@talex5 ocaml-multicore/eio#684, requested by @art-w and @clecat).
  This allows resource pools to be shared between domains easily.

- Add `Eio.Path.read_link` (@talex5 ocaml-multicore/eio#686).

- Add `Eio_unix.Fd.is_open` (@talex5 ocaml-multicore/eio#690).

- Include backtrace in systhread errors (@talex5 ocaml-multicore/eio#688, reviewed by @SGrondin).
  Also, add `Eio.Exn.empty_backtrace` as a convenience.

- eio.mock: add tracing support to mock backend (@talex5 ocaml-multicore/eio#687).

- Improve tracing (@talex5 ocaml-multicore/eio#675 ocaml-multicore/eio#683 ocaml-multicore/eio#676, reviewed by @SGrondin).
  Update tracing section of README and trace more things
  (`run_in_systhread`, `close`, `submit`, `traceln`, cancellation and domain spawning).

Documentation:

- Link to verification work in docs (@talex5 ocaml-multicore/eio#682).

- Add more trace diagrams to README (@talex5 ocaml-multicore/eio#698).

- Adjust COC contacts (@polytypic ocaml-multicore/eio#685, reviewed by @Sudha247).

Bug fixes:

- eio_linux: retry `openat2` on `EAGAIN` (@talex5 ocaml-multicore/eio#693, reviewed by @SGrondin).

- eio_posix and eio_windows: check for IO periodically (@talex5 ocaml-multicore/eio#674).

- Handle EPERM when trying to initialise uring (@talex5 ocaml-multicore/eio#691).
  This can happen when using a Docker container.

Build and tests:

- Benchmark `Eio_unix.run_in_systhread` (@talex5 ocaml-multicore/eio#678, reviewed by @SGrondin).

- Enable lintcstubs for `Eio_unix.Private` too (@talex5 ocaml-multicore/eio#689).

- Stat benchmark: report cleanup time and optimise (@talex5 ocaml-multicore/eio#692).

- Make benchmarks start faster (@talex5 ocaml-multicore/eio#673).

- Update build for new eio-trace CLI (@talex5 ocaml-multicore/eio#699).

- Expect opam-repo-ci tests to fail on macos (@talex5 ocaml-multicore/eio#672).