Avoid work-stealing in bytecode compilation #4004
Conversation
Cool find, I didn't realize we were paying a premium for work-stealing!
.enable_all()
.build()
.expect("Failed to build runtime")
.block_on(worker)
Does it still make sense for the worker to be async? We're not really using any async-specific features there except for timeouts.
I looked into making them non-async very briefly, but getting the timeouts to work with synchronous I/O is not trivial, so I left it for now. I don't think it's a huge deal because most of the work runs in a separate process anyway, but we could probably get some minor gains from stripping out the async.
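For context, the timeout is easy to express on an async runtime. A minimal sketch of that kind of wait, assuming `tokio::time::timeout` around a read from the worker's stdout (the helper and its exact shape are illustrative, not the actual uv code):

```rust
use std::io;
use std::time::Duration;
use tokio::io::{AsyncBufRead, AsyncBufReadExt};
use tokio::time::timeout;

// Illustrative helper: read one response line from the Python worker,
// giving up if it takes longer than `limit`.
async fn read_response<R>(stdout: &mut R, limit: Duration) -> io::Result<String>
where
    R: AsyncBufRead + Unpin,
{
    let mut line = String::new();
    match timeout(limit, stdout.read_line(&mut line)).await {
        Ok(Ok(_)) => Ok(line),
        Ok(Err(err)) => Err(err),
        // Doing the same with blocking reads would need platform-specific
        // read timeouts or a watchdog thread, which is the non-trivial part.
        Err(_elapsed) => Err(io::Error::new(io::ErrorKind::TimedOut, "worker timed out")),
    }
}
```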
.name("uv-compile".to_owned())
.spawn(move || {
    // Report panics back to the main thread.
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
I'm probably missing something silly here, but why the `catch_unwind`? You have a thread boundary here, so you should be able to just extract the panic from the result of joining the thread?
We wait on the threads asynchronously, which is why we use the oneshot channel instead of blocking on `join`.
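For reference, this is roughly the pattern being described, condensed into a generic wrapper; the wrapper itself is illustrative, not the actual uv code:

```rust
use std::panic::{self, AssertUnwindSafe};
use tokio::sync::oneshot;

// Illustrative: run `work` on a dedicated thread and report its result
// (or its panic) back to the async caller without blocking on `join`.
async fn run_on_dedicated_thread<T, F>(work: F) -> T
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = oneshot::channel();
    std::thread::spawn(move || {
        // Catch the panic so it can be reported back to the main thread.
        let result = panic::catch_unwind(AssertUnwindSafe(work));
        // Ignore send errors: the receiver may have been dropped.
        let _ = tx.send(result);
    });
    match rx.await.expect("worker thread dropped the channel") {
        Ok(value) => value,
        // Resume the panic on the caller's side so it surfaces normally.
        Err(payload) => panic::resume_unwind(payload),
    }
}
```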
Hmmm okay, I think that makes sense to me. Thanks!
## Summary

Move completely off tokio's multi-threaded runtime. We've slowly been making changes to be smarter about scheduling in various places instead of depending on tokio's general-purpose work-stealing, notably #3627 and #4004. We now no longer benefit from the multi-threaded runtime, as we run all I/O on the main thread. There's one remaining instance of `block_in_place` that can be swapped for `rayon::spawn`.

This change is a small performance improvement due to removing some unnecessary overhead of the multi-threaded runtime (e.g. spawning threads), but nothing major. It also removes some noise from profiles.

## Test Plan

```
Benchmark 1: ./target/profiling/uv (resolve-warm)
  Time (mean ± σ):      14.9 ms ±  0.3 ms    [User: 3.0 ms, System: 17.3 ms]
  Range (min … max):    14.1 ms … 15.8 ms    169 runs

Benchmark 2: ./target/profiling/baseline (resolve-warm)
  Time (mean ± σ):      16.1 ms ±  0.3 ms    [User: 3.9 ms, System: 18.7 ms]
  Range (min … max):    15.1 ms … 17.3 ms    162 runs

Summary
  ./target/profiling/uv (resolve-warm) ran
    1.08 ± 0.03 times faster than ./target/profiling/baseline (resolve-warm)
```
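As a rough sketch of what that description amounts to (placeholder names; the oneshot bridging around `rayon::spawn` is an assumption, not necessarily how the actual change is written): the CLI is driven on a current-thread runtime, and the remaining CPU-bound work goes to rayon instead of `tokio::task::block_in_place`.

```rust
use tokio::sync::oneshot;

// Placeholder for a CPU-bound job (e.g. hashing a wheel).
fn expensive_hash(data: Vec<u8>) -> u64 {
    data.iter().map(|&b| u64::from(b)).sum()
}

// Instead of `tokio::task::block_in_place` (which only works on the
// multi-threaded runtime), hand the work to rayon and await the result.
async fn hash_on_rayon(data: Vec<u8>) -> u64 {
    let (tx, rx) = oneshot::channel();
    rayon::spawn(move || {
        let _ = tx.send(expensive_hash(data));
    });
    rx.await.expect("rayon task dropped the channel")
}

fn main() {
    // All I/O is driven from the main thread on a single-threaded runtime.
    tokio::runtime::Builder::new_current_thread()
        .enable_all()
        .build()
        .expect("Failed to build runtime")
        .block_on(async {
            let digest = hash_on_rayon(b"example".to_vec()).await;
            println!("digest = {digest}");
        });
}
```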
Summary
Avoid using work-stealing Tokio workers for bytecode compilation, instead favoring dedicated threads. Tokio's work-stealing does not really benefit us because we spawn the Python workers and schedule tasks ourselves; we don't want Tokio to re-balance our workers. Because we're doing the scheduling ourselves and compilation is a primarily compute-bound task, we can also create a dedicated runtime for each worker and avoid some synchronization overhead.
This is part of a general desire to avoid relying on Tokio's work-stealing scheduler and be smarter about our workload. In this case we already had a custom scheduler in place; Tokio was just getting in the way (though the overhead is very minor).
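A condensed sketch of the pattern, assuming a simplified worker body and no error handling (the real code wires the workers up to channels and the rest of the install pipeline):

```rust
use tokio::runtime::Builder;

// Simplified stand-in for the async loop that feeds a Python worker process.
async fn worker(id: usize) {
    let _ = id;
}

fn spawn_compile_worker(id: usize) -> std::io::Result<std::thread::JoinHandle<()>> {
    std::thread::Builder::new()
        .name("uv-compile".to_owned())
        .spawn(move || {
            // Each worker owns a single-threaded runtime, so tasks never
            // migrate between workers and there is no work-stealing overhead.
            Builder::new_current_thread()
                .enable_all()
                .build()
                .expect("Failed to build runtime")
                .block_on(worker(id));
        })
}
```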
Test Plan
This improves performance by ~5% on my machine.