Avoid work-stealing in bytecode compilation #4004

Merged: 2 commits into astral-sh:main on Jun 4, 2024

Conversation

@ibraheemdev (Member) commented on Jun 4, 2024:

Summary

Avoid using work-stealing Tokio workers for bytecode compilation, favoring dedicated threads instead. Tokio's work-stealing doesn't really benefit us here: we spawn the Python workers and schedule tasks ourselves, and we don't want Tokio to re-balance our workers. Because we do our own scheduling and compilation is a primarily compute-bound task, we can also create a dedicated runtime for each worker and avoid some synchronization overhead.

This is part of a general desire to rely less on Tokio's work-stealing scheduler and be smarter about our workload. In this case we already had the custom scheduler in place; Tokio was just getting in the way (though the overhead is very minor).
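
To make the pattern concrete, here is a minimal sketch (assuming a Tokio dependency; `spawn_worker` and `worker_task` are hypothetical names, not uv's actual API): each worker gets a dedicated OS thread running its own single-threaded runtime, so tasks are never re-balanced across workers.

```rust
use std::thread;

// Hypothetical sketch: one dedicated OS thread per worker, each
// driving its own single-threaded Tokio runtime, so nothing is
// ever re-balanced across workers.
fn spawn_worker<F>(worker_task: F) -> std::io::Result<thread::JoinHandle<()>>
where
    F: std::future::Future<Output = ()> + Send + 'static,
{
    thread::Builder::new()
        .name("uv-compile".to_owned())
        .spawn(move || {
            tokio::runtime::Builder::new_current_thread()
                .enable_all()
                .build()
                .expect("Failed to build runtime")
                .block_on(worker_task);
        })
}
```

Because each runtime is current-thread, a worker's tasks never contend with other workers for a shared scheduler, which is where the saved synchronization overhead comes from.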

Test Plan

This improves performance by ~5% on my machine.

```
$ hyperfine --warmup 1 --prepare "target/profiling/uv-dev clear-compile .venv" "target/profiling/uv-dev compile .venv" "target/profiling/uv-dev-dedicated compile .venv"
Benchmark 1: target/profiling/uv-dev compile .venv
  Time (mean ± σ):      1.279 s ±  0.011 s    [User: 13.803 s, System: 2.998 s]
  Range (min … max):    1.261 s …  1.296 s    10 runs

Benchmark 2: target/profiling/uv-dev-dedicated compile .venv
  Time (mean ± σ):      1.220 s ±  0.021 s    [User: 13.997 s, System: 3.330 s]
  Range (min … max):    1.198 s …  1.272 s    10 runs

Summary
  target/profiling/uv-dev-dedicated compile .venv ran
    1.05 ± 0.02 times faster than target/profiling/uv-dev compile .venv

$ hyperfine --warmup 1 --prepare "target/profiling/uv-dev clear-compile .venv" "target/profiling/uv-dev compile .venv" "target/profiling/uv-dev-dedicated compile .venv"
Benchmark 1: target/profiling/uv-dev compile .venv
  Time (mean ± σ):      3.631 s ±  0.078 s    [User: 47.205 s, System: 4.996 s]
  Range (min … max):    3.564 s …  3.832 s    10 runs

Benchmark 2: target/profiling/uv-dev-dedicated compile .venv
  Time (mean ± σ):      3.521 s ±  0.024 s    [User: 48.201 s, System: 5.392 s]
  Range (min … max):    3.484 s …  3.566 s    10 runs

Summary
  target/profiling/uv-dev-dedicated compile .venv ran
    1.03 ± 0.02 times faster than target/profiling/uv-dev compile .venv
```

@ibraheemdev requested review from @konstin and @BurntSushi on Jun 4, 2024.
@konstin (Member) left a comment:

Cool find, I didn't realize we were paying a premium for work-stealing!

```rust
// Each worker thread blocks on its own dedicated runtime.
tokio::runtime::Builder::new_current_thread()
    .enable_all()
    .build()
    .expect("Failed to build runtime")
    .block_on(worker)
```
Reviewer (Member):

Does it still make sense for the worker to be async? We're not really using any async-specific features there except for timeouts.

@ibraheemdev (Member, author) replied on Jun 4, 2024:

I looked into making them non-async very briefly, but getting the timeouts to work with synchronous I/O is not trivial, so I left it for now. I don't think it's a huge deal because most of the work runs in a separate process anyway, but we could probably get some minor gains from stripping out the async.
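
For context, the timeout is a one-liner under async I/O, which is what makes stripping the async non-trivial. A hedged sketch of the pattern (the `read_response` helper and the ten-second duration are illustrative, not uv's actual code):

```rust
use std::time::Duration;
use tokio::io::{AsyncBufReadExt, AsyncRead, BufReader};

// Illustrative sketch: with async I/O, bounding a read from the
// Python worker's pipe is a one-liner via `tokio::time::timeout`.
async fn read_response<R: AsyncRead + Unpin>(
    child_stdout: R,
) -> Result<String, Box<dyn std::error::Error>> {
    let mut line = String::new();
    // Outer `?` fails on timeout; inner `?` fails on I/O error.
    tokio::time::timeout(
        Duration::from_secs(10),
        BufReader::new(child_stdout).read_line(&mut line),
    )
    .await??;
    Ok(line)
}
```

Replicating this over blocking pipe reads would mean managing read deadlines or a watchdog thread by hand.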

@konstin added the performance label on Jun 4, 2024.
```rust
std::thread::Builder::new()
    .name("uv-compile".to_owned())
    .spawn(move || {
        // Report panics back to the main thread.
        let result = panic::catch_unwind(AssertUnwindSafe(|| {
```
Reviewer (Member):

I'm probably missing something silly here, but why the catch_unwind? You have a thread boundary here, so you should be able to just extract the panic from the result of joining the thread?

@ibraheemdev (Member, author) replied:

We wait on the threads asynchronously, which is why we use the oneshot channel instead of blocking on join.
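
A sketch of that pattern (the `run_on_thread` helper is a hypothetical name): the spawned thread forwards its `catch_unwind` result through a `tokio::sync::oneshot` channel, which the async side can await without blocking on `JoinHandle::join`.

```rust
use std::panic::{self, AssertUnwindSafe};
use tokio::sync::oneshot;

// Illustrative sketch: run `work` on a dedicated thread and await
// the result (or the caught panic) from async code.
async fn run_on_thread<T, F>(work: F) -> std::thread::Result<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = oneshot::channel();
    std::thread::spawn(move || {
        // Catch panics so they surface on the async side.
        let result = panic::catch_unwind(AssertUnwindSafe(work));
        let _ = tx.send(result);
    });
    rx.await.expect("worker thread dropped the channel")
}
```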

Reviewer (Member):

Hmmm okay, I think that makes sense to me. Thanks!

@ibraheemdev merged commit 3b8f3a7 into astral-sh:main on Jun 4, 2024
46 checks passed
ibraheemdev added a commit that referenced this pull request on Jul 9, 2024:
## Summary

Move completely off Tokio's multi-threaded runtime. We've slowly been
making changes to be smarter about scheduling in various places instead
of depending on Tokio's general-purpose work-stealing, notably
#3627 and
#4004. We no longer benefit from the
multi-threaded runtime, as we now run all I/O on the main thread.
There's one remaining instance of `block_in_place` that can be swapped
for `rayon::spawn`.

This change is a small performance improvement due to removing some
unnecessary overhead of the multi-threaded runtime (e.g. spawning
threads), but nothing major. It also removes some noise from profiles.
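
As an illustration of that remaining swap (a sketch, not the actual change; `offload` and `cpu_heavy` are placeholders), the blocking work moves onto rayon's pool and its result comes back over a oneshot channel, so the runtime thread is never blocked:

```rust
use tokio::sync::oneshot;

// Illustrative sketch: replace `tokio::task::block_in_place` by
// handing compute-bound work to rayon and awaiting the result.
async fn offload() -> u64 {
    let (tx, rx) = oneshot::channel();
    rayon::spawn(move || {
        // `cpu_heavy` stands in for the real work.
        let _ = tx.send(cpu_heavy());
    });
    rx.await.expect("rayon task dropped the channel")
}

fn cpu_heavy() -> u64 {
    (0..1_000_000u64).sum()
}
```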

## Test Plan

```
Benchmark 1: ./target/profiling/uv (resolve-warm)
  Time (mean ± σ):      14.9 ms ±   0.3 ms    [User: 3.0 ms, System: 17.3 ms]
  Range (min … max):    14.1 ms …  15.8 ms    169 runs
 
Benchmark 2: ./target/profiling/baseline (resolve-warm)
  Time (mean ± σ):      16.1 ms ±   0.3 ms    [User: 3.9 ms, System: 18.7 ms]
  Range (min … max):    15.1 ms …  17.3 ms    162 runs
 
Summary
  ./target/profiling/uv (resolve-warm) ran
    1.08 ± 0.03 times faster than ./target/profiling/baseline (resolve-warm)
```