Much higher compile times with -Z threads=8 than with -Z threads=1 #117755
Comments
I confirm the issue, compiling the same thing. Test system has 24 cores.
Do you have the same flags set for both builds?
In my case this is a fresh clone, 0 local changes.
I did have the same flags for both. Single thread: 23.29s. That's still a 10% regression.
I've re-measured; I still see it.
I reran with RUSTFLAGS="-Z threads=1"; the result is about the same as without the -Z flag.
It's curious that a system with a higher core count is seeing a greater regression: 39.825s to 49.476s is a nearly 20% increase in compilation time, compared to a 10% increase on my 6-core CPU.
It's not particularly curious: adding more threads to a case that suffers from a scalability problem does tend to increase total run time, and I have more cores to exercise the problem at the same time. I tried to get a differential flamegraph based on perf record output, but perf report ended up executing for almost 2h(!) before I killed it, bogged down in communicating with addr2line, which kept failing to resolve anything (it was making forward progress, just incredibly slowly, and the result was useless anyway). Debian 12, for interested parties.
Try
I'm seeing a similar issue (slower with 8 threads than 1) when compiling the following:
Something I noticed was that there was a clear pattern in the timings:
I wonder if some of the scalability issues could be fixed with a simple heuristic based on how big the crate is, disabling parallelism for very small crates.

1 thread: (screenshot attached)

8 threads: (screenshot attached)

See attached files for full timings:
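A crate-size heuristic like the one suggested above could look roughly like this. This is a hypothetical sketch, not anything rustc actually implements: the function name, the lines-of-code metric, and the 5,000-line threshold are all invented for illustration.

```rust
/// Hypothetical sketch: pick a codegen thread count from a rough
/// crate-size metric, falling back to a single thread for small
/// crates where thread-spawn overhead would dominate.
fn thread_count_for(crate_size_loc: usize, max_threads: usize) -> usize {
    // Assumed threshold: below ~5,000 lines, parallel overhead tends
    // to outweigh any speedup, so stay single-threaded.
    const SMALL_CRATE_LOC: usize = 5_000;
    if crate_size_loc < SMALL_CRATE_LOC {
        1
    } else {
        // Scale up gradually with crate size instead of jumping
        // straight to the maximum.
        let suggested = crate_size_loc / SMALL_CRATE_LOC + 1;
        suggested.min(max_threads)
    }
}

fn main() {
    assert_eq!(thread_count_for(1_200, 8), 1); // small crate: no parallelism
    assert_eq!(thread_count_for(12_000, 8), 3); // mid-size: a few threads
    assert_eq!(thread_count_for(100_000, 8), 8); // large: capped at max
    println!("ok");
}
```

The real cost model would presumably need to weigh query-graph shape rather than raw size, but even a crude cutoff like this would avoid paying thread-spawn overhead on crates that compile in well under a second.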
Now that's a blast from the past; I forgot all about this as I dropped Rust. Looking at it now, I think the issue is pretty clear: builds prior to the change were already progressing with parallelism among crates (but still single-threaded within a given crate), so any idle CPU time got scooped up. Now that there is threading for everything, there is extra overhead just to spawn these threads (which then compete with each other), but they have no idle CPU time to fill in. As a result, for small crates there is just more overhead for no benefit. Some heuristic for thread count would definitely be welcome, but what this probably wants on top of it is a make-like jobserver.
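The overhead argument is easy to demonstrate in isolation. A toy sketch follows: the workload and chunk count are made up, and the absolute timings are machine-dependent; the point is only that spawning a thread per tiny work item adds cost that inline execution avoids.

```rust
use std::thread;
use std::time::Instant;

// A deliberately tiny work item, standing in for compiling a small crate.
fn work(n: u64) -> u64 {
    (0..n).fold(0, |acc, x| acc ^ x)
}

fn main() {
    // Run 64 tiny work items inline on the current thread.
    let t = Instant::now();
    let inline: u64 = (0..64).map(|_| work(1_000)).sum();
    let inline_time = t.elapsed();

    // Run the same 64 items with one freshly spawned thread each,
    // paying spawn/join overhead for every item.
    let t = Instant::now();
    let spawned: u64 = (0..64)
        .map(|_| thread::spawn(|| work(1_000)))
        .collect::<Vec<_>>() // spawn all before joining any
        .into_iter()
        .map(|h| h.join().unwrap())
        .sum();
    let spawned_time = t.elapsed();

    // Both strategies compute the same result; only the cost differs.
    assert_eq!(inline, spawned);
    println!("inline: {:?}, spawned: {:?}", inline_time, spawned_time);
}
```

On a loaded machine where the spawned threads also compete with other crates' compilations for cores, as described above, the gap only widens.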
We are already using the jobserver protocol inside rustc to limit thread spawning, with cargo taking the place of make if cargo itself doesn't receive a jobserver pipe. This has been the case since forever, to limit the number of threads running LLVM. I'm not entirely sure how rustc-rayon reacts when it wants to spawn a new thread but the jobserver says no. Does it block? Or does it put the work item on the queue of another thread?
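For reference, the blocking behaviour the jobserver protocol implies can be modelled in-process with a counting-semaphore sketch. This is a std-only illustration, not the real mechanism: the actual protocol passes single-byte tokens through a pipe inherited across processes, and rustc goes through the jobserver crate rather than anything like this.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// In-process stand-in for a jobserver token pool.
struct TokenPool {
    avail: Mutex<usize>, // tokens currently available
    cv: Condvar,
}

impl TokenPool {
    fn new(tokens: usize) -> Self {
        TokenPool { avail: Mutex::new(tokens), cv: Condvar::new() }
    }

    /// Block until a token is available, mirroring a blocking read
    /// of one byte from the jobserver pipe.
    fn acquire(&self) {
        let mut avail = self.avail.lock().unwrap();
        while *avail == 0 {
            avail = self.cv.wait(avail).unwrap();
        }
        *avail -= 1;
    }

    /// Return the token, like writing the byte back to the pipe.
    fn release(&self) {
        *self.avail.lock().unwrap() += 1;
        self.cv.notify_one();
    }
}

fn main() {
    let pool = Arc::new(TokenPool::new(2)); // 2 tokens, roughly "-j2"
    let done = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let (pool, done) = (Arc::clone(&pool), Arc::clone(&done));
            thread::spawn(move || {
                pool.acquire(); // blocks while both tokens are taken
                *done.lock().unwrap() += 1;
                pool.release();
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(*done.lock().unwrap(), 8);
    println!("all 8 work items ran under 2 tokens");
}
```

In this model the answer to the question above would be "it blocks"; whether rustc-rayon instead re-queues the work item onto an existing worker is exactly what would need checking in its source.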
When compiling cargo audit from git on commit b6baecc0ea4e2d115e4e10b10c2196b33d42c1da, I'm seeing the project build in 19 seconds on my machine with -Z threads=1, but it takes 25 seconds with -Z threads=8. I am using a 6-core desktop CPU, so no chance of NUMA issues. I'm also seeing 25s compile times with -Z threads=6, matching the CPU core count.

I've captured Samply profiles, but they are hard to make sense of due to the sheer number of threads (4000 for a single thread, 6000 for multiple threads). They are too big for sharing via firefox.dev, so please find them attached:
profile-1-thread.json.gz
profile-8-threads.json.gz
Meta
rustc --version --verbose: