Rayon uses a lot of CPU when there's not a lot of work to do #642
A profile on macOS shows most of the time being spent in `sched_yield`.

Also, a similar program written using the …

@lukewagner here's an example of some of the bad.
Rewriting this using crossbeam-channel produces an executable that only uses 4% CPU:

```rust
use crossbeam_channel::unbounded;
use std::thread;
use std::time::Duration;

fn main() {
    let (s, r) = unbounded();
    for _ in 0..8 {
        let r = r.clone();
        thread::spawn(move || loop {
            // recv() blocks until a message arrives; no busy-waiting while idle.
            let _m: i32 = r.recv().unwrap();
        });
    }
    loop {
        thread::sleep(Duration::from_millis(1));
        s.send(0).unwrap();
    }
}
```

Perhaps rayon should use the same wakeup infrastructure that crossbeam-channel uses?
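For reference, crossbeam-channel's blocking receive is event-driven: an idle receiver parks its thread and is woken only when a sender hands it a message. Below is a minimal sketch of that park/unpark pattern; it is illustrative only, not crossbeam's actual internals.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

fn main() {
    let queue: Arc<Mutex<VecDeque<i32>>> = Arc::new(Mutex::new(VecDeque::new()));

    // Consumer: parks when the queue is empty, so it burns no CPU while idle.
    let q = Arc::clone(&queue);
    let consumer = thread::spawn(move || loop {
        let item = q.lock().unwrap().pop_front();
        match item {
            Some(_m) => { /* process the message */ }
            None => thread::park(), // sleep until the producer unparks us
        }
    });

    // Producer: pushes an item and wakes exactly the thread that is waiting for it.
    let waker = consumer.thread().clone();
    loop {
        thread::sleep(Duration::from_millis(1));
        queue.lock().unwrap().push_back(0);
        waker.unpark();
    }
}
```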
From https://github.com/rayon-rs/rayon/blob/master/rayon-core/src/sleep/README.md, it seems rayon will repeatedly yield to the operating system before a thread goes to sleep. How does this work with respect to syscall overhead? I think yielding to the scheduler is almost as expensive as blocking on a condition variable?
The net performance of blocking has to include the cost of the syscall to wake the thread up, too. Even if the raw cost were the same, there's also a latency difference in fully blocking. The …
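For illustration only, here is a minimal sketch of the yield-then-block pattern the sleep README describes, not rayon's actual implementation; the structure and the names `Sleeper` and `YIELD_ROUNDS` are assumptions made for this example.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

const YIELD_ROUNDS: u32 = 64; // hypothetical tuning knob

struct Sleeper {
    has_work: Mutex<bool>, // set to true once work has been published
    cv: Condvar,
}

impl Sleeper {
    fn wait_for_work(&self) {
        // Phase 1: stay runnable and yield, hoping work shows up quickly.
        for _ in 0..YIELD_ROUNDS {
            if *self.has_work.lock().unwrap() {
                return;
            }
            thread::yield_now();
        }
        // Phase 2: block on the condvar; waking us now costs the waker a syscall.
        let mut guard = self.has_work.lock().unwrap();
        while !*guard {
            guard = self.cv.wait(guard).unwrap();
        }
    }

    fn publish_work(&self) {
        *self.has_work.lock().unwrap() = true;
        self.cv.notify_one();
    }
}

fn main() {
    let sleeper = Arc::new(Sleeper {
        has_work: Mutex::new(false),
        cv: Condvar::new(),
    });
    let s = sleeper.clone();
    let worker = thread::spawn(move || s.wait_for_work());
    thread::sleep(Duration::from_millis(5));
    sleeper.publish_work();
    worker.join().unwrap();
}
```

The trade-off discussed above lives in `YIELD_ROUNDS`: more rounds keep wake-up latency low when work arrives quickly but burn CPU in the scheduler, while fewer rounds mean the publisher pays the wake-up syscall more often.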
If I change …

So not as much speedup, but lower CPU usage.
A few links to what might be making the problem worse: …
I've been working on a rewrite of the sleep system that I think will address this problem. I opened #691 with an immediate refactoring, but I'm working on a more extensive rewrite. I hope to open an RFC in a few days with a sketch of the algorithm plus a few alternatives I'd like to evaluate. The gist of it is this: …

We'll have to benchmark it, of course, and we'll probably want to tune things.
Opened: rayon-rs/rfcs#5 with a fairly complete description of what I have in mind. I've not implemented all of that yet, but I'm getting there.
Using the algorithm from the RFC seems to reduce this example to negligible CPU time: …
I've been experimenting with this case where work is spawned every millisecond:

```rust
fn main() {
    loop {
        std::thread::sleep(std::time::Duration::from_millis(1));
        rayon::spawn(move || {});
    }
}
```

With the current release of … With the … Excited to see where this goes.
695: a few tweaks to the demos r=cuviper a=nikomatsakis

A few changes that I am making on my branch that I would like applied to master. This renames some benchmarks to make them uniquely identifiable and also adds a test to measure #642.

Co-authored-by: Niko Matsakis <niko@alum.mit.edu>
Yeah. I'm trying that now.
We do ultimately wait on condvars, but there's some spinning on …
I think I might have encountered this issue. It involves work that is easy to parallelize (essentially order-insensitive stream processing using …). I was unable to reproduce the issue with simpler minimal examples, where Rayon seemed to distribute the load nicely. So I rewrote the code to use a …

I suspect that the issue is that the work-giving thread (the one that reads the stream) is limited by IO and is not providing work fast enough to saturate 64 CPUs. The workload will never utilize all 64 CPUs - and that's fine. However, with Rayon it might happen that it provides just enough work to hit this bug and make all cores 100% busy, but with so much of the time spent in busy-wait that it actually prevents them from doing useful work. Reducing the number of threads fixes the issue because the bug doesn't appear when there is enough work. The bug doesn't appear on my laptop for the same reason - it doesn't have enough CPU horsepower to outmatch the IO bandwidth of the reader thread.

Sadly, upgrading Rayon to 1.4.0, which as I understand it should include #746, didn't seem to make a difference. Is there some way to test whether my program triggers the issue reported here?

(All of the above experiments were performed with release builds.)
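For concreteness, here is a hypothetical sketch of the workload shape described above (the actual code isn't included in this report): a single IO-bound reader feeding an order-insensitive Rayon pipeline through `par_bridge()`. The file name and per-item work are placeholders.

```rust
use rayon::iter::{ParallelBridge, ParallelIterator};
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> std::io::Result<()> {
    // "input.txt" is a placeholder; the point is that this reader is IO-bound.
    let reader = BufReader::new(File::open("input.txt")?);

    reader
        .lines()
        .filter_map(Result::ok)
        // Hand items to the global Rayon pool as they arrive. If the reader
        // can't produce items fast enough to keep 64 workers busy, the pool
        // may spend its time looking for work rather than doing it.
        .par_bridge()
        .for_each(|line| {
            // Placeholder for the real per-item processing.
            let _ = line.len();
        });

    Ok(())
}
```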
I'm going to close this issue now that 1.4.0 is published. @hniksic could you open a new issue for your case? There are definitely some scalability issues with …
@cuviper I was loath to open an issue since I couldn't provide a minimal example that reproduces the bug, and I tried. If you think the above description would be useful enough as an issue of its own, I'll gladly make one.
@hniksic Issues are not a scarce resource. 🙂 We can at least discuss the problem in the abstract, and there might be other people who want to chime in and may have something more reproducible.
The following program uses about 30% of the CPU on 4-core (8 HT) Linux and Mac machines.
Reducing the sleep duration to 1 ms pushes the CPU usage up to 200%.
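The program itself didn't survive in this copy of the issue; a minimal sketch consistent with the description, assuming the original submitted a trivial task to the global Rayon pool every ~10 ms (mirroring the 1 ms variant shown earlier in the thread), might look like this:

```rust
// Hypothetical reconstruction, not the reporter's original program.
fn main() {
    loop {
        // Sleep ~10 ms between submissions; dropping this to 1 ms is what
        // pushes the reported CPU usage from ~30% up to ~200%.
        std::thread::sleep(std::time::Duration::from_millis(10));
        rayon::spawn(|| {
            // Trivial work; the pool is idle almost all of the time.
        });
    }
}
```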