From ccea27ccf6aae58c3293aab12ea289054cee9afd Mon Sep 17 00:00:00 2001
From: Alex Crichton
Date: Sun, 5 Jan 2014 09:02:01 -0800
Subject: [PATCH] Don't always wake up sleeping schedulers

I created a benchmark recently of incrementing a global variable inside of an
`extra::sync::Mutex`, and it turned out to be horribly slow. For 100K
increments (per thread), the timings I got were:

    1 green thread: 10.73ms
    8 green threads: 10916.14ms
    1 native thread: 9.19ms
    8 native threads: 4610.18ms

Upon profiling the test, most of the time is spent in `kevent()` (I'm on OSX)
and `write()`. I initially thought this was because we were falling into epoll
too often, but changing the scheduler to fall back to epoll() only when there
is no work and no active I/O handles didn't fix the problem.

The problem actually turned out to be that the schedulers were in high
contention over the tasks being run. With RUST_TASKS=1, this test is blazingly
fast (78ms), and with RUST_TASKS=2, it's incredibly slow (3824ms). The reason I
found for this is that newly enqueued tasks are constantly stolen by other
schedulers, so tasks just get ping-ponged back and forth between schedulers
while the schedulers spend *a lot* of time in `kevent` and `write` waking each
other up.

This optimization only wakes up a sleeping scheduler on every 8th task that is
enqueued. I have found this number to be the "low sweet spot" for maximizing
performance. The numbers after I made this change are:

    1 green thread: 13.96ms
    8 green threads: 80.86ms
    1 native thread: 13.59ms
    8 native threads: 4239.25ms

This indicates that the 8-thread performance is now at the same level as
RUST_TASKS=1, and the other numbers stayed essentially the same. In other
words, this is a 136x improvement in highly contentious green programs.
---
 src/libgreen/sched.rs | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/src/libgreen/sched.rs b/src/libgreen/sched.rs
index b0b88e4be7936..9d474e0e14afa 100644
--- a/src/libgreen/sched.rs
+++ b/src/libgreen/sched.rs
@@ -89,6 +89,7 @@ pub struct Scheduler {
     /// Bookeeping for the number of tasks which are currently running around
     /// inside this pool of schedulers
     task_state: TaskState,
+    enqueued_tasks: uint,
 
     // n.b. currently destructors of an object are run in top-to-bottom in order
     // of field declaration. Due to its nature, the pausable idle callback
@@ -164,6 +165,7 @@ impl Scheduler {
             yield_check_count: 0,
             steal_for_yield: false,
             task_state: state,
+            enqueued_tasks: 0,
         };
 
         sched.yield_check_count = reset_yield_check(&mut sched.rng);
@@ -553,16 +555,22 @@ impl Scheduler {
             None => {} // allow enqueuing before the scheduler starts
         }
 
-        // We've made work available. Notify a
-        // sleeping scheduler.
-
-        match self.sleeper_list.casual_pop() {
-            Some(handle) => {
-                let mut handle = handle;
-                handle.send(Wake)
+        // We've made work available, so we might want to notify a sleeping
+        // scheduler. Note that this notification is fairly expensive (involves
+        // doing a `write()` on a pipe), so we don't attempt to wake up a remote
+        // scheduler on *all* task enqueues. Testing has shown that 8 is the
+        // lowest value which achieves a dramatic speedup (8 threads
+        // incrementing a global counter inside of a mutex).
+        if self.enqueued_tasks % 8 == 0 {
+            match self.sleeper_list.casual_pop() {
+                Some(handle) => {
+                    let mut handle = handle;
+                    handle.send(Wake)
+                }
+                None => { (/* pass */) }
             }
-            None => { (/* pass */) }
-        };
+        }
+        self.enqueued_tasks += 1;
     }
 
     // * Core Context Switching Functions
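
For reference, the contended benchmark described in the commit message has
roughly the following shape. This is a minimal sketch, not the original
harness: it uses modern Rust's std::sync::Mutex and std::thread in place of
the 2014-era extra::sync::Mutex and the green/native runtimes, and the thread
and iteration counts are taken from the description above.

    use std::sync::{Arc, Mutex};
    use std::thread;
    use std::time::Instant;

    fn main() {
        let threads: u64 = 8;      // the 8-thread case from the commit message
        let iters: u64 = 100_000;  // 100K increments per thread
        let counter = Arc::new(Mutex::new(0u64));

        let start = Instant::now();
        let handles: Vec<_> = (0..threads)
            .map(|_| {
                let counter = Arc::clone(&counter);
                thread::spawn(move || {
                    for _ in 0..iters {
                        // Every increment takes and releases the lock, so the
                        // threads contend heavily on the single mutex.
                        *counter.lock().unwrap() += 1;
                    }
                })
            })
            .collect();
        for handle in handles {
            handle.join().unwrap();
        }

        assert_eq!(*counter.lock().unwrap(), threads * iters);
        println!("{} threads x {} increments: {:?}",
                 threads, iters, start.elapsed());
    }

Under the old behavior, every one of those lock releases could enqueue a task
and unconditionally `write()` to wake a sleeping scheduler; throttling the
wakeup to every 8th enqueue is what removes that overhead.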