-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LocalSet, JoinHandle and budget coming together can lead to hangs with 100% CPU usage. #2460
Comments
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
I think I know what's up here.
Line 406 in 947045b
This is identical to what tokio's other schedulers do when running tasks, and in theory should give each task its own budget every time it's polled. However, tokio/tokio/src/runtime/basic_scheduler.rs Line 131 in 947045b
The docs for Line 73 in 947045b
This means that inside of a
Lines 525 to 535 in 947045b
In this case, the provided future is However, because the There are two possible solutions: |
That's some awesome debugging! Thanks for looking into that. There is one thing I don't understand though:
From what I gather from the logs I posted, the inner tasks have in fact always run and completed before the outer ones. After they print, they immediately return It is very late here, so I have to sleep, but tomorrow I will run the test with your PR. I imagine it will solve the hang, but I suspect there is still an underlying issue that could have solved this from another angle and that may be desirable to solve as well. I'll have a look at the code of JoinHandle tomorrow as well. |
When the |
Yes, thanks. It dawned to me overnight that the budget was returning Pending, not the JoinHandle... |
## Motivation Currently, an issue exists where a `LocalSet` has a single cooperative task budget that's shared across all futures spawned on the `LocalSet` _and_ by any future passed to `LocalSet::run_until` or `LocalSet::block_on`. Because these methods will poll the `run_until` future before polling spawned tasks, it is possible for that task to _always_ deterministically starve the entire `LocalSet` so that no local tasks can proceed. When the completion of that future _itself_ depends on other tasks on the `LocalSet`, this will then result in a deadlock, as in issue #2460. A detailed description of why this is the case, taken from [this comment][1]: `LocalSet` wraps each time a local task is run in `budget`: https://github.com/tokio-rs/tokio/blob/947045b9445f15fb9314ba0892efa2251076ae73/tokio/src/task/local.rs#L406 This is identical to what tokio's other schedulers do when running tasks, and in theory should give each task its own budget every time it's polled. _However_, `LocalSet` is different from other schedulers. Unlike the runtime schedulers, a `LocalSet` is itself a future that's run on another scheduler, in `block_on`. `block_on` _also_ sets a budget: https://github.com/tokio-rs/tokio/blob/947045b9445f15fb9314ba0892efa2251076ae73/tokio/src/runtime/basic_scheduler.rs#L131 The docs for `budget` state that: https://github.com/tokio-rs/tokio/blob/947045b9445f15fb9314ba0892efa2251076ae73/tokio/src/coop.rs#L73 This means that inside of a `LocalSet`, the calls to `budget` are no-ops. Instead, each future polled by the `LocalSet` is subtracting from a single global budget. `LocalSet`'s `RunUntil` future polls the provided future before polling any other tasks spawned on the local set: https://github.com/tokio-rs/tokio/blob/947045b9445f15fb9314ba0892efa2251076ae73/tokio/src/task/local.rs#L525-L535 In this case, the provided future is `JoinAll`. Unfortunately, every time a `JoinAll` is polled, it polls _every_ joined future that has not yet completed. When the number of futures in the `JoinAll` is >= 128, this means that the `JoinAll` immediately exhausts the task budget. This would, in theory, be a _good_ thing --- if the `JoinAll` had a huge number of `JoinHandle`s in it and none of them are ready, it would limit the time we spend polling those join handles. However, because the `LocalSet` _actually_ has a single shared task budget, this means polling the `JoinAll` _always_ exhausts the entire budget. There is now no budget remaining to poll any other tasks spawned on the `LocalSet`, and they are never able to complete. [1]: #2460 (comment) ## Solution This branch solves this issue by resetting the task budget when polling a `LocalSet`. I've added a new function to `coop` for resetting the task budget to `UNCONSTRAINED` for the duration of a closure, and thus allowing the `budget` calls in `LocalSet` to _actually_ create a new budget for each spawned local task. Additionally, I've changed `LocalSet` to _also_ ensure that a separate task budget is applied to any future passed to `block_on`/`run_until`. Additionally, I've added a test reproducing the issue described in #2460. This test fails prior to this change, and passes after it. Fixes #2460 Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Affected Versions
0.2.14 - 0.2.19
Platform
arch linux, but no IO involved.
Description
Please see the issue reported first at futures-rs because I suspected
JoinAll
was maybe at fault.JoinHandle
often reports Pending even though it's task has already returnedReady
. Over a collection of futures like inLocalSet
, this can require several runs before all containedJoinHandle
s returnReady
. When there are more tasks in the set thanbudget
, this can lead to theJoinHandle
s never returningReady
and tasks awaiting them to hang.It is possible that an inefficiency in the synchronization of
JoinHandle
also has an impact when using a threadpool, but it might not be so noticable since this just throws more resources at the problem.The text was updated successfully, but these errors were encountered: