Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

task: fix LocalSet having a single shared task budget #2462

Merged
merged 4 commits into from
Apr 30, 2020

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented Apr 29, 2020

Motivation

Currently, an issue exists where a LocalSet has a single cooperative
task budget that's shared across all futures spawned on the LocalSet
and by any future passed to LocalSet::run_until or
LocalSet::block_on. Because these methods will poll the run_until
future before polling spawned tasks, it is possible for that task to
always deterministically starve the entire LocalSet so that no local
tasks can proceed. When the completion of that future itself depends
on other tasks on the LocalSet, this will then result in a deadlock,
as in issue #2460.

A detailed description of why this is the case, taken from this
comment
:

LocalSet wraps each time a local task is run in budget:

Some(task) => crate::coop::budget(|| task.run()),

This is identical to what tokio's other schedulers do when running
tasks, and in theory should give each task its own budget every time
it's polled.

However, LocalSet is different from other schedulers. Unlike the
runtime schedulers, a LocalSet is itself a future that's run on
another scheduler, in block_on. block_on also sets a budget:

if let Ready(v) = crate::coop::budget(|| future.as_mut().poll(&mut cx)) {

The docs for budget state that:

/// Enabling budgeting when it is already enabled is a no-op.

This means that inside of a LocalSet, the calls to budget are
no-ops. Instead, each future polled by the LocalSet is subtracting
from a single global budget.

LocalSet's RunUntil future polls the provided future before polling
any other tasks spawned on the local set:

if let Poll::Ready(output) = me.future.poll(cx) {
return Poll::Ready(output);
}
if me.local_set.tick() {
// If `tick` returns `true`, we need to notify the local future again:
// there are still tasks remaining in the run queue.
cx.waker().wake_by_ref();
}
Poll::Pending

In this case, the provided future is JoinAll. Unfortunately, every
time a JoinAll is polled, it polls every joined future that has not
yet completed. When the number of futures in the JoinAll is >= 128,
this means that the JoinAll immediately exhausts the task budget. This
would, in theory, be a good thing --- if the JoinAll had a huge
number of JoinHandles in it and none of them are ready, it would limit
the time we spend polling those join handles.

However, because the LocalSet actually has a single shared task
budget, this means polling the JoinAll always exhausts the entire
budget. There is now no budget remaining to poll any other tasks spawned
on the LocalSet, and they are never able to complete.

Solution

This branch solves this issue by resetting the task budget when polling
a LocalSet. I've added a new function to coop for resetting the task
budget to UNCONSTRAINED for the duration of a closure, and thus
allowing the budget calls in LocalSet to actually create a new
budget for each spawned local task. Additionally, I've changed
LocalSet to also ensure that a separate task budget is applied to
any future passed to block_on/run_until.

Additionally, I've added a test reproducing the issue described in
#2460. This test fails prior to this change, and passes after it.

Fixes #2460

hawkw added 2 commits April 29, 2020 11:26
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
@hawkw hawkw added C-bug Category: This is a bug. A-tokio Area: The main tokio crate M-task Module: tokio/task I-hang Program never terminates, resulting from infinite loops, deadlock, livelock, etc. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 29, 2020
@hawkw hawkw requested review from carllerche and jonhoo April 29, 2020 19:31
@hawkw hawkw self-assigned this Apr 29, 2020
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Copy link
Contributor

@kleimkuhler kleimkuhler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a heads up there are also a few dbg!s left in this change

@hawkw
Copy link
Member Author

hawkw commented Apr 29, 2020

Just a heads up there are also a few dbg!s left in this change

agh! thanks for noticing!

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
@jonhoo
Copy link
Contributor

jonhoo commented Apr 30, 2020

Hmm, it's definitely a little weird for LocalSet to be allowed to effectively boundlessly increase the size of its budget. This change would mean that a LocalSet can basically run forever, never yielding back to the runtime, as long as its inner futures keep spawning things and the block_on future does not resolve. That doesn't seem right?

Is there a reason to prefer this way of fixing it over, say, using coop::limit to only give the block_on future access to something like half the remaining budget?

@hawkw
Copy link
Member Author

hawkw commented Apr 30, 2020

@jonhoo

This change would mean that a LocalSet can basically run forever, never yielding back to the runtime, as long as its inner futures keep spawning things and the block_on future does not resolve.

Hmm. It also seemed wrong, to me, for all tasks on the LocalSet to share a single budget, even if we restrict the amount of that budget that's available to the block_on future? When a future is running on a real scheduler and it spawns a new task, the spawned task will have its own budget independent of the parent, so I didn't think it was correct for a locally spawned future to share budget with its parent. I'm open to reevaluating that decision — this was just my thought process.

@jonhoo
Copy link
Contributor

jonhoo commented Apr 30, 2020

Yeah, I honestly don't know what the right thing to do is here. My instinct is that LocalSet is no less exempt from "you must yield the top-level task every so often" than any other task is. Maybe there's an argument for LocalSet being allowed a larger budget, but even that I'm not sure about? What is the argument for doing so?

If we were to go the route this PR proposes, we would at the very least need to special-case LocalSet further, such as by not allowing the "unstealable slot" optimization when the current task is a LocalSet. Otherwise, such a task may be perpetually starved.

@hawkw
Copy link
Member Author

hawkw commented Apr 30, 2020

Maybe there's an argument for LocalSet being allowed a larger budget, but even that I'm not sure about? What is the argument for doing so?

As an example of the kind of scenario I was concerned about, consider a case where I've spawned, say, a third of my futures on a LocalSet, and the remainder on the threadpool. Each spawned task has its own budget of 128 polls. However, the LocalSet tasks share a single budget of 128 polls. Even if we used limit to avoid one future starving the rest of the LocalSet, any given local future is allowed to proceed significantly less often than a future running on the threadpool.

When the threadpool scheduler is used, no tasks other than the one passed to block_on is executing on the main thread; all spawned tasks are scheduled on worker threads. So, the local tasks are not competing with other tasks, only with each other. If they all share a single budget of 128 polls, the local set will spend a considerably larger fraction of its time round-tripping tasks through the scheduler than the threadpool workers.

And, it's worth remembering that the local set already has its own limit on the number of tasks it will poll per tick before yielding, separately from the task budget — if more than that number of tasks have been notified since the last local set tick, it will return Pending and wake itself. This already ensures that when running a local set side-by-side with other tasks (e.g. on the basic scheduler), the local set will yield to other tasks rather than constantly polling its subtasks. If that wasn't the case, it would make more sense for the local set to have its own budget, IMO.

I wonder if @carllerche has an opinion?

@jonhoo
Copy link
Contributor

jonhoo commented Apr 30, 2020

Hmm, that's interesting. The observation that LocalSet is only ever used in block_on or on basic_scheduler helps. There are, I suppose, still cases where a LocalSet could be contained within another task in those contexts, but that seems much rarer. I was thinking of a case where a LocalSet was somehow running on a worker thread, but that can't actually happen today, so my thinking there was just wrong.

In light of ^, I think I agree with you that unconstrained budgeting probably makes sense for LocalSet itself.

@hawkw hawkw merged commit 20b5df9 into master Apr 30, 2020
@carllerche carllerche deleted the eliza/localset-joinall branch May 2, 2020 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. I-hang Program never terminates, resulting from infinite loops, deadlock, livelock, etc. M-task Module: tokio/task S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LocalSet, JoinHandle and budget coming together can lead to hangs with 100% CPU usage.
3 participants