
Nasty hidden invariant in deschedule_running_task_and_then #8132

Closed
bblum opened this issue Jul 30, 2013 · 8 comments · Fixed by #10817
Labels
A-concurrency (Area: Concurrency)
A-runtime (Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows)
E-hard (Call for participation: Hard difficulty. Experience needed to fix: A lot.)

Comments

@bblum
Contributor

bblum commented Jul 30, 2013

There is currently a gentleperson's agreement about the use of the higher-order context switchers in the new runtime. Usually, they are to be used like this:

let mut environment = ...;
do sched.deschedule_running_task_and_then |sched, task| {
    ... use environment ...
    sched.schedule_blocked_task(task);
}
... use environment ...

However, because the closure ("cleanup job") runs in scheduler context, on SMP the task can resume on a different core before the closure finishes running. Hence, if you did:

let mut environment = ...;
do sched.deschedule_running_task_and_then |sched, task| {
    sched.schedule_blocked_task(task);
    ... use environment ... // A
}
... use environment ... // B

...then A and B will race and the world will explode.

There are two possible solutions. One is to leave this issue open forever, and audit all uses of the context-switchers before doing each major release. The second is to use the type system (which once again proves expressive enough to deal with this sort of thing). It would look something like this:

impl Scheduler {
    // Passes the environment explicitly as a noncopyable Env token.
    fn deschedule_running_task_and_then<E>(~self, E, extern fn(&mut Scheduler, BlockedTask, Env<E>))

    // Wakes a blocked task, also proving that the caller won't access its environment anymore.
    fn schedule_blocked_task<E>(&mut self, BlockedTask, Env<E>)
}
impl <E> Env<E> {
    // Can only access environment as long as opaque Env permission token is held.
    fn get<'a>(&'a mut self) -> &'a mut E
}

There is just one problem with this, which is that it doesn't save you if you're already manipulating unsafe pointers as part of the environment (such as we do in comm::PortOne). Those use sites will have to be audited nonetheless.
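A rough sketch of this token pattern in modern Rust might look like the following. All names here are illustrative, not the actual runtime API: the point is only that `Env<E>` is not `Copy`/`Clone`, so passing it by value to the wakeup call statically prevents any later access to the environment (the `A`-after-wakeup race above simply cannot be written).

```rust
// Hypothetical sketch of the Env-token idea; not the runtime's real types.
pub struct Env<E>(E); // noncopyable opaque permission token

impl<E> Env<E> {
    pub fn new(e: E) -> Env<E> { Env(e) }
    // The environment is only reachable while the token is still held.
    pub fn get(&mut self) -> &mut E { &mut self.0 }
    pub fn into_inner(self) -> E { self.0 }
}

// Stand-in for schedule_blocked_task: consuming the token by value
// proves the caller will not touch the environment afterward.
pub fn schedule_blocked_task<E>(env: Env<E>) -> E {
    env.into_inner()
}

pub fn demo() -> i32 {
    let mut env = Env::new(0);
    *env.get() += 1; // legal: token still held
    let final_val = schedule_blocked_task(env);
    // *env.get() += 1; // would not compile: `env` was moved away
    final_val
}
```

The wakeup call moving the token out of scope is what turns the "gentleperson's agreement" into a compile error.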

@bblum
Contributor Author

bblum commented Jul 30, 2013

nominating well-covered

@bblum
Contributor Author

bblum commented Aug 6, 2013

Here's another idea, much less type-systemsy. Since we know the task will resume on a different core, we could just have it spin-wait until the cleanup-job is finished running. It would be the equivalent of this:

let mut environment = ...;
let ok_to_run_task = AtomicBool::new(false);
do sched.deschedule_running_task_and_then |sched, task| {
    sched.schedule_blocked_task(task);
    ... use environment ... // A
    ok_to_run_task.store(true, Release); // always the last thing that runs in the closure
}
while !ok_to_run_task.load(Acquire) { } // always the first thing to run when task resumes
... use environment ... // B

// victory, a happens-before relationship established between A and B

We can make it so this synchronization dance always automatically happens by putting it inside change_task_context. This is tricky because a pointer to the AtomicBool will need to be stored in the CleanupJob alongside the closure, so run_cleanup_job will be the place the release-store is done, and after the context-swap in change_task_context will be where the acquire-load loop is done.
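The handshake described above can be demonstrated with today's std atomics. This is a minimal sketch under assumed names (`handshake_demo`, an `AtomicU32` standing in for the shared environment), not the runtime code: one thread plays the cleanup job doing the release-store last, the other plays the resumed task doing the acquire-load loop first.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

pub fn handshake_demo() -> u32 {
    let env = Arc::new(AtomicU32::new(0)); // stand-in for the shared environment
    let ok_to_run_task = Arc::new(AtomicBool::new(false));

    let (env2, flag2) = (env.clone(), ok_to_run_task.clone());
    let cleanup_job = thread::spawn(move || {
        env2.store(42, Ordering::Relaxed);    // A: use environment
        flag2.store(true, Ordering::Release); // last thing the closure does
    });

    // First thing the resumed task does: spin until the cleanup job is done.
    while !ok_to_run_task.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    let seen = env.load(Ordering::Relaxed); // B: happens-after A, sees 42
    cleanup_job.join().unwrap();
    seen
}
```

The release-store/acquire-load pair on the flag is exactly the happens-before edge between A and B that the pseudocode claims.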

There is good news and bad news about this approach. The pro is that this enables users of this interface to access the environment even after giving the task away, such as in select (#8347) where it's necessary. The con is that a careless user that gives a cleanup job that takes "a long time" to execute after giving away the task handle can cause the other CPU to spin-wait for "just as long". (It is not, however, vulnerable to deadlock.)

@bblum
Contributor Author

bblum commented Aug 6, 2013

Ah, actually, I think we can just use a oneshot in that context, and everybody wins. It is also no harder on cacheline contention than the AtomicBool plan, but both are still worse (best case: one extra xchg in the context-switch path) than the type-systemsy solution.

Perhaps we can have one of each version. Select can use the slower-by-one-xchg version; other places can use the environment-token solution.
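The oneshot variant replaces the spin loop with a blocking receive. A sketch using std's mpsc channel as the oneshot (assumed names, not the old runtime's `comm` types):

```rust
use std::sync::mpsc;
use std::thread;

pub fn oneshot_demo() -> u32 {
    let (done_tx, done_rx) = mpsc::channel::<u32>();

    let cleanup_job = thread::spawn(move || {
        let env = 7u32;             // A: use environment in scheduler context
        done_tx.send(env).unwrap(); // signal completion (handing the value over)
    });

    // First thing the resumed task does: block until the cleanup job finishes,
    // instead of spin-waiting on a flag.
    let env = done_rx.recv().unwrap(); // B: happens-after A
    cleanup_job.join().unwrap();
    env
}
```

A slow cleanup job now costs the waiter a blocking wait rather than a busy spin, which is friendlier to tools like valgrind.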

@brson
Contributor

brson commented Aug 6, 2013

It's certainly less invasive than the type-based approach. Although it seems unlikely to contend often, spinning sucks for valgrind, and valgrind's serialization may mean that it does contend under valgrind more than under normal execution. Spending two atomic ops to maintain this invariant seems extravagant.

@bblum
Contributor Author

bblum commented Aug 13, 2013

Actually, re-nominating for the 0.8 milestone. As long as this goes unsolved, the task-killing race in switch_running_tasks_and_then can't be fixed, and any program that mixes failure and message-passing will go through this buggy path and be at risk of crashing (or worse!).

@bblum
Contributor Author

bblum commented Aug 21, 2013

I have thought a bit more about this, and basically think that using a LittleLock is the best way.

My most recent idea was to use an atomic flag that's shared between the cleanup job and someone trying to kill the task (very similar to the state flag used in ports/chans), such that whichever of the two runs later is responsible for reenqueueing the task. But the problem with this is that you also need to do a swap on the flag if you're waking up the task normally (or else a reawoken task going to do a second cleanup job could race with its own first cleanup job), and that would cause contention on the flag even in the most common context switches.

The idea with the LittleLock would involve having one such lock per task struct. In switch_running_tasks_and_then, if the task is not unkillable, we would take the lock around the BlockedTask::try_block(task) / f(sched, blocked_task) sequence. Dropping the lock here would have to go through an unsafe pointer, since we give the task away in order to block it (or perhaps the lock could live in an UnsafeArc, but that would require its own refcounting). To protect that pointer, the lock would also have to be bounced off of (i.e., do task.cleanup_job_lock.lock { }) in task death. The guarantee would be provided through a similar lock-bounce when a task is about to fail from being killed.

@catamorphism
Contributor

Just a bug, de-nominating

@bblum
Contributor Author

bblum commented Sep 12, 2013

IMO this is more than a bug, or at least, one that needs to be addressed before the next release. Because of this problem, all the test cases for linked failure are currently ignored, and although user programs can still use linked failure, they will segfault randomly. Since the test cases are ignored, the implementation is at risk of rotting until this is fixed.

bors added a commit that referenced this issue Dec 5, 2013
Right now, as pointed out in #8132, it is very easy to introduce a subtle race
in the runtime. I believe that this is the cause of the current flakiness on the
bots.

I have taken the last idea mentioned in that issue which is to use a lock around
descheduling and context switching in order to solve this race.

Closes #8132