-
-
Notifications
You must be signed in to change notification settings - Fork 345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Propagating cancellation through threads #606
Comments
Relevant to python-trio#886, python-trio#606, python-trio#285, python-trio#147, python-trio#70, python-trio#58, maybe others. I was continuing my effort to shoehorn linked cancel scopes and graceful cancellation into `CancelScope` earlier today and it was feeling too much of a mess, so I decided to explore other options. This PR is the result. It makes major changes to Trio's cancellation internals, but barely any to Trio's cancellation semantics -- all tests pass except for one that is especially persnickety about `cancel_called`. No new tests or docs yet as I wanted to get feedback on the approach before polishing. An overview: * New class `CancelBinding` manages a single lexical context (a `with` block or a task) that might get a different cancellation treatment than its surroundings. "All plumbing, no policy." * Each cancel binding has an effective deadline, a _single_ task, and links to parent and child bindings. Each parent lexically encloses its children. The only cancel bindings with multiple children are the ones immediately surrounding nurseries, and they have one child binding per nursery child task plus maybe one in the nested child. * Each cancel binding calculates its effective deadline based on its parent's effective deadline and some additional data. The actual calculation is performed by an associated `CancelLogic` instance (a small ABC). * `CancelScope` now implements `CancelLogic`, providing the deadline/shield semantics we know and love. It manages potentially-multiple `CancelBinding`s. * Cancel stacks are gone. Instead, each task has an "active" (innermost) cancel binding, which changes as the task moves in and out of cancellation regions. The active cancel binding's effective deadline directly determines whether and when `Cancelled` is raised in the task. * `Runner.deadlines` stores tasks instead of cancel scopes. There is no longer a meaningful state of "deadline is in the past but scope isn't cancelled yet" (this is what the sole failing test doesn't like). If the effective deadline of a task's active cancel binding is non-infinite and in the future, it goes in Runner.deadlines. If it's in the past, the task has a pending cancellation by definition. Potential advantages: * Cancellation becomes extensible without changes to _core, via users writing their own CancelLogic and wrapping a core CancelBinding(s) around it. We could even move CancelScope out of _core if we want to make a point. * Nursery.start() is much simpler. * Splitting shielding into a separate object from cancellation becomes trivial (they'd be two kinds of CancelLogic). * Most operations that are performed frequently take constant time: checking whether you're cancelled, checking what your deadline is, entering and leaving a cancel binding. I haven't benchmarked, so it's possible we're losing on constant factors or something, but in theory this should be faster than the old approach. * Since tasks now have well-defined root cancel bindings, I think python-trio#606 becomes straightforward via providing a way to spawn a system task whose cancel binding is a child of something other than the system nursery's cancel binding. Caveats: * We call `current_time()` a lot. Not sure if this is worth worrying about, and could probably be cached if so. * There are probably bugs, because aren't there always? Current cancel logic: ``` def compute_effective_deadline( self, parent_effective_deadline, parent_extra_info, task ): incoming_deadline = inf if self._shield else parent_effective_deadline my_deadline = -inf if self._cancel_called else self._deadline return min(incoming_deadline, my_deadline), parent_extra_info ``` Want to support a grace period? I'm pretty sure it would work with something like ``` def compute_effective_deadline( self, parent_effective_deadline, parent_extra_info, task ): parent_cleanup_deadline = parent_extra_info.get("effective_cleanup_deadline", parent_effective_deadline) if self._shield: parent_effective_deadline = parent_cleanup_deadline = inf my_cleanup_start = min(self._deadline, self._cancel_called_at) merged_cleanup_deadline = min(parent_cleanup_deadline, my_cleanup_start + self._grace_period) my_extra_info = parent_extra_info.set("effective_cleanup_deadline", merged_cleanup_deadline) if self._shield_during_cleanup: effective_deadline = merged_cleanup_deadline else: effective_deadline = min(parent_effective_deadline, my_cleanup_start) return effective_deadline, my_extra_info ``` Maybe that's not quite _simple_ but it is miles better than what I was looking at before. :-)
In retrospect, I think I was making this too complicated. The case where we want to propagate cancellation is when we're inside a Basically, after spawning the thread, it would sit in a loop like: while True:
msg_from_thread = await wait_task_rescheduled(...)
if msg_from_thread.type is THREAD_DONE:
return msg_from_thread.result.unwrap()
elif msg_from_thread.type is RUN:
result = await outcome.acapture(msg_from_thread.async_fn)
to_thread_queue.put(result)
... We'd stash the communications channels in thread-local variables, like we do the trio token, and then the thread would send messages back to report when it's finished, or when it has some work it needs to do in trio context. Interesting special cases that we'll want to make sure we handle correctly:
|
While writing up #607, I was thinking about a neat feature that curio has. I'm not sure if it's very important, but I envy it a bit, so I might as well open an issue to dump thoughts out where we can see them.
The feature is: say you use
run_sync_in_worker_thread
to enter a thread, and then from the thread you useBlockingTrioPortal.run
to re-enter trio. So conceptually, this is all one stack, that starts in trio, goes into a thread, and then goes back into trio. Now suppose the originalrun_sync_in_worker_thread
is cancelled. It would be neat if this caused the code inside theBlockingTrioPortal.run
call to raise aCancelled
error that propagated all the way back out of trio, through the thread, and back into trio.See #607 for more details about why this is tricky, and hard to fix even by extending the current cancel API.
I guess either we need some way to reenter a given cancel binding (which feels very weird and unlikely to have other use cases), or else some way to capture the exception in
run_sync_in_worker_thread
(<- this part we have) and then inject it into the eventual task (<- this part we don't have). Well... unless we do it by making a new cancel scope inside the eventual task, cancel it, and then re-raise after the inner cancellation unwinds. (We could even fudge the traceback if we want to be really tricky.) Or... we could switch from attachingCancelled
objects to specific scopes/bindings, and make it so that eachwith
block checks if it is associated with the outermost cancelled scope, and if so it catches allCancelled
exceptions, and then it would be fine to manually capture aCancelled
from inside the re-entry task and pass it over to the thread, before it gets caught. Except, ugh, that won't work if the exception isn'tCancelled
but ratherKeyboardInterrupt
.It wouldn't be terrible to manually cancel the re-entry task, then call the
raise_cancel
function to get the real exception, and attach the internal exception to it as a__context__
(though this would require #285).Alternatively... can we avoid all this rigmorale? What if
run_sync_in_worker_thread
created a nursery, and whenBTP.run
re-entered it put tasks into that nursery, instead of the system nursery? Then any cancel scope that contained therun_sync_in_worker_thread
call would also automatically contain any re-entry tasks.KeyboardInterrupt
might still require some careful handling, since it doesn't follow the normal cancel scope scoping rules. I guess we could do some shenanigans like pushing the actual wait-for-thread into a child task? Actually this would be pretty simple: open nursery, start a child to do the actual thread waiting stuff, then park in the nursery__aexit__
–KeyboardInterrupt
will be delivered to__aexit__
and cause the nursery to be cancelled. (Actually, this is an interesting possible strategy for deliveringKeyboardInterrupt
in general: pick a nursery and inject it directly there, as if it were raised by a task inside that nursery. Or do this to the system nursery, though that feels weird.) I'd be a little nervous about adding overhead torun_sync_in_worker_thread
, but maybe it wouldn't be that bad.The text was updated successfully, but these errors were encountered: