Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task group is swallowing CancelledError. #374

Closed
jonathanslenders opened this issue Oct 6, 2021 · 36 comments · Fixed by #586
Closed

Task group is swallowing CancelledError. #374

jonathanslenders opened this issue Oct 6, 2021 · 36 comments · Fixed by #586
Labels
asyncio Involves the asyncio backend design Requires a design decision
Milestone

Comments

@jonathanslenders
Copy link
Contributor

If any of the tasks within the task group raise a CancelledError, then that does not propagate out of the task group. In my case, this happens because the code within the task group is cancelled from somewhere else.

from anyio import create_task_group
import asyncio

async def main():
    async def in_task_group():
        raise asyncio.CancelledError

    async with create_task_group() as tg:
        tg.start_soon(in_task_group)

asyncio.run(main())

Would this be a bug? Or is this expected behavior?

@jonathanslenders
Copy link
Contributor Author

Related to this, the following snippet exposes very weird behavior:

from anyio import create_task_group
import asyncio

async def test():
    # Create a task group with one very long running task in here.
    async def in_task_group():
        await asyncio.sleep(100)

    async with create_task_group() as tg:
        tg.start_soon(in_task_group)

    # - The main function should cancel the coroutine around this point -

    # I would not expect any of the following to be printed.
    # Somehow, the task within the task group gets cancelled, but this function
    # does not get cancelled.

    print('Before sleep')
    await asyncio.sleep(2)
    print('After sleep')

async def main():
    # Create task.
    task = asyncio.create_task(test())

    # Cancel task.
    await asyncio.sleep(1)
    task.cancel()

    await task

asyncio.run(main())

@agronholm
Copy link
Owner

Well...this is how anyio has always worked. On trio we don't have this problem because cancel scopes are the only way to cancel things there. On asyncio, cancelling a task just means raising CancelledError once in the target task. I think that most people have not had this problem because they've used structural concurrency all the way.
@graingert any thoughts?

@jonathanslenders
Copy link
Contributor Author

Well, imagine we have a library that uses structured concurrency everywhere within that library. But this library is called by a different codebase that does not follow structured concurrency. The above issue means that it's impossible to use libraries written with anyio in a code base that doesn't use anyio by itself. Is that correct?

@agronholm agronholm added asyncio Involves the asyncio backend design Requires a design decision labels Oct 7, 2021
@agronholm
Copy link
Owner

Suppose a task group was already cancelled via SC, and then another task natively cancels the host task. How would AnyIO tell the difference?

@jonathanslenders
Copy link
Contributor Author

@agronholm, honestly, I don't know. I worked around this by using a cancel scope anyway. But I think the issue remains that at some point, a task created by a library using anyio, could be cancelled using normal asyncio cancellation.

I wonder whether it's feasible to create a little wrapper that runs the anyio task in a cancel scope, catches (elsewhere) the CancelledError that asyncio raises, and as a result cancels the cancel scope.

@agronholm
Copy link
Owner

A potential solution that I came up a couple days ago would be to wait for the subtasks in a shielded cancel scope. That way, only native cancellations would get through and would then be forced to propagate out of the task group.

@agronholm
Copy link
Owner

To detect native cancellations prior to that, we would check, upon cancellation, if one of the enclosing cancel scopes had been cancelled. If not, it's a native cancellation.

@agronholm
Copy link
Owner

I think I found a way around this, but on Python 3.9 and above only: we can set a special cancellation message in the exception to detect cancel scope based cancellations.

@gvanrossum
Copy link

gvanrossum commented Feb 16, 2022

@jonathanslenders (#374 (comment)):

Related to this, the following snippet exposes very weird behavior:
[...]

I translated this example to the new (3.11) asyncio.TaskGroup idiom and it seems to work.

import asyncio

async def test():
    # Create a task group with one very long running task in here.
    async def in_task_group():
        await asyncio.sleep(100)

    try:
        async with asyncio.TaskGroup() as tg:
            tg.create_task(in_task_group())
    except asyncio.CancelledError:
        print("pass")

    # - The main function should cancel the coroutine around this point -

    # I would not expect any of the following to be printed.
    # Somehow, the task within the task group gets cancelled, but this function
    # does not get cancelled.

    print('Before sleep')
    await asyncio.sleep(2)
    print('After sleep')

async def main():
    # Create task.
    task = asyncio.create_task(test())

    # Cancel task.
    await asyncio.sleep(1)
    task.cancel()

    await task

asyncio.run(main())

(Only this is really changed:)

    try:
        async with asyncio.TaskGroup() as tg:
            tg.create_task(in_task_group())
    except asyncio.CancelledError:
        print("pass")

This produces the following output:

pass
Before sleep
After sleep

Is that what you would like to happen?

@jonathanslenders
Copy link
Contributor Author

@gvanrossum : Thanks for responding (I just see your reply). This is indeed what I'd expect. The CancelledError should propagate out of the TaskGroup.

(I'm no longer sure whether ever catching a CancelledError is a good idea though. With anyio, I prefer finally blocks that are shielded from cancellation.)

@gvanrossum
Copy link

Thanks for confirming.

@agronholm
Copy link
Owner

Fixing this is a high priority for v4.0. My initial attempts have not gone very well, as I'm seeing recursion errors from exception groups.

@gvanrossum
Copy link

Do you need my involvement? If not, I'd like to unsubscribe from this issue.

@agronholm
Copy link
Owner

Do you need my involvement? If not, I'd like to unsubscribe from this issue.

It's okay, feel free to unsubscribe.

@decaz
Copy link

decaz commented Nov 29, 2022

@agronholm are any updates on this? Hope you'll find opportunity to fix it 🙏🏻

@agronholm
Copy link
Owner

I already have, in #496. The PR includes some heavy handed overall changes to cancellation semantics, so it requires extra care and review. I myself am not done with it yet.

@reidmeyer
Copy link

bump

@agronholm
Copy link
Owner

bump

I'm not working fast enough to your tastes? I have a dozen projects to maintain, and I'll switch back to AnyIO soon-ish.

@reidmeyer
Copy link

bump

I'm not working fast enough to your tastes? I have a dozen projects to maintain, and I'll switch back to AnyIO soon-ish.

Sorry, don't mean to be rude. Have had a long day diving down the rabbit hole.

@smurfix
Copy link
Collaborator

smurfix commented Jan 24, 2023

Welcome to the rabbit warren, I'm also hitting this one. Trying to find a reasonable workaround isn't easy, and this Trio bug doesn't exactly help.

@agronholm
Copy link
Owner

This was incidentally the last bug I was working on before getting side-tracked by other projects a few weeks ago. I got stumped when playing whack-a-mole with test failures, and the solution itself was pretty complex too. Cancellation is really, really hard to get right.

@reidmeyer
Copy link

My colleagues and I faced this bug months ago, where a disconnected sse stream wasn't being handled in the exception catch. Our solution was downgraded fastapi/starlette to a 2 year old version. Hoping in a few months we'll be able to get up to date.

@agronholm
Copy link
Owner

My colleagues and I faced this bug months ago, where a disconnected sse stream wasn't being handled in the exception catch. Our solution was downgraded fastapi/starlette to a 2 year old version. Hoping in a few months we'll be able to get up to date.

Sorry for the trouble. It may not be much of a consolation, but I've been hit by this bug too occasionally, and it's my highest priority in this project right now (that is, when I get back to working on it).

@drernie
Copy link

drernie commented May 10, 2023

Any ETA on this? Anything we can do to help?

@agronholm
Copy link
Owner

The first order of business I have with AnyIO right now is getting the v3.7.0 release out. It contains a ton of other fixes that have waited a long time to be released. One of these fixes will even make tackling this issue a bit easier by reducing the amount of new code that would have to be written. I'm just about ready to make that release. After that, I can fully focus on fixing this particular issue, but I will have to scrap my previous attempt and start from scratch, as the code got pretty complicated back then and I was overwhelmed with the complexity. As for an ETA, I hope to get this done in 2-3 weeks.

@agronholm
Copy link
Owner

I've restarted my efforts based on current master. I have a test in place that reproduces the issue. I think I might start writing a fix that works on Python 3.11+ (where thanks to uncancel() we can actually track the cancellations without too many hacks). Then I could work on said hacks for earlier Pythons, as I was trying to do before going on a hiatus.

@T-256
Copy link

T-256 commented May 17, 2023

Hi @agronholm,
If you didn't start the progress, I'll be happy to help.
Are there any good points to start?

@reidmeyer
Copy link

Any luck?

@agronholm
Copy link
Owner

Happy to report that I'm making progress again, and hopefully I can make a release candidate of v4.0 this week, unless I encounter unexpectedly difficult corner cases.

@reidmeyer
Copy link

Happy to report that I'm making progress again, and hopefully I can make a release candidate of v4.0 this week, unless I encounter unexpectedly difficult corner cases.

Awesome to hear!

@agronholm
Copy link
Owner

@jonathanslenders would you mind testing the PR to make sure that it solves your original problem? Do note that the exception raising behavior has changed in a backwards incompatible fashion when you test.

@agronholm
Copy link
Owner

The merged PR should solve the problem as described. There is still the issue of cancel scopes not distinguishing between native cancellations and AnyIO cancellations on Python < 3.11. I have a PR for that in progress, but I'm running into some very subtle issues and am not yet confident about it, so that fix might not materialize in v4.0.0.

@drernie
Copy link

drernie commented Jul 17, 2023

Sweet! Any ETA on the formal v4.0.0 release?

@agronholm
Copy link
Owner

Once #591 is merged, and another upcoming minor PR related to cancellation exceptions is dealt with, I'll be ready to release v4.0.0rc1. Given the extensive backwards incompatible changes coming in that release, I want to let the release candidates stew for at least 2-3 weeks before making a final release.

@agronholm
Copy link
Owner

The expected release date for v4.0.0rc1 would be later this week.

@agronholm
Copy link
Owner

There was no release yet, but I've posted a status update in discussions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
asyncio Involves the asyncio backend design Requires a design decision
Projects
None yet
8 participants