Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed task group getting cancelled if start() gets cancelled #717

Merged
merged 7 commits into from
May 26, 2024

Conversation

agronholm
Copy link
Owner

@agronholm agronholm commented Apr 14, 2024

This change fixes the problem by special casing the situation where the Future backing task_status was cancelled which only happens when the host task is cancelled.

Fixes #685. Fixes #710.

This change fixes the problem by special casing the situation where the Future backing `task_status` was cancelled which only happens when the host task is cancelled.

Fixes #685. Fixes #710.
@arthur-tacca
Copy link

I tried this out. I did await tg_destination.start(foo) in tg_first (just giving the task groups names here).

If I cancel tg_first before foo gets to task_status.started(), then previously it would cancel both tg_first (obviously), and the routine foo() immediately, and the tg_destination. That's the original bug. After this change, it cancels tg_first and foo(), but not tg_destination. Bug fixed, hurrah! This also matches trio.

But if I cancel tg_destination before foo gets to task_status.started() then that cancels the routine even though it's not supposed to currently be running in that task group. It also cancels tg_first and weirdly even the outer task group containing both tg_first and tg_destination. This happens with both the original and modified code. In fairness, this wasn't mentioned in the original report (sorry). Trio, on the other hand, only cancels tg_destination, and foo() continues to run until the first await after task_status.started().

@arthur-tacca
Copy link

arthur-tacca commented Apr 26, 2024

BTW in Trio, in that second case, by the time the routine foo() calls task_status.started(), the second task group has not only has cancel called but already completed (exited the async with block) - it has to be that way otherwise an arbitrary starting task that is not cancelled could hold up its cancellation for an arbitrary length of time. So perhaps there is extra magic to inject cancellation into foo() and catch it even though it is in a cancel scope that is already exited. But what if it converts that to a different exception? Is that lost? Surely Trio never loses exceptions. I think I have missed something but I don't have time to figure it out, sorry (and sorry for the ramble).

Edit: yes I missed something. Trio actually holds up the destination nursery from closing until the task has finished starting, even though the task is not cancelled, so a nursery being cancelled does indeed wait on a task that has no indication that it should cancel. See python-trio/trio#1431

@agronholm
Copy link
Owner Author

Can you at least provide a reproducing snippet of what isn't working?

@arthur-tacca
Copy link

Sorry yes, the fold below contains a repro (use of aioresult isn't necessary but I found it easier to write this way).

I think the thing I'm missing about Trio is that it actually can have bugs / incomplete design decisions! This other issue is arguably a second bug (I actually found it by looking at your changing and thinking ... how does the destination nursery find out when the new task is reparented to it? The answer is: it already was treated as a child.)

Another way to interpret my comment: this change fixes the original bug and doesn't (to my knowledge) introduce any new bugs.

Code to reproduce odd behaviours
# Test for https://github.com/agronholm/anyio/pull/717


import anyio
import aioresult


futures = [None, None]


async def startable(task_status = anyio.TASK_STATUS_IGNORED):
    print("startable(): starting")
    try:
        await anyio.sleep(1)
    except BaseException as e:
        print(f"startable(): during first sleep {e!r}")
        raise
    print("startable(): after first sleep")

    try:
        task_status.started("x")
    except BaseException as e:
        print(f"startable(): during started {e!r}")

    print("startable(): about to sleep again")
    try:
        await anyio.sleep(1)
    except BaseException as e:
        print(f"startable(): during second sleep {e!r}")
        raise
    print("startable(): after second sleep")



async def run_first():
    async with anyio.create_task_group() as tg:
        futures[0].set_result(tg)

        await futures[1].wait_done()
        other_tg = futures[1].result()
        await other_tg.start(startable)

        await anyio.sleep(1)

async def run_second():
    async with anyio.create_task_group() as tg:
        futures[1].set_result(tg)

        await anyio.sleep(2)


async def run_test(i, which_tg_cancel, when_cancel):
    futures[0] = aioresult.Future()
    futures[1] = aioresult.Future()

    async with anyio.create_task_group() as tg:
        tg.start_soon(run_first)
        tg.start_soon(run_second)

        await aioresult.wait_all(futures)
        await anyio.sleep(when_cancel)
        futures[which_tg_cancel].result().cancel_scope.cancel()

        print("before sleep")
        await anyio.sleep(0.2)
        print("after sleep")
        for i in range(2):
            c : anyio.CancelScope = futures[i].result().cancel_scope
            print(f"tg {i}: cancelled: {c.cancel_called}, caught: {c.cancelled_caught}")
    c: anyio.CancelScope = tg.cancel_scope
    print(f"tg outer: cancelled: {c.cancel_called}, caught: {c.cancelled_caught}")
    print("test complete")


async def run_all():
    print("\n\n\ntest 1: cancel 1st TG early")
    await run_test(1, 0, 0.5)

    print("\n\n\ntest 2: cancel 2nd TG early")
    await run_test(2, 1, 0.5)

    print("\n\n\ntest 3: cancel 1st TG late")
    await run_test(3, 0, 1.5)

    print("\n\n\ntest 4: cancel 2nd TG late")
    await run_test(4, 1, 1.5)

anyio.run(run_all, backend="asyncio")

@arthur-tacca
Copy link

Also, a nitpick: I found the comment confusing

The future can only be cancelled if the host task was cancelled

I assumed this referred to the reason why we're performing this test at all. So we're doing this because the host task was cancelled? Hmm.

In fact, it's assuming you already know why we're performing this test and is actually explaining how it's testing for it. Perhaps it could be preceded by a comment that says why we're performing the test (e.g., something like "check whether this task was cancelled while still waiting for it to start in TaskGroup.start() before reaching task_status.started()"). But I'm not familiar with this code base so perhaps this isn't valid for those that are.

@agronholm
Copy link
Owner Author

Also, a nitpick: I found the comment confusing

The future can only be cancelled if the host task was cancelled

I assumed this referred to the reason why we're performing this test at all. So we're doing this because the host task was cancelled? Hmm.

In fact, it's assuming you already know why we're performing this test and is actually explaining how it's testing for it. Perhaps it could be preceded by a comment that says why we're performing the test (e.g., something like "check whether this task was cancelled while still waiting for it to start in TaskGroup.start() before reaching task_status.started()"). But I'm not familiar with this code base so perhaps this isn't valid for those that are.

I've clarified the comment now, hopefully it's okay now.

@agronholm
Copy link
Owner Author

Sorry yes, the fold below contains a repro (use of aioresult isn't necessary but I found it easier to write this way).

I think the thing I'm missing about Trio is that it actually can have bugs / incomplete design decisions! This other issue is arguably a second bug (I actually found it by looking at your changing and thinking ... how does the destination nursery find out when the new task is reparented to it? The answer is: it already was treated as a child.)

Another way to interpret my comment: this change fixes the original bug and doesn't (to my knowledge) introduce any new bugs.

Code to reproduce odd behaviours

# Test for https://github.com/agronholm/anyio/pull/717


import anyio
import aioresult


futures = [None, None]


async def startable(task_status = anyio.TASK_STATUS_IGNORED):
    print("startable(): starting")
    try:
        await anyio.sleep(1)
    except BaseException as e:
        print(f"startable(): during first sleep {e!r}")
        raise
    print("startable(): after first sleep")

    try:
        task_status.started("x")
    except BaseException as e:
        print(f"startable(): during started {e!r}")

    print("startable(): about to sleep again")
    try:
        await anyio.sleep(1)
    except BaseException as e:
        print(f"startable(): during second sleep {e!r}")
        raise
    print("startable(): after second sleep")



async def run_first():
    async with anyio.create_task_group() as tg:
        futures[0].set_result(tg)

        await futures[1].wait_done()
        other_tg = futures[1].result()
        await other_tg.start(startable)

        await anyio.sleep(1)

async def run_second():
    async with anyio.create_task_group() as tg:
        futures[1].set_result(tg)

        await anyio.sleep(2)


async def run_test(i, which_tg_cancel, when_cancel):
    futures[0] = aioresult.Future()
    futures[1] = aioresult.Future()

    async with anyio.create_task_group() as tg:
        tg.start_soon(run_first)
        tg.start_soon(run_second)

        await aioresult.wait_all(futures)
        await anyio.sleep(when_cancel)
        futures[which_tg_cancel].result().cancel_scope.cancel()

        print("before sleep")
        await anyio.sleep(0.2)
        print("after sleep")
        for i in range(2):
            c : anyio.CancelScope = futures[i].result().cancel_scope
            print(f"tg {i}: cancelled: {c.cancel_called}, caught: {c.cancelled_caught}")
    c: anyio.CancelScope = tg.cancel_scope
    print(f"tg outer: cancelled: {c.cancel_called}, caught: {c.cancelled_caught}")
    print("test complete")


async def run_all():
    print("\n\n\ntest 1: cancel 1st TG early")
    await run_test(1, 0, 0.5)

    print("\n\n\ntest 2: cancel 2nd TG early")
    await run_test(2, 1, 0.5)

    print("\n\n\ntest 3: cancel 1st TG late")
    await run_test(3, 0, 1.5)

    print("\n\n\ntest 4: cancel 2nd TG late")
    await run_test(4, 1, 1.5)

anyio.run(run_all, backend="asyncio")

It's not clear to me what the expected output of this script is.

@agronholm
Copy link
Owner Author

@Zac-HD would you mind rubber stamping this PR? @arthur-tacca said above that it fixes the original bug and doesn't introduce any new ones.

Copy link
Collaborator

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, fix and test both look good to me. :shipit:

@agronholm agronholm merged commit 9f5f14b into master May 26, 2024
15 checks passed
@agronholm agronholm deleted the fix-685 branch May 26, 2024 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants