Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hang in event.wait() with current master #86

Closed
njsmith opened this issue Oct 17, 2016 · 5 comments
Closed

Hang in event.wait() with current master #86

njsmith opened this issue Oct 17, 2016 · 5 comments

Comments

@njsmith
Copy link
Contributor

njsmith commented Oct 17, 2016

Hi, me again...

With curio 0.5, this code works as expected. With current master, it hangs mysteriously:

import curio

event1 = curio.Event()
event2 = curio.Event()

async def f1():
    print("f1: 1")
    await event1.set()
    print("f1: 2")
    await event2.wait()    # <--- this never returns
    print("f1: 3")

async def f2():
    print("f2: 1")
    await event1.wait()
    print("f2: 2")
    await event2.set()
    print("f2: 3")

async def main():
    print("main: 1")
    for task in [await curio.spawn(f1()), await curio.spawn(f2())]:
        print("main: 2")
        await task.join()
        print("main: 3")
    print("main: 4")

curio.run(main(), with_monitor=True)

Output with 0.5:

main: 1
f1: 1
f1: 2
f2: 1
f2: 2
main: 2
f1: 3
f2: 3
main: 3
main: 2
main: 3
main: 4

Output with 7eb64a2:

main: 1
f1: 1
f1: 2
f2: 1
f2: 2
main: 2
f2: 3
# then hangs until killed

(Encountered while writing a test for #85, which is why it doesn't have tests yet.)

@njsmith
Copy link
Contributor Author

njsmith commented Oct 17, 2016

Ah, it's a side-effect of the excessively rather clever code for handling Event.set from sync context.

  1. f1 does await event2.wait()
  2. at the top of wait it checks and [sees that self._set is false], so it decides to block on the queue
  3. then it does this:
    self._reschedule_func = await _queue_reschedule_function(self._waiting)
  4. the await _queue_reschedule_function traps to the kernel... and the kernel comes back to f2, not f1.
  5. f2 calls event2.set(), at a time when f1 is stuck half-way into event2.wait(), so it doesn't go well. f1's not on the queue yet, so it can't be resumed, but when it eventually wakes up it immediately suspends on the queue.

I'd submit a patch but I'm not entirely sure what the best fix is. I guess the obvious fix would be to make sure that _queue_reschedule_function always resumes the original task without rescheduling. This code looks kinda dicey all around though -- do we really need two independent implementations of the set/wait logic? And @awaitable -- really? I have read the source I know what you did there :-). And more to the point, if one's going to go to the trouble of making a synchronous API that works, then why not just use that everywhere? Even from async context, a sychronous API is better because it removes a possible yield point. (And as we've just seen, those yield points are sneaky and can cause trouble!)

@dabeaz
Copy link
Owner

dabeaz commented Oct 17, 2016

Interesting. This is something experimental that I've been playing around with. I might take it out again. A lot of things in Curio are experimental ;-).

@dabeaz
Copy link
Owner

dabeaz commented Oct 17, 2016

Patching _queue_reschedule_function to come back to the original task should be easy--there are other kernel functions that do that.

@dabeaz
Copy link
Owner

dabeaz commented Oct 17, 2016

Just pushed a possible fix for this.

@njsmith
Copy link
Contributor Author

njsmith commented Oct 27, 2016

The TLS tests are passing, so I guess we can close

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants