-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Catch CancelledError in scheduler_cli run function #3685
Conversation
Here's the output, when you have to press CTRL+C twice in the terminal window. 2020-07-15T11:43:55+12:00 INFO - Final point: None
2020-07-15T11:43:55+12:00 INFO - [prep.20130808T0000+12] -submit-num=01, owner@host=ranma
2020-07-15T11:43:55+12:00 INFO - [prep.20130808T0000+12] -triggered off []
^CTraceback (most recent call last):
File "/home/kinow/Development/python/workspace/cylc-flow/venv/bin/cylc-run", line 33, in <module>
sys.exit(load_entry_point('cylc-flow', 'console_scripts', 'cylc-run')())
File "/home/kinow/Development/python/workspace/cylc-flow/cylc/flow/scripts/cylc_run.py", line 22, in main
scheduler_main(is_restart=False)
File "/home/kinow/Development/python/workspace/cylc-flow/cylc/flow/scheduler_cli.py", line 446, in main
return run() # pylint: disable=E1120
File "/home/kinow/Development/python/workspace/cylc-flow/cylc/flow/terminal.py", line 209, in wrapper
wrapped_function(*wrapped_args, **wrapped_kwargs)
File "/home/kinow/Development/python/workspace/cylc-flow/cylc/flow/scheduler_cli.py", line 470, in run
return scheduler_cli(parser, options, args, is_restart=False)
File "/home/kinow/Development/python/workspace/cylc-flow/cylc/flow/scheduler_cli.py", line 362, in scheduler_cli
ret = asyncio.run(
File "/home/kinow/Development/python/anaconda3/lib/python3.8/asyncio/runners.py", line 43, in run
return loop.run_until_complete(main)
File "/home/kinow/Development/python/anaconda3/lib/python3.8/asyncio/base_events.py", line 603, in run_until_complete
self.run_forever()
File "/home/kinow/Development/python/anaconda3/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
self._run_once()
File "/home/kinow/Development/python/anaconda3/lib/python3.8/asyncio/base_events.py", line 1823, in _run_once
event_list = self._selector.select(timeout)
File "/home/kinow/Development/python/anaconda3/lib/python3.8/selectors.py", line 468, in select
fd_event_list = self._selector.poll(timeout, max_ev)
KeyboardInterrupt
2020-07-15T11:43:57+12:00 INFO - [client-command] put_messages kinow@ranma:cylc-message
^CException ignored in: <module 'threading' from '/home/kinow/Development/python/anaconda3/lib/python3.8/threading.py'>
Traceback (most recent call last):
File "/home/kinow/Development/python/anaconda3/lib/python3.8/threading.py", line 1388, in _shutdown
lock.acquire()
KeyboardInterrupt: |
I see CancelledError appears like this in the Python 3.7 case:
|
Trying to reproduce this in a simple asyncio test program. I get the same result for both Python 3.7 and 3.8, which is (I think): KeyboardInterrupt gets caught at a low level, then coroutines get "cancelled" and CancelledError gets raised instead - so with asyncio we can't catch KeyboardInterrupt anymore (at a high level). The only obvious difference between 3.7 and 3.8 is:
However this doesn't quite gel with what I'm seeing in Cylc ... |
Hang on, I'm confused. I can catch
|
Yeah we do similar in |
Interesting Issue on the (now-archived) python/asyncio repo: python/asyncio#341 ("Confusing error handling for KeyboardInterrupt") Near the end, GVM himself says This thread is running off the rails... 🤣 |
But why/how does this work at all:
when you can't catch KeyboardInterrupt inside (Although, in Cylc's |
KeyboardInterrupt might be "complicated" in asyncio? E.g.: ethereum/py-evm#385 (comment) |
I suppose it hits the async loop and not the coroutine(s) it's running (?) |
Summary of "Graceful Shutdown" section of this presentation https://asyncio-wrong.herokuapp.com/#/
|
@hjoliver I think this is closer to what is happening in our case: #!/usr/bin/env python
import asyncio
# same as scheduler_cli#_run() here
async def _run():
try:
# this actually be scheduler.run(), but we just loop+sleep here
while True:
print("oi")
await asyncio.sleep(1)
except KeyboardInterrupt:
print("BYE! We got the right interrupt")
return
except:
print("HUH! Where's the KeyboardInterrupt?")
raise
def main():
# same as done in scheduler_cli() here
try:
# there's no try/except in our code here actually, only in the _run function I think
# adding just for testing
asyncio.run(_run(), debug=True)
except asyncio.exceptions.CancelledError:
print('Got a CancelledError!')
except KeyboardInterrupt:
print("Here's the KeyboardInterrupt")
except:
print('OH NO!')
if __name__ == '__main__':
main() I think the problem is that Here's what I get running it with Python 3.8: oi
oi
^CHUH! Where's the KeyboardInterrupt?
Here's the KeyboardInterrupt |
Hmmm, Sadly I don't think this is the right solution, and it looks like it's breaking I can shine a little light on the traceback:
This occurs when you have a thread which includes a So I think this is due to the threads we start in the scheduler which are:
(any I've missed?) The Proper Solution? I think we probably shouldn't be casually spawning threads in the async world, we should probably run synchronous code via a thread pool executor thinggy. This would involve removing the threading logic from the server and publisher and switching to a non-threaded version of the zmq curve authenticator. The Hacky Solution? If this is too much of a headache there might be a temporary workaround by rearranging the logic so that the |
@oliver-sanders you may be right, but I'm still curious to understand the seemingly weird behaviour of |
A variation on the example above: import asyncio
import sys
async def run():
try:
while True:
print("x", end='')
sys.stdout.flush()
await asyncio.sleep(1)
except KeyboardInterrupt:
print("KeyboardInterrupt")
raise
except asyncio.exceptions.CancelledError:
print("CancelledError")
raise
except:
print("Exception")
raise
def main():
try:
asyncio.run(run(), debug=True)
except KeyboardInterrupt:
print("Got KeyboardInterrupt")
except asyncio.exceptions.CancelledError:
print('Got a CancelledError!')
except:
print('Got Exception')
if __name__ == '__main__':
main() If I run this (Python 3.8) I get:
My guess is something like this: I guess the example above looks weird to the non-asyncio eye, but only because the hidden event loop is doing hidden things with |
@@ -417,7 +417,7 @@ async def _run(parser, options, reg, is_restart, scheduler): | |||
# stop cylc stop | |||
except SchedulerError: | |||
ret = 1 | |||
except KeyboardInterrupt as exc: | |||
except (KeyboardInterrupt, asyncio.exceptions.CancelledError) as exc: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In light of example code and discussion on this issue, we can't catch KeyboardInterrupt
here at all (whether or not catching CancelledError
is the right solution).
See the related #3564 (comment). |
I think it was from that comment, or from that issue, that I remembered we had threads somewhere. When I hit CTRL+C, there are 2 threads running, one with zmq publisher I think. The other thread the IDE was not able to open/display (could be an IDE thread maybe?). And hitting CTRL+C again, there's a threading related message. So @oliver-sanders might be on the right track. Happy to close this one once we have another solution (for the moment I'm using this patch just so I don't forget to close the workflow when looking at Cylc UI) |
I think @oliver-sanders is on the right track in terms of the other threads messing with shutdown, but we're also on the right track in that the current catching of |
It looks really strange that that exception is happening and not being caught. At least with Python 3.8. Though it's confusing for me how it was working with Python 3.7.
+1 (digressing I think) @hjoliver from your presentation linked yesterday in Element, there was a slide saying something how
If intentional we can add a comment to the code using it, and/or make it explicit with |
I think the reason for that must be as I've described above: the event loop catches KeyboardInterrupt and raises it to the top level code (that starts the event loop) but it just "cancels" the coroutines and they see CancelledError instead. |
Yes that's one of the things that we'll need to be aware of and use properly... |
Read through that presentation (it's great!), need to have a play with it 🚀
Agreed. |
e55c041
to
0e4f7f5
Compare
While this might help with #3946 , I suspect there is a better fix. Closing 👍 |
This is a small change with no associated Issue.
I am using the workflow five with Cylc UI, on Python 3.8. And it's really inconvenient having to press CTRL+C twice (and often I forget it). So I did a quick debugging before lunch to see what was going on.
Apparently, the
scheduler_cli
run function now sees aCancelledError
when I hit the CTRL+C (in PyCharm you just need to debug with the option to emulate a terminal, so that the signals are sent and trigger the debugger).(image above is where the exception was [re]raised, in the image below I added a global
except
+traceback
to see what was going on, as the first time I tried it, thefinally
block gotret=0
, even though there was an error).I think the change in Python 3.8 that is related to this issue is bpo-32528. Implemented in this commit.
The
CancelledError
would be caught in theexcept Exception
block before Python 3.8. But now it passes through thescheduler_cli
try/except error handling. Actually anyBaseException
error will do the same, andret
will be0
, which I think is not desirable (not sure, but not the main issue here I guess)?I am not sure if this is the right fix. Even though this works, it's not clear to me why I'm getting the
CancelledError
when hitting CTRL+C instead ofKeyboardInterrupt
. Just thought it could be a good starting point for others to troubleshoot and investigate this issue 👍 (and works for me as a temporary local-patch so I can continue working on Cylc UI)Requirements check-list
CONTRIBUTING.md
and added my name as a Code Contributor.