Scheduler keeps running after CTRL + C (Py3.8) #3946

Closed
kinow opened this issue Nov 16, 2020 · 4 comments · Fixed by #3982

@kinow
Member

kinow commented Nov 16, 2020

Describe the bug

Supersedes #3490, which has similar symptoms. The scheduler keeps running after CTRL + C, and a plain cylc scan still shows the workflow as running. See the details below.

Shamelessly copying @hjoliver's great explanation of the issue on Element.

First Ctrl+C says "Aborted!". The scheduler stays up, logs incoming connections, and is responsive to queries (e.g. cylc dump) BUT it is no longer responsive to commands and does not change state or submit new jobs.

(ps command on the RHS confirms that the scheduler process is still running, although that's kind of obvious from the left side).

Second Ctrl+C does kill the scheduler process, but not cleanly: it does not clean up its contact file as it does in Py 3.7.

The 2nd ps command on the right confirms that it is dead.

cylc scan still says it is running, because the minimal scan does not try to contact the running scheduler. It just reads the contact file (and assumes that a contact file means the scheduler is running).

Then cylc scan -t rich does try to contact the scheduler, finds that it is not running, and so deletes the contact file. Any scheduler-connecting command would do the same. This is why your cylc kill five updated the scan result, not because it actually killed something.
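The difference between the two scan modes described above can be sketched roughly as follows. This is an illustration only, not Cylc's code: the function names are hypothetical, the contact file is assumed to hold nothing but a PID, and a local PID check stands in for actually connecting to the scheduler.

```python
import os
from pathlib import Path


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def minimal_scan(contact_file: Path) -> bool:
    """Hypothetical 'minimal' scan: trust the contact file blindly."""
    return contact_file.exists()


def checking_scan(contact_file: Path) -> bool:
    """Hypothetical 'checking' scan: verify, and remove a stale contact file.

    Assumption: the file contains only the scheduler PID. A real scan
    would try to connect to the scheduler rather than test a PID.
    """
    if not contact_file.exists():
        return False
    pid = int(contact_file.read_text().strip())
    if pid_is_alive(pid):
        return True
    contact_file.unlink()  # stale: the scheduler died without cleaning up
    return False
```

With a stale contact file left behind by an unclean exit, `minimal_scan` keeps reporting the workflow as running until some call like `checking_scan` removes the file.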

Release version(s) and/or repository branch(es) affected?

master with Python 3.8 (works fine in 3.7)

Steps to reproduce the bug

cylc run --no-detach five
# hit CTRL + C
# then in another window
cylc scan
# five is displayed as running
cylc scan -t rich
# five is no longer listed: this scan contacts the scheduler and removes the stale contact file

Expected behavior

The scheduler should exit cleanly the first time CTRL + C is hit.

Screenshots

Additional context

Pull requests welcome!
This is an Open Source project - please consider contributing a bug fix
yourself (please read CONTRIBUTING.md before starting any work though).

@kinow kinow added the bug Something is wrong :( label Nov 16, 2020
@kinow kinow added this to the cylc-8.0a3 milestone Nov 16, 2020
@oliver-sanders
Member

See this related Python issue logged after #3490 - https://bugs.python.org/issue39622

It may be that setting up an exception handler for KeyboardInterrupt the asyncio way provides a partial solution.
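For illustration, one way of handling Ctrl+C "the asyncio way" is to register a SIGINT handler on the event loop with `loop.add_signal_handler`, so the interrupt sets a flag instead of raising KeyboardInterrupt inside a coroutine. The sketch below is not the fix that was eventually merged; `main_loop` and `run` are hypothetical stand-ins for the scheduler, and `add_signal_handler` is only available on Unix.

```python
import asyncio
import signal


async def main_loop(stop: asyncio.Event) -> str:
    """Hypothetical stand-in for the scheduler main loop."""
    # A real loop would do its work here and check `stop` between cycles.
    await stop.wait()
    return "clean shutdown"


async def run() -> str:
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    handled = (signal.SIGINT, signal.SIGTERM)

    # Unix only, and must be called from the main thread: the event loop
    # runs stop.set() when the signal arrives, instead of the interrupt
    # surfacing as KeyboardInterrupt at an arbitrary point.
    for sig in handled:
        loop.add_signal_handler(sig, stop.set)
    try:
        return await main_loop(stop)
    finally:
        for sig in handled:
            loop.remove_signal_handler(sig)
        # Assumption: cleanup such as removing a contact file would go here.


# Usage (blocks until Ctrl+C or SIGTERM):
#     print(asyncio.run(run()))
```

Because the handler only sets an event, the main loop decides when to stop, and the `finally` block gives a single place for cleanup on the first Ctrl+C.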

@MetRonnie
Member

I can also reproduce this in a Python 3.9 conda env.

@MetRonnie MetRonnie self-assigned this Dec 4, 2020
@MetRonnie
Member

As no one was assigned, I'll have a go at fixing this, unless you're already on it @kinow?

@kinow
Member Author

kinow commented Dec 4, 2020

Not working on this, but thanks for checking @MetRonnie :)

@hjoliver hjoliver modified the milestones: cylc-8.0a3, cylc-8.0b0 Feb 25, 2021