Stopping the daemon always seems to hit a timeout #2963
Comments
One comment: I don't know how hard it is to start/stop the daemon from the Python API, but if it is not too hard, then a very powerful test to add (one that would at least catch one of the potential issues you mention) would be to
The test you propose might be difficult to implement and would not test the actual problem of this issue:
The problem actually stems from process functions. They create their own runner instance, so that nested process functions do not block, and they also attach their own handlers for interrupt signals to kill the process. This last part is important so that, if you run one in a local interpreter and then press CTRL+C, the AiiDA process is also properly killed and not just the Python process. Otherwise you would end up with a process node that is still
The problem described in this issue arises from an error in how the interrupt signal handlers of process functions are attached. They are attached, but never detached. That means that each and every one of them will be called when the daemon runner is asked to stop, even after the functions have long since finished. On top of that, the logic in the handler is incorrect, so it will hang because the process no longer exists. This is ultimately what causes the
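To illustrate the attach-without-detach problem described above, here is a minimal sketch in plain Python of a handler that is installed only for the duration of a call and then restored. It is not the code from PR #2966 or from AiiDA itself; `temporary_signal_handler` and `run_process_function` are hypothetical names, and the handler body is a stand-in for whatever actually kills the process.

```python
import signal
from contextlib import contextmanager


@contextmanager
def temporary_signal_handler(signum, handler):
    """Install `handler` for `signum` and restore the previous one on exit."""
    previous = signal.signal(signum, handler)
    if previous is None:
        # The old handler was not installed from Python; fall back to the default.
        previous = signal.SIG_DFL
    try:
        yield
    finally:
        # Detach: without this step the handler would outlive the function call.
        signal.signal(signum, previous)


def run_process_function(func, *args, **kwargs):
    """Hypothetical wrapper: run `func` with an interrupt handler attached."""

    def kill_on_interrupt(signum, frame):
        # Stand-in for killing the running process (assumption, not AiiDA code).
        raise KeyboardInterrupt

    with temporary_signal_handler(signal.SIGINT, kill_on_interrupt):
        return func(*args, **kwargs)
```

Because the restore happens in a `finally` clause, the handler is detached even if the wrapped function raises. Note that `signal.signal` can only be called from the main thread.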
Fixed in PR #2966
When calling verdi daemon stop, the command consistently times out. Eventually, the circus process will be killed. This behavior is very recent and might be related to the recent PR #2744, which dealt with attempting to kill Process instances when a local runner was interrupted. Now when the daemon is shut down, the workers receive the interrupt signal, triggering the code that was added in #2744, where they try to kill the processes that were run. However, since these processes are not un-registered once they are completed, the daemon will try to kill processes that are already finished, which causes the "hanging".
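The un-registering step mentioned above can be sketched as a small registry that forgets a process as soon as it completes, so that a shutdown only tries to kill processes that are still live. This is an illustration under stated assumptions, not AiiDA's runner code: `ProcessRegistry`, `FakeProcess`, and their methods are hypothetical names.

```python
class FakeProcess:
    """Hypothetical stand-in for a running process."""

    def __init__(self):
        self.killed = False

    def kill(self):
        self.killed = True


class ProcessRegistry:
    """Hypothetical registry that tracks only live processes."""

    def __init__(self):
        self._live = {}

    def register(self, pid, process):
        self._live[pid] = process

    def unregister(self, pid):
        # Called when a process completes; a missing pid is ignored.
        self._live.pop(pid, None)

    def kill_all(self):
        """Kill every process that is still registered and return their pids."""
        killed = []
        # Iterate over a copy because entries are removed inside the loop.
        for pid, process in list(self._live.items()):
            process.kill()
            killed.append(pid)
            self.unregister(pid)
        return killed
```

With this shape, a process that finished and was un-registered before shutdown is never touched by `kill_all`.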