job polling broken for failed jobs after restart #4516
Comments
I can't test this with Cylc 8 at the moment; however, I expect the bug is likely to be present there too.
The solution is presumably to update the
Cylc 8 issue - #4513
#4513 is really a different issue (polling doing the wrong thing). I've confirmed that this remains an issue in Cylc 8.
Closed by #5016
tl;dr:
Failed tasks can be polled back to incorrect states on restart.
Bug:
After a restart, Cylc updates task proxies with the owner@host pair of submitted/running jobs to allow polling:
cylc-flow/lib/cylc/task_pool.py
Lines 361 to 369 in 5ef4419
This, however, excludes succeeded and failed tasks. Consequently, after a restart, remote tasks do not have their owner@host loaded from the DB, which causes polling to run locally. Polling will most likely fail but could also produce unexpected results (particularly for background jobs).
This may be related to #1792, which extended polling to succeeded / failed tasks but didn't extend the owner@host update logic:
https://github.com/cylc/cylc-flow/pull/2396/files#diff-1f1aa9b850f9d1655a22322beb0e2d0604fb816b3bc807210120547f1a35ae24
When this is combined with a task that fails by hitting its execution time limit on a remote batch system (one that is not pollable locally), the task is polled back to running.
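The mechanism described above can be illustrated with a small, self-contained Python sketch. This is not Cylc's code: TaskProxy, restore_user_at_host, poll_target, the two state sets and the alice@hpc value are all hypothetical names made up for illustration. It only assumes that a task without a restored owner@host falls back to being polled locally, as the description says.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set

# Hypothetical state groupings, for illustration only.
ACTIVE_STATES: Set[str] = {"submitted", "running"}
POLLABLE_STATES: Set[str] = ACTIVE_STATES | {"succeeded", "failed"}


@dataclass
class TaskProxy:
    """Toy stand-in for a task proxy; not Cylc's real class."""
    name: str
    state: str
    user_at_host: Optional[str] = None  # None means "poll locally"


def restore_user_at_host(
    tasks: Iterable[TaskProxy],
    db_rows: Dict[str, str],
    states: Set[str],
) -> None:
    """Copy the stored owner@host onto tasks whose state is in `states`."""
    for task in tasks:
        if task.state in states and task.name in db_rows:
            task.user_at_host = db_rows[task.name]


def poll_target(task: TaskProxy) -> str:
    """Return where a poll would run (assumed fallback: localhost)."""
    return task.user_at_host or "localhost"


def make_tasks():
    """One running and one failed task, both originally run remotely."""
    return [TaskProxy("foo", "running"), TaskProxy("bar", "failed")]


# Hypothetical DB contents after a restart: both jobs ran on a remote host.
db_rows = {"foo": "alice@hpc", "bar": "alice@hpc"}

# Restoring only submitted/running tasks leaves the failed task local.
buggy = make_tasks()
restore_user_at_host(buggy, db_rows, ACTIVE_STATES)
buggy_targets = {t.name: poll_target(t) for t in buggy}

# Restoring every pollable state keeps the failed task on its remote host.
fixed = make_tasks()
restore_user_at_host(fixed, db_rows, POLLABLE_STATES)
fixed_targets = {t.name: poll_target(t) for t in fixed}

print(buggy_targets)
print(fixed_targets)
```

In the first case the failed task "bar" would be polled on localhost even though its job ran on the remote host, which is the situation reported here. Widening the set of restored states is shown only as one possible direction, not as the fix that was actually applied.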
Reproducible Example:
Log Snippet (post-restart):
Pull requests welcome!
This is an Open Source project - please consider contributing a bug fix yourself (please read CONTRIBUTING.md before starting any work though).