[blocked by refactor] [WIP] graceful shutdown signal handling #2165
Thanks for writing this; it'll let me fix the logging finalize bug I'm looking into.
I think that long term we need to put more thought into signal handling. Ideally, we'd have a global signal-handling teardown function that closes everything cleanly whether we're in the training loop, in the evaluation loop, or in between.
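To make the "global teardown function" idea concrete, here is a minimal sketch, assuming a hypothetical `TeardownRegistry` helper (this name and its methods are illustrative and not part of the PR or the library). Components register their cleanup callbacks once, and `run()` can then be called from the training loop, the evaluation loop, or anywhere in between. It is idempotent, so calling it from several places is harmless.

```python
from typing import Callable, List


class TeardownRegistry:
    """Hypothetical helper: collects cleanup callbacks and runs them exactly once."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[], None]] = []
        self._done = False
        self.errors: List[BaseException] = []

    def register(self, callback: Callable[[], None]) -> None:
        """Add a cleanup callback (e.g. finalize a logger, close a progress bar)."""
        self._callbacks.append(callback)

    def run(self) -> None:
        """Run all callbacks in reverse registration order; later calls are no-ops."""
        if self._done:
            return
        self._done = True
        for callback in reversed(self._callbacks):
            try:
                callback()
            except Exception as err:
                # Keep going so one failing cleanup does not block the others.
                self.errors.append(err)
```

A caller would typically invoke `registry.run()` inside a `finally:` block or a signal handler, so the same cleanup path is taken regardless of where the interruption happened.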
I found out that the kill signal handler needs to call sys.exit(); otherwise the process hangs after the tests have completed. It seems the test suite sends one of these kill signals...
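A rough sketch of a handler that cleans up and then exits explicitly, so the process does not keep running after the signal. The names `make_exit_handler` and `install_exit_handler` are hypothetical, and the `128 + signum` exit code simply mirrors the status shells report for signal-terminated processes; the PR may do this differently.

```python
import signal
import sys
from typing import Callable, Iterable


def make_exit_handler(cleanup: Callable[[], None]):
    """Build a signal handler that runs `cleanup` and then exits the process."""

    def handler(signum, frame):
        cleanup()
        # Without an explicit exit, execution would resume where it was interrupted.
        sys.exit(128 + signum)

    return handler


def install_exit_handler(
    cleanup: Callable[[], None],
    signums: Iterable[int] = (signal.SIGTERM,),
):
    """Register the handler for the given signals (main thread only)."""
    handler = make_exit_handler(cleanup)
    for signum in signums:
        signal.signal(signum, handler)
    return handler
```

Since `sys.exit()` raises `SystemExit`, any `finally:` blocks in the interrupted code still run on the way out.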
@justusschock I noticed that the try/except block for the KeyboardInterrupt wraps almost all the code in the train method. Do you think we could simply wrap the try/except around the place where we call train()?
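Something along these lines is what the suggestion amounts to, as a sketch only. `run_with_interrupt_handling`, `train`, and `teardown` are hypothetical stand-ins here, not the actual Trainer methods.

```python
from typing import Callable


def run_with_interrupt_handling(
    train: Callable[[], None],
    teardown: Callable[[], None],
) -> bool:
    """Call `train()` and handle Ctrl+C at the call site instead of inside it.

    Returns True if training was interrupted by a KeyboardInterrupt.
    """
    interrupted = False
    try:
        train()
    except KeyboardInterrupt:
        # The body of train() stays free of interrupt handling.
        interrupted = True
    finally:
        # Teardown runs on normal completion, on interrupt, and on errors.
        teardown()
    return interrupted
```

The benefit is that the train method loses one level of indentation and the interrupt logic lives in a single place, which should also make merge conflicts in that method less likely.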
Codecov Report
@@           Coverage Diff           @@
##           master   #2165    +/-   ##
=======================================
- Coverage      90%     87%     -3%
=======================================
  Files          81      81
  Lines        7644    7435    -209
=======================================
- Hits         6878    6433    -445
- Misses        766    1002    +236
@awaelchli I think this is still WIP, no?
It's finished, except that I don't know how to test these signals in CI and on SLURM.
Hello @awaelchli! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-08-16 16:15:27 UTC
This pull request is now in conflict... :(
LGTM 🐰
@@ -364,7 +354,8 @@ def train(self):
        # model hooks
        model.on_train_start()

        try:
            if True:  # just here to enable easier merging. TODO: remove last minute
do not forget this one...
Are some extra docs still missing?
@Borda this is not ready to go. This PR is completely destroyed now; there are too many conflicts. I have to start from scratch and send a new PR. I'll keep it open for now so I don't forget. I will work on it very soon, I promise.
Before submitting
What does this PR do?
Fixes #1999
Fixes #2913
Maybe also fixes #2590 (need to check)
Maybe also fixes #3275 (need to check)
TODO: