-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ModelCheckpoint period #3630
Conversation
Hello @carmocca! Thanks for updating this PR.
Comment last updated at 2020-09-29 13:04:12 UTC |
Ignoring this because https://www.flake8rules.com/rules/W503.html says: "line breaks should occur before the binary operator because it keeps all operators aligned." |
Codecov Report
@@ Coverage Diff @@
## master #3630 +/- ##
========================================
- Coverage 86% 84% -2%
========================================
Files 110 110
Lines 8129 9352 +1223
========================================
+ Hits 6963 7812 +849
- Misses 1166 1540 +374 |
please, if you don't mind, could you split the PR in two? as I understand, these are 2 independent features/fixes? |
Yes they are independent. I joined them because they are easier to test together. I'll split them |
If you think it is best to have the test combined, you may prioritize one PR, get it reviewed and merged, then focus on the other integrating the test. if that's possible |
Okay, this is ready again. Blocked on 3633 because i'm using |
This pull request is now in conflict... :( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
solid.
I verified that the test fails on master.
This pull request is now in conflict... :( |
d79bce1 destroyed this PR so easiest thing was to rebase, undo previous commits and redo them. Should be ready to go again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
anything substantial changed since we last reviewed? |
The only additional change is: which I think is a bug |
why not >= 1? In order to track the top 1, we need to have a monitor, no? |
Yes but if we change that, the default ModelCheckpoint parameters are incorrect because monitor is None by default but top_k is 1 (and would fail in that check) |
monitor is by default None. To save ALL checkpoints, then top_k=-1. by default we will only save the last checkpoint (otherwise you WILL blow up your disk) |
Im okay with that. Just one thing: If default behaviour does the same as save_last, why not have save_last=True by default to make it obvious? |
because save_last is a separate operation... We have these modes: When not in default, user CAN save last AND top k or only one of them. Thus each one is its own behavior that must also be turned on. Once the user leaves default mode, they'd be very surprised to continue finding the last ckpts saved |
Then maybe we should throw a warning: if save_last and monitor is None and save_top_k is not None:
rank_zero_warn(
"ModelCheckpoint save_last is set to True but the last checkpoint is already saved"
f" by having monitor=None and save_top_k={save_top_k}"
) (or a similar warning) |
this is unnecessarily complicated... save_top_k is NOT the same as save_last... the last ckpt may not be the best one (ie: overfitting) We're going to keep the current behavior with the only change which is top_k defaults to None. Again... to be clear: If the user sets to save the top k then they can ALSO opt in to saving the last by enabling that. I understand what you're saying, but for a user to get the last when they set top k is very confusing... unless they also ask for the last. Maybe the answer is to ALSO set save_last=None by default. |
Of course it is not always, but the default values of ModelCheckpoint That means that Thinking as a new user, I would be a bit confused about this. The warning I propose is for the case where |
I think so, considering that |
working on the default savetopk here #3680 |
@awaelchli let's make save_last = None by default as well? |
This pull request is now in conflict... :( |
This pull request is now in conflict... :( |
Anything holding this back? |
This pull request is now in conflict... :( |
Already pushed to master This reverts commit 00d9e77.
it shall be fine now... |
What does this PR do?
Fixes #3619
Before submitting