Callback Metric Dict getting overwritten by Log and Progress Bar Dict #1800
This pull request is now in conflict... :(
@olineumann Thanks for the PR. Is it correct that the bug only exists for training_epoch_end, not for validation/test_epoch_end? In that case, could you check that your change brings it in line with validation_step/epoch_end?
I think the consensus is that we want to do early stopping only on validation metrics, and no longer on training metrics as is the case right now. #1458 is dealing with this.
@awaelchli No, it should affect train, validation and test epoch end. See my changes to the validation epoch end of the base model in the tests.
tests/base/model_valid_epoch_ends.py
Outdated
@@ -43,5 +43,7 @@ def _mean(res, key):
val_acc_mean /= len(outputs)

metrics_dict = {'val_loss': val_loss_mean.item(), 'val_acc': val_acc_mean.item()}
results = {'progress_bar': metrics_dict, 'log': metrics_dict}
return results
result = metrics_dict.copy()
why the copy here?
Without the copy, result and metrics_dict reference the same object, so assigning metrics_dict to result['progress_bar'] would also modify metrics_dict. When metrics_dict is then assigned to result['log'], result['log']['progress_bar'] exists, which caused test errors on my machine.
At first I reused metrics_dict like this:
metrics_dict['progress_bar'] = metrics_dict
metrics_dict['log'] = metrics_dict
return metrics_dict
But this is wrong and leads to the same error.
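The aliasing problem described in this reply can be reproduced with plain Python dicts, independent of Lightning. A minimal sketch, with made-up metric values and an illustrative variable name `aliased`:

```python
# Illustrative values only; not taken from the actual test model.
metrics_dict = {'val_loss': 0.5, 'val_acc': 0.9}

# Without a copy: `aliased` is just another name for the same dict object.
aliased = metrics_dict
aliased['progress_bar'] = metrics_dict
aliased['log'] = metrics_dict
# The dict now contains itself, so the 'log' entry also has 'progress_bar'.
assert aliased['log'] is aliased
assert 'progress_bar' in aliased['log']

# With a shallow copy: the top-level result is a separate dict object.
metrics_dict = {'val_loss': 0.5, 'val_acc': 0.9}
result = metrics_dict.copy()
result['progress_bar'] = metrics_dict
result['log'] = metrics_dict
assert 'progress_bar' not in result['log']
assert set(result) == {'val_loss', 'val_acc', 'progress_bar', 'log'}
```

A shallow copy is enough here because the metric values themselves are plain floats; only the top-level container needs to be distinct.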
@@ -168,10 +168,6 @@ def process_output(self, output, train=False):
# ---------------
hiddens = output.get('hiddens')

# use every metric passed in as a candidate for callback
callback_metrics.update(progress_bar_metrics)
Why do we need to remove this?
Without it, log metrics and progress bar metrics won't be candidates for the callbacks.
In #1727 @kessido reported that a progress bar or log metric overwrites the callback metric of the top-level dict. @kessido also gave an example, see COLAB.
I don't know whether this needs to be fixed, which is why I asked for more opinions in the issue. Only @awaelchli responded, and he thinks it does need to be fixed.
Because no one had started a PR, I opened one to initiate a discussion. I have several ideas on how this could be fixed and mentioned some in the issue above, but this was the easiest and quickest solution. I didn't want to spend too much effort on a solution that would then be discarded.
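The overwrite reported in #1727 can be sketched with plain dicts. The snippet below is a simplified stand-in for the merge step in the diff above, not the actual `process_output` code; the reserved key set, the metric names and the values are assumptions for illustration:

```python
# Hypothetical step output; keys and values are illustrative.
output = {
    'val_loss': 0.25,                    # top-level value meant for callbacks
    'progress_bar': {'val_loss': 0.99},  # e.g. a differently computed display value
    'log': {'train_loss': 0.4},
}

# Assumed: top-level entries other than these keys are the callback candidates.
reserved = {'progress_bar', 'log', 'hiddens'}
callback_metrics = {k: v for k, v in output.items() if k not in reserved}

# dict.update replaces existing keys, so a clash overwrites the top-level value.
callback_metrics.update(output['progress_bar'])
callback_metrics.update(output['log'])

print(callback_metrics['val_loss'])  # 0.99, not the 0.25 returned at the top level
```

Removing the two `update` calls avoids the clash, at the cost that metrics passed only through 'log' or 'progress_bar' are no longer visible to callbacks.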
@williamFalcon @olineumann, in the current update, when that line is removed and we use the Result object, we cannot save the model checkpoint with a filename of the form {val_loss}. It results in epoch=1-val_loss=0, because the filename parameters are taken from callback_metrics and val_loss cannot be found there. Is there another way to provide callback_metrics from a Result/TrainResult/EvalResult object?
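To illustrate why a missing callback metric would surface as val_loss=0 in the filename, here is a rough sketch. `format_checkpoint_name` below is a hypothetical helper written for this example, not Lightning's implementation, and the fallback to 0 for a missing metric is an assumption based on the filename reported above:

```python
import re


def format_checkpoint_name(template, epoch, metrics):
    """Hypothetical helper: fill a '{name}' template from a metrics dict."""
    values = dict(metrics, epoch=epoch)
    # Assumed behaviour: a placeholder with no matching metric falls back to 0.
    for name in re.findall(r'\{(\w+)\}', template):
        values.setdefault(name, 0)
    # Turn '{val_loss}' into 'val_loss={val_loss}' before formatting.
    labelled = re.sub(r'\{(\w+)\}', r'\1={\1}', template)
    return labelled.format(**values)


# Metric present among the callback metrics:
print(format_checkpoint_name('{epoch}-{val_loss}', 1, {'val_loss': 0.25}))
# epoch=1-val_loss=0.25

# Metric absent, as when log values are no longer merged in:
print(format_checkpoint_name('{epoch}-{val_loss}', 1, {}))
# epoch=1-val_loss=0
```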
4e24924 to dde55a8
Codecov Report
@@ Coverage Diff @@
## master #1800 +/- ##
========================================
- Coverage 89% 86% -3%
========================================
Files 79 78 -1
Lines 7302 4919 -2383
========================================
- Hits 6514 4231 -2283
+ Misses 788 688 -100
@olineumann mind checking the last comments? It would be great to get this done 🐰
Hey Borda, thanks for replying. I responded to the last comments on the code reviews. I'm still not sure what the best way to solve the problem would be, because the current fix would affect many users, and I think it would lead to many issues from users complaining that their logging or early stopping no longer works. I could instead write metric values from the progress bar or logging dict to the top-level dict only if the key doesn't already exist. I think that would affect far fewer users. I had hoped for more opinions on this. I could implement the solution above, rebase and push so it can be merged.
dde55a8 to 1c6bfaa
Hello @olineumann! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-08-06 12:01:29 UTC
@Borda I just rebased onto master, implemented and pushed the solution, and all tests are passing 🍻. The logging and progress bar metric values are now only written to the top-level callback metric dict if the key doesn't already exist. The logging values are written first, then the progress bar values, so logging metric values have higher priority if both contain the same key. This shouldn't affect other users' code as long as they don't use the same key in different metric dicts.
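The priority order described in this comment can be sketched as follows. `merge_callback_metrics` is a hypothetical helper written for illustration, not the code from the PR, and the metric names and values are made up:

```python
def merge_callback_metrics(callback_metrics, log_metrics, progress_bar_metrics):
    """Hypothetical helper: add log/progress bar values without overwriting."""
    merged = dict(callback_metrics)
    # Logging values are applied first, then progress bar values;
    # setdefault only writes a key that is not present yet.
    for source in (log_metrics, progress_bar_metrics):
        for key, value in source.items():
            merged.setdefault(key, value)
    return merged


merged = merge_callback_metrics(
    callback_metrics={'val_loss': 0.25},
    log_metrics={'val_loss': 0.99, 'val_acc': 0.8},
    progress_bar_metrics={'val_acc': 0.1, 'lr': 0.001},
)
print(merged)  # {'val_loss': 0.25, 'val_acc': 0.8, 'lr': 0.001}
```

The top-level val_loss survives, val_acc comes from the logging dict because it is applied first, and lr is taken from the progress bar dict since nothing else defines it.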
Just thinking that this may also be solved by #1989, what do you think?
I hadn't seen that PR before. Currently I don't have much time to follow pytorch_lightning... but it looks like a nice new feature! I think the problem is already solved when using the new way of passing a Result() object. But that PR isn't done yet, so the old way will still be used, and, as far as I understand, the old way should continue to be supported. So I think this PR could be merged into master to fix #1727 (which wouldn't be fixed by #1989 unless the user switches to the new result object).
@olineumann how is it going? Can we finish it soon?
…ecting other users code). Moved CHANGELOG.md entry to unreleased section.
f14fb59 to a1c0da6
This was solved in the structured results refactors.
Before submitting
Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
Did you read the contributor guideline, Pull Request section?
Did you make sure to update the docs?
Did you write any new necessary tests?
If you made a notable change (that affects users), did you update the CHANGELOG?
What does this PR do?
Fixes #1727.
Dict values passed to the progress bar or log dict overwrite the callback values. See the example in the issue.
There are several options to solve it. This PR simply stops adding progress bar and log values to the callback dict. Tests passed on my machine.
But this will affect users' code, e.g. when a log metric was used as the early stopping metric.
PR review
Opinions, other solutions, recommendations, ... welcome! Also help in updating docs.
Did you have fun?
🙃