Progress bar \ log dict items added to outputs in training_epoch_end #1727
Comments
Hi! Thanks for your contribution! Great first issue!
Isn't it a feature? Based on what I see in the docs, I would expect to see 'v1': SomeValue in the progress bar output.
Might be, but I see the progress bar and log dicts as something that goes to the "back end". The overwriting behaviour is a bug, I believe. "acc1" is of size [batch_size, 1], while "acc2" is of size [1], just a scalar. When you arrive at training_epoch_end, you will see that all outputs contain only the [1]-shaped values instead of the [batch_size, 1] ones.
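A minimal sketch of the kind of return value being described (key names, shapes, and the compute_* helpers are assumed here, not taken from the original code):
def training_step(self, batch, batch_idx):
    # fragment of a LightningModule; the helpers below are hypothetical
    loss = self.compute_loss(batch)              # scalar
    acc1 = self.compute_acc(batch)               # shape [batch_size, 1], per sample
    acc2 = acc1.mean().view(1)                   # shape [1], reduced for display
    return {
        'loss': loss,
        'acc1': acc1,                            # intended for training_epoch_end
        'progress_bar': {'acc1': acc2},          # same key, reduced value
    }
# Observed behaviour described above: in training_epoch_end the outputs only hold
# the [1]-shaped value, because the progress_bar entry overwrites the top-level one.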
Ah okay, now I understand. Do you have runnable example code or a notebook for that issue? I think you can access all metric dicts via the trainer attributes (progress_bar_metrics, callback_metrics; see the inspection snippet after this comment). The question, I think, is whether any value gets overwritten and is gone forever. Currently I don't think it is. EDIT: with
{
    'loss': loss,
    'value': 1,
    'progress_bar': {'value': 2}
}
you end up with callback_metrics['value'] == 2. The same happens when passing 'value' to log. I don't know whether this case should be handled, but I think it could lead to bugs. Some ideas for how it could be solved:
Think this should be discussed more.
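For reference, a minimal usage sketch (assumed, not from the original thread; MyLightningModule is hypothetical) showing how those trainer attributes can be inspected after fitting:
import pytorch_lightning as pl

model = MyLightningModule()                   # hypothetical module
trainer = pl.Trainer(max_epochs=1)
trainer.fit(model)
print(trainer.progress_bar_metrics)           # values routed to the progress bar
print(trainer.callback_metrics)               # values visible to callbacks; here 'value' == 2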
Hey, here is an example: IMO the training output should contain only values that will be used at the epoch-end step, and not contain the progress_bar \ log values, so this is a valid solution. At least I expected it to work this way and was quite surprised when it didn't.
@kessido Good example! This could be normal usage. I would wait for other opinions and ideas. Maybe I'll try solution 2 above in the next few days and check which tests fail or which classes are affected, but I think that could be a bit harder. Another quick solution would be to append a prefix to the log and progress_bar dict keys. I don't think this would affect other classes, but it could also lead to collisions. Let's see what others say or what ideas they have.
I agree, the log and progress bar dicts should not end up in epoch_end. If one really wants to collect a metric from these sub-dicts at epoch end, then one should simply store a copy of it in the top-level dict. Unrelated, but I noticed in your colab: you have an acc key in training_step and an acc key in epoch end for the logger. This is not good, since it will affect the global_step in the logger and increase it by 1 after every epoch. You should rename it to "epoch_acc" or something like that.
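A hedged sketch of that suggestion (names are assumed; compute_loss and compute_acc are hypothetical helpers): keep a copy of the metric in the top-level dict so it reaches training_epoch_end, and log the epoch-level value under a distinct key:
import torch
import pytorch_lightning as pl

class ExampleModule(pl.LightningModule):
    # only the relevant hooks are shown

    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)        # hypothetical helper
        acc = self.compute_acc(batch)          # hypothetical helper
        return {
            'loss': loss,
            'acc': acc,                        # top-level copy, reaches training_epoch_end
            'progress_bar': {'acc': acc},      # display only
            'log': {'acc': acc},
        }

    def training_epoch_end(self, outputs):
        epoch_acc = torch.stack([out['acc'] for out in outputs]).mean()
        # distinct key so the epoch-level entry does not clash with the per-step 'acc'
        # and bump the logger's global_step
        return {'log': {'epoch_acc': epoch_acc}}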
Haha, nice catch! 😆
If no one else responds I'll add a PR for that. The simplest solution I tested was just to stop adding the log and progress bar dicts to the callback metrics. I had to change the base model validation step in the tests to make them pass.
Hey, awesome! I will review the PR as soon as I have time. Thanks for your effort!
@awaelchli are you preparing another PR besides the referenced #1800?
No, if @olineumann's PR addresses the issue, it should not be necessary.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@williamFalcon is this fixed with structured results?
yes! fixed with structured results
🐛 Bug
When running with a training_step that returns something along these lines:
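(The original snippet is not preserved in this copy of the issue; the following is an assumed reconstruction using the 'v1' key from the report, with a hypothetical compute_loss helper.)
def training_step(self, batch, batch_idx):
    loss = self.compute_loss(batch)            # hypothetical helper
    return {
        'loss': loss,
        'progress_bar': {'v1': 1.0},           # meant only for the progress bar
    }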
you get 'v1' as part of the outputs in training_epoch_end. This is unexpected, IMO.
Also, when it is something like this:
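(Again, the original snippet is missing here; this is an assumed reconstruction with the key name from the report, where z is a hypothetical per-sample tensor.)
def training_step(self, batch, batch_idx):
    loss = self.compute_loss(batch)                         # hypothetical helper
    z = self.compute_z(batch)                               # hypothetical per-sample tensor
    return {
        'loss': loss,
        'some_value_for_epoch_end': z,                      # full tensor, wanted at epoch end
        'log': {'some_value_for_epoch_end': z.mean()},      # reduced value under the same key
    }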
you get 'some_value_for_epoch_end' = z.mean(), as the values get overwritten.
It originates from trainer/logging.py, lines 172-173, where the values are overwritten using progress_bar_outputs and log_outputs.
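Illustratively (this is not the actual trainer/logging.py code, just an assumed sketch of how merging those dicts back into the step output can clobber a key):
def merge_step_output(output, progress_bar_output, log_output):
    # hypothetical illustration only, not the Lightning source
    callback_metrics = dict(output)                # copy of the top-level step output
    callback_metrics.update(progress_bar_output)   # a shared key here...
    callback_metrics.update(log_output)            # ...silently replaces the original value
    return callback_metrics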