
Progress bar \ log dict items added to outputs in training_epoch_end #1727

Closed · kessido opened this issue May 4, 2020 · 15 comments
Labels
bug (Something isn't working) · help wanted (Open to be worked on)

Comments

@kessido

kessido commented May 4, 2020

🐛 Bug

When running with a training_step that returns something along these lines:

return {'loss': x,
        'some_value_for_epoch_end': y,
        'progress_bar': {'v1': z.mean()}}

you get 'v1' as part of the outputs in training_epoch_end. This is unexpected, imo.

Also, in case it is:

return {'loss': x,
        'some_value_for_epoch_end': y,
        'progress_bar': {'some_value_for_epoch_end': z.mean()}}

you get 'some_value_for_epoch_end' == z.mean() in the outputs, because the original value gets overwritten.

This originates from trainer/logging.py, lines 172–173, where the values are overwritten using progress_bar_outputs + log_outputs.
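
For reference, a minimal runnable sketch of the second case, assuming the 0.7.x dict-returning API (the module, data, and metric below are made up purely to reproduce the key collision):

import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class ToyModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        pred = self(x)
        loss = torch.nn.functional.mse_loss(pred, y)
        per_sample = (pred - y).abs()  # shape [batch_size, 1], wanted at epoch end
        return {
            'loss': loss,
            'some_value_for_epoch_end': per_sample,
            # reusing the key inside progress_bar is what reportedly
            # overwrites the top-level entry seen by training_epoch_end
            'progress_bar': {'some_value_for_epoch_end': per_sample.mean()},
        }

    def training_epoch_end(self, outputs):
        # per the report, each element holds the scalar mean here,
        # not the original [batch_size, 1] tensor
        print([o['some_value_for_epoch_end'].shape for o in outputs])
        return {}

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    def train_dataloader(self):
        x, y = torch.randn(64, 4), torch.randn(64, 1)
        return DataLoader(TensorDataset(x, y), batch_size=16)


pl.Trainer(max_epochs=1).fit(ToyModule())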

kessido added the bug (Something isn't working) and help wanted (Open to be worked on) labels on May 4, 2020
@github-actions
Contributor

github-actions bot commented May 4, 2020

Hi! Thanks for your contribution, great first issue!

@olineumann
Contributor

Isn't it a feature?
https://pytorch-lightning.readthedocs.io/en/0.7.5/experiment_reporting.html#modify-progress-bar

Based on what I see in the docs, I would expect to see 'v1': SomeValue in the progress bar output.
Or do you mean you want to see only the value, without the 'v1' key?

@kessido
Author

kessido commented May 4, 2020

Might be, but I think of the progress bar and log dicts as something that goes to the "back end".

The "overwriting" behavior is a bug, I believe.
Let's say someone is trying to aggregate "acc" for the training_epoch_end phase, but they also want to report "acc" in the progress bar.

The "acc" returned at the top level is of size [batch_size, 1], while the "acc" in the progress bar dict is of size [1], just a scalar.

When arriving at training_epoch_end, they will see that every output contains only the [1]-shaped value, instead of the [batch_size, 1] one.
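
A toy reproduction of that collision in plain Python (not Lightning internals; the merge below only imitates the reported behavior):

import torch

batch_size = 16
step_output = {
    'loss': torch.tensor(0.3),
    'acc': torch.rand(batch_size, 1),                        # per-sample, for epoch-end aggregation
    'progress_bar': {'acc': torch.rand(batch_size).mean()},  # scalar, for the progress bar
}

# imitate the reported merge of the progress_bar dict into the step output
merged = {**step_output, **step_output['progress_bar']}
print(merged['acc'].shape)  # torch.Size([]) instead of torch.Size([16, 1])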

@olineumann
Contributor

olineumann commented May 4, 2020

Ah okay, now I understand. Do you have runnable example code or a notebook for that issue?
The logging class returns each dict separately, but also a complete callback_metrics dict. See:
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/logging.py#L171-L179

So I think you can access all the metric dicts via trainer.some_dict (progress_bar_metrics, callback_metrics). The question, I think, is whether any value is overwritten and gone forever. Currently I think it isn't.
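
For example, after a fit both dicts should still be reachable on the trainer (a small sketch; `model` stands for any LightningModule):

from pytorch_lightning import Trainer

trainer = Trainer(max_epochs=1)
trainer.fit(model)  # `model` is any LightningModule

# the collected dicts stay accessible as trainer attributes
print(trainer.progress_bar_metrics)
print(trainer.callback_metrics)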

EDIT:
I think that returning the following dict would overwrite it without getting it back:

{
    'loss': loss,
    'value': 1,
    'progress_bar': { 'value': 2}
}

Here callback_metrics['value'] == 2. The same happens when passing 'value' in the log dict. I don't know whether this case should be handled, but I think it could lead to bugs. Some ideas for how it could be solved:

  1. Do not collect log and progress bar metrics in callback_metrics (but the comments say they should be collected...)
  2. Keep the log and progress_bar values in separate sub-dicts, like callback_metrics['progress_bar']['value'] (see the sketch after this list). I don't know whether this would lead to other errors.
  3. Give the dicts priorities and add them to the docs, e.g. callback_metrics first, then log, then progress_bar (like {'value': 1, 'progress_bar': {'value': 3}, 'log': {'value': 2}}).
  4. ?

I think this should be discussed more.
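
To make option 2 concrete, here is a toy sketch of how the collected metrics could keep the sub-dicts separate (plain Python, not the actual Lightning code; names are illustrative):

def build_callback_metrics(step_output):
    # keep progress_bar / log values in their own sub-dicts instead of
    # merging them into the top level, so nothing is overwritten
    metrics = {k: v for k, v in step_output.items() if k not in ('progress_bar', 'log')}
    metrics['progress_bar'] = dict(step_output.get('progress_bar', {}))
    metrics['log'] = dict(step_output.get('log', {}))
    return metrics


metrics = build_callback_metrics({'loss': 0.3, 'value': 1, 'progress_bar': {'value': 2}})
print(metrics['value'])                  # 1 -- the top-level value survives
print(metrics['progress_bar']['value'])  # 2 -- the progress bar value is kept separately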

@kessido
Author

kessido commented May 4, 2020

Hey here is an example:
https://colab.research.google.com/drive/1kYjEH2h5Ki9ygZ0XlA6hpbcxA1L1NNzb?usp=sharing

IMO the training outputs should contain only values that will be used at the epoch-end step, and not contain progress_bar \ log values, so that would be a valid solution. At least, I expected it to work this way and was quite surprised when it didn't.

@olineumann
Contributor

@kessido Good example! This could be normal usage. I would wait for other opinions and ideas. Maybe I'll try solution 2 above in the next days and check which tests fail or which classes are affected. But I think this could be a bit harder.

Another quick solution would be to prepend a prefix to the log and progress_bar dict keys. I think this wouldn't affect other classes, but it could also lead to collisions.

Let's see what others will say or what ideas they have.

@awaelchli
Contributor

I agree, the log and progress bar dict should not end up in epoch_end. If one really wants to collect a metric from these subdicts in epoch end, then one should simply store a copy of it in the top-level dict.

Unrelated, but I noticed in your colab: you have an acc key in training_step and an acc key in epoch_end for the logger. This is not good, since it will affect the global_step in the logger and increase it by 1 after every epoch. You should rename it to "epoch_acc" or something like that.

@kessido
Author

kessido commented May 6, 2020

> I agree, the log and progress bar dict should not end up in epoch_end. If one really wants to collect a metric from these subdicts in epoch end, then one should simply store a copy of it in the top-level dict.
>
> Unrelated, but I noticed in your colab: you have an acc key in training_step and an acc key in epoch_end for the logger. This is not good, since it will affect the global_step in the logger and increase it by 1 after every epoch. You should rename it to "epoch_acc" or something like that.

Haha, nice catch! 😆
I usually name them "train_x" or "val_x", but I just wrote this quickly as an example :)

@olineumann
Contributor

olineumann commented May 12, 2020

If no one else responds, I'll add a PR for that. The simplest solution I tested was simply to stop adding the log and progress bar dicts to the callback metrics. I had to change the base model validation step in the tests to make them pass.

@awaelchli
Contributor

Hey, awesome, I will review the PR as soon as I have time! Thanks for your effort.

@Borda
Member

Borda commented May 12, 2020

@awaelchli are you preparing another PR besides #1800?

@awaelchli
Contributor

No, if @olineumann's PR addresses the issue, it should not be necessary.

@stale

stale bot commented Jul 12, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the won't fix (This will not be worked on) label on Jul 12, 2020
awaelchli removed the won't fix (This will not be worked on) label on Jul 12, 2020
@edenlightning
Contributor

@williamFalcon is this fixed with structured results?

@williamFalcon
Contributor

yes! fixed with structured results
