
Progress bar \ log dict items added to outputs in training_epoch_end #1727

Closed · kessido opened this issue May 4, 2020 · 15 comments
Labels
bug (Something isn't working) · help wanted (Open to be worked on)

Comments

@kessido

kessido commented May 4, 2020

🐛 Bug

When running with a training_step that returns something along these lines:

return {'loss': x,
        'some_value_for_epoch_end': y,
        'progress_bar': {'v1': z.mean()}}

you get 'v1' as part of the outputs in training_epoch_end. This is unexpected, imo.

Also, in case it is:

return {'loss': x,
        'some_value_for_epoch_end': y,
        'progress_bar': {'some_value_for_epoch_end': z.mean()}}

you get 'some_value_for_epoch_end' == z.mean() in the outputs, because the original value gets overwritten.

This originates from trainer/logging.py, lines 172–173, where the values are overwritten using progress_bar_outputs + log_outputs.
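
For reference, a minimal runnable sketch of the second case, assuming the 0.7.x dict-returning API (the module, data, and metric below are made up purely to reproduce the key collision):

import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class ToyModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        pred = self(x)
        loss = torch.nn.functional.mse_loss(pred, y)
        per_sample = (pred - y).abs()  # shape [batch_size, 1], wanted at epoch end
        return {
            'loss': loss,
            'some_value_for_epoch_end': per_sample,
            # reusing the key inside progress_bar is what reportedly
            # overwrites the top-level entry seen by training_epoch_end
            'progress_bar': {'some_value_for_epoch_end': per_sample.mean()},
        }

    def training_epoch_end(self, outputs):
        # per the report, each element holds the scalar mean here,
        # not the original [batch_size, 1] tensor
        print([o['some_value_for_epoch_end'].shape for o in outputs])
        return {}

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    def train_dataloader(self):
        x, y = torch.randn(64, 4), torch.randn(64, 1)
        return DataLoader(TensorDataset(x, y), batch_size=16)


pl.Trainer(max_epochs=1).fit(ToyModule())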

kessido added the bug (Something isn't working) and help wanted (Open to be worked on) labels on May 4, 2020
@github-actions
Contributor

github-actions bot commented May 4, 2020

Hi! Thanks for your contribution, great first issue!

@olineumann
Contributor

Isn't it a feature?
https://pytorch-lightning.readthedocs.io/en/0.7.5/experiment_reporting.html#modify-progress-bar

Based on what I see in the docs, I would expect to see 'v1': SomeValue in the progress bar output.
Or do you mean you want to see only the value, without the 'v1' key?

@kessido
Author

kessido commented May 4, 2020

Might be, but I think of the progress bar and log dicts as something that goes to the "back end".

The "overwriting" behavior is a bug, I believe.
Let's say someone is trying to aggregate "acc" for the training_epoch_end phase, but they also want to report "acc" in the progress bar.

The "acc" returned at the top level is of size [batch_size, 1], while the "acc" in the progress bar dict is of size [1], just a scalar.

When arriving at training_epoch_end, they will see that every output contains only the [1]-shaped value, instead of the [batch_size, 1] one.
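
A toy reproduction of that collision in plain Python (not Lightning internals; the merge below only imitates the reported behavior):

import torch

batch_size = 16
step_output = {
    'loss': torch.tensor(0.3),
    'acc': torch.rand(batch_size, 1),                        # per-sample, for epoch-end aggregation
    'progress_bar': {'acc': torch.rand(batch_size).mean()},  # scalar, for the progress bar
}

# imitate the reported merge of the progress_bar dict into the step output
merged = {**step_output, **step_output['progress_bar']}
print(merged['acc'].shape)  # torch.Size([]) instead of torch.Size([16, 1])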

@olineumann
Contributor

olineumann commented May 4, 2020

Ah okay, now I understand. Do you have runnable example code or a notebook for that issue?
The logging class returns each dict separately, but also a complete callback_metrics dict. See:
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/logging.py#L171-L179

So I think you can access all the metric dicts via trainer.some_dict (progress_bar_metrics, callback_metrics). The question, I think, is whether any value is overwritten and gone forever. Currently I think it isn't.
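
For example, after a fit both dicts should still be reachable on the trainer (a small sketch; `model` stands for any LightningModule):

from pytorch_lightning import Trainer

trainer = Trainer(max_epochs=1)
trainer.fit(model)  # `model` is any LightningModule

# the collected dicts stay accessible as trainer attributes
print(trainer.progress_bar_metrics)
print(trainer.callback_metrics)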

EDIT:
I think that returning the following dict would overwrite it without getting it back:

{
    'loss': loss,
    'value': 1,
    'progress_bar': { 'value': 2}
}

Here callback_metrics['value'] == 2. The same happens when passing 'value' in the log dict. I don't know whether this case should be handled, but I think it could lead to bugs. Some ideas for how it could be solved:

  1. Do not collect log and progress bar metrics in callback_metrics (but the comments say they should be collected...)
  2. Keep the log and progress_bar values in separate sub-dicts, like callback_metrics['progress_bar']['value'] (see the sketch after this list). I don't know whether this would lead to other errors.
  3. Give the dicts priorities and add them to the docs, e.g. callback_metrics first, then log, then progress_bar (like {'value': 1, 'progress_bar': {'value': 3}, 'log': {'value': 2}}).
  4. ?

I think this should be discussed more.
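
To make option 2 concrete, here is a toy sketch of how the collected metrics could keep the sub-dicts separate (plain Python, not the actual Lightning code; names are illustrative):

def build_callback_metrics(step_output):
    # keep progress_bar / log values in their own sub-dicts instead of
    # merging them into the top level, so nothing is overwritten
    metrics = {k: v for k, v in step_output.items() if k not in ('progress_bar', 'log')}
    metrics['progress_bar'] = dict(step_output.get('progress_bar', {}))
    metrics['log'] = dict(step_output.get('log', {}))
    return metrics


metrics = build_callback_metrics({'loss': 0.3, 'value': 1, 'progress_bar': {'value': 2}})
print(metrics['value'])                  # 1 -- the top-level value survives
print(metrics['progress_bar']['value'])  # 2 -- the progress bar value is kept separately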

@kessido
Author

kessido commented May 4, 2020

Hey here is an example:
https://colab.research.google.com/drive/1kYjEH2h5Ki9ygZ0XlA6hpbcxA1L1NNzb?usp=sharing

IMO the training outputs should contain only values that will be used at the epoch-end step, and not contain progress_bar \ log values, so that would be a valid solution. At least, I expected it to work this way and was quite surprised when it didn't.

@olineumann
Contributor

@kessido Good example! This could be normal usage. I would wait for other opinions and ideas. Maybe I'll try solution 2 above in the next days and check which tests fail or which classes are affected. But I think this could be a bit harder.

Another quick solution would be to prepend a prefix to the log and progress_bar dict keys. I think this wouldn't affect other classes, but it could also lead to collisions.

Let's see what others will say or what ideas they have.

@awaelchli
Contributor

I agree, the log and progress bar dict should not end up in epoch_end. If one really wants to collect a metric from these subdicts in epoch end, then one should simply store a copy of it in the top-level dict.

Unrelated, but I noticed in your colab: you have an acc key in training_step and an acc key in epoch_end for the logger. This is not good, since it will affect the global_step in the logger and increase it by 1 after every epoch. You should rename it to "epoch_acc" or something like that.

@kessido
Author

kessido commented May 6, 2020

> I agree, the log and progress bar dict should not end up in epoch_end. If one really wants to collect a metric from these subdicts in epoch end, then one should simply store a copy of it in the top-level dict.
>
> Unrelated, but I noticed in your colab: you have an acc key in training_step and an acc key in epoch_end for the logger. This is not good, since it will affect the global_step in the logger and increase it by 1 after every epoch. You should rename it to "epoch_acc" or something like that.

Haha, nice catch! 😆
I usually name them "train_x" or "val_x", but I just wrote this quickly as an example :)

@olineumann
Contributor

olineumann commented May 12, 2020

If no one else responds, I'll add a PR for that. The simplest solution I tested was simply to stop adding the log and progress bar dicts to the callback metrics. I had to change the base model validation step in the tests to make them pass.

@awaelchli
Contributor

Hey, awesome, I will review the PR as soon as I have time! Thanks for your effort.

@Borda
Member

Borda commented May 12, 2020

@awaelchli are you preparing another PR besides #1800?

@awaelchli
Contributor

No, if @olineumann's PR addresses the issue, it should not be necessary.

@stale

stale bot commented Jul 12, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the won't fix (This will not be worked on) label on Jul 12, 2020
awaelchli removed the won't fix (This will not be worked on) label on Jul 12, 2020
@edenlightning
Contributor

@williamFalcon is this fixed with structured results?

@williamFalcon
Contributor

yes! fixed with structured results
