
better checking of data returned from training_step #1256

Closed
jeremyjordan opened this issue Mar 27, 2020 · 12 comments
Labels: feature (Is an improvement or enhancement), good first issue (Good for newcomers), won't fix (This will not be worked on)

Comments

@jeremyjordan
Contributor

🚀 Feature

let's add more validation checks on what's returned from training_step and provide the user with useful error messages when they're not returning the right values.

Motivation

i feel like i've seen a lot of users confused about what they're supposed to return in training_step and validation_step. additionally, i don't think we document how we treat extra keys as "callback metrics" very well.

Pitch

what do you think about adding some structure and validation to Trainer's process_output method?

right now, we have expectations about a set of keys {progress_bar, log, loss, hiddens} and assume everything else is a callback metric. however, this is a silent assumption.

we could instead enforce a more rigid structure:

{
    'loss': loss,            # REQUIRED
    'log': {},               # optional dict
    'progress_bar': {},      # optional dict
    'hiddens': [h0, c0],     # optional collection of tensors
    'metrics': {},           # optional dict
}

moreover, we can leverage pydantic to do validation automatically and provide useful error messages out of the box when data validation fails.
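for illustration, a minimal sketch of what a pydantic-based schema for the structure above could look like (hypothetical: TrainingStepOutput is not an existing Lightning class, and this assumes pydantic v1-style validators):

from typing import Any, Dict, List, Optional

import torch
from pydantic import BaseModel, validator


class TrainingStepOutput(BaseModel):
    # hypothetical schema mirroring the proposed structure above
    loss: torch.Tensor                             # REQUIRED
    log: Dict[str, Any] = {}                       # optional dict
    progress_bar: Dict[str, Any] = {}              # optional dict
    hiddens: Optional[List[torch.Tensor]] = None   # optional collection of tensors
    metrics: Dict[str, Any] = {}                   # optional dict

    class Config:
        arbitrary_types_allowed = True  # let pydantic accept torch.Tensor fields

    @validator('loss')
    def loss_must_be_scalar(cls, v):
        # fail early with a readable message instead of a cryptic downstream error
        if v.dim() != 0:
            raise ValueError(f"'loss' must be a scalar tensor, got shape {tuple(v.shape)}")
        return v

validating the user's return value would then be a one-liner like TrainingStepOutput(**output), and a missing loss or a non-scalar tensor surfaces as a pydantic ValidationError naming the offending field.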

cc @PyTorchLightning/core-contributors

Alternatives

Do nothing, keep things as they are.

Additional context

This would be a backwards incompatible change.

@jeremyjordan jeremyjordan added the feature (Is an improvement or enhancement) and help wanted (Open to be worked on) labels Mar 27, 2020
@Borda Borda added the good first issue (Good for newcomers) label Mar 27, 2020
@rotalex

rotalex commented Mar 27, 2020

I would like to work on this.

@Borda
Member

Borda commented Mar 27, 2020

@rotalex cool, looking forward to seeing a PR from you :]

@jeremyjordan
Contributor Author

@Borda given that this proposal is backwards incompatible, i think we should get more core contributors to weigh in on the proposed design before moving forward and implementing it.

one thing that still gives me pause is the amount of overlap between log, progress_bar, and metrics. progress_bar is almost always a subset of log, and metrics (or, as they currently stand, arbitrary keys) are typically used to store temporary values to be collated and logged at the end of an epoch. i think there's room for improvement here.

@Borda
Member

Borda commented Mar 28, 2020

@jeremyjordan good point! we recently had an issue asking "why are there two dicts, one for the progress bar and one general, when they hold the same values", so some simplification or a more structured approach would be welcome...
cc: @PyTorchLightning/core-contributors ^^

@williamFalcon
Contributor

log and progress_bar were separated a while back because people wanted to log different things that they didn't want in the progress bar

@williamFalcon
Contributor

i don't really know what metrics is.

@jeremyjordan
Contributor Author

the usage of the log and progress_bar keys is clear. however, if you look at the process_output method you'll see:

# ---------------
# EXTRACT CALLBACK KEYS
# ---------------
# all keys not progress_bar or log are candidates for callbacks
callback_metrics = {}
for k, v in output.items():
    if k not in ['progress_bar', 'log', 'hiddens']:
        callback_metrics[k] = v

if train and (self.use_dp or self.use_ddp2):
    num_gpus = self.num_gpus
    callback_metrics = self.reduce_distributed_output(callback_metrics, num_gpus)

for k, v in callback_metrics.items():
    if isinstance(v, torch.Tensor):
        callback_metrics[k] = v.item()

all keys not progress_bar or log are candidates for callbacks

as far as i know, this isn't documented anywhere.

if you look in the documentation, however, you will see references to keys which are not included in the set of {loss, log, progress_bar}, but the only hint about how to use them is through the examples we provide (e.g. val_loss below):

class LitModel(pl.LightningModule):
    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        return {'val_loss': F.cross_entropy(y_hat, y)}

    def validation_epoch_end(self, outputs):
        val_loss_mean = torch.stack([x['val_loss'] for x in outputs]).mean()
        return {'val_loss': val_loss_mean}

now that i think about this further, a better solution might be to add more detail to the documentation about how we collect outputs from training steps and expose them at the end of an epoch, to make this clearer. furthermore, we should document that if you're returning a torch tensor, we expect it to be a scalar value.

the second question is whether or not we want to do more explicit validation of the data returned by the user. the motivation for this github issue is #1236, where we could have helped the user track down the source of an error more quickly. imagine if the error raised to the user was instead ValidationError: Cannot reduce key 'val_loss' to a scalar.
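a minimal sketch of what such an explicit check could look like inside process_output (a hypothetical helper, not existing Lightning code):

import torch

def _check_reducible(key, value):
    # hypothetical helper: raise a readable error instead of the cryptic
    # failure from .item() when a returned tensor has more than one element
    if isinstance(value, torch.Tensor) and value.dim() != 0:
        raise ValueError(
            f"Cannot reduce key '{key}' to a scalar; got a tensor of shape "
            f"{tuple(value.shape)}. Return a scalar tensor (e.g. take .mean())."
        )

# e.g. inside process_output, before converting tensors with .item():
# for k, v in callback_metrics.items():
#     _check_reducible(k, v)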

@Borda
Member

Borda commented Mar 29, 2020

I would be in favour of a more rigid structure; see also #1277

@gabisurita

gabisurita commented Apr 30, 2020

Shouldn't we favor a strongly typed return value? I've always wondered why the step return type is not a dataclass or named tuple where loss is a required field. We could keep the flexibility with some metadata dict argument.

@williamFalcon
Contributor

williamFalcon commented Apr 30, 2020

i wouldn't mind stronger typing, but i don't want to start adding APIs to remember?
although it might be simpler to remember the structured type instead of the possible keys?

@tullie @ashwinb @Darktex thoughts?

A potential way to add structure is (something like this):

def training_step(...):
    output = pl.StepResult(loss=loss, logs=logs, progress_bar=progress_bar)
    return output

I guess the only thing this helps with is that the user doesn't have to remember what the keys in the dict are?

Pro

Removes confusion with what keys do what in the return

Con

Adds an API users have to remember
(although you could argue that remembering to put "loss" in a dict is just as bad)
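a minimal sketch of what such a StepResult container could look like, assuming a plain dataclass (hypothetical; not an existing Lightning class at this point):

from dataclasses import dataclass, field
from typing import Dict, List, Optional

import torch

@dataclass
class StepResult:
    # hypothetical structured return type for training_step
    loss: torch.Tensor                                              # required
    logs: Dict[str, torch.Tensor] = field(default_factory=dict)
    progress_bar: Dict[str, torch.Tensor] = field(default_factory=dict)
    hiddens: Optional[List[torch.Tensor]] = None

    def __post_init__(self):
        if not isinstance(self.loss, torch.Tensor) or self.loss.dim() != 0:
            raise ValueError("'loss' must be a scalar torch.Tensor")

the trainer could still accept a plain dict and build the StepResult from it internally, which would keep backwards compatibility.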

@tullie
Contributor

tullie commented Apr 30, 2020

My preference would be a StepResult class. I think it's the best way to document what the outputs should be. We can still support returning a dictionary and just build the StepResult class from the dict on the trainer side anyway. It'd be great if we could create a unified way for callbacks to specify which arguments are required in the StepResult too.

The progress_bar and log overlap that @jeremyjordan brought up isn't ideal. I'd love to hear how others think we should address this. The best I could come up with is specifying the desired keys for log and progress bar somewhere in the LightningModule init or as a trainer callback argument. The user would then put all result values in the step result dictionary and the specified keys would be picked out for the respective outputs (log and/or progress bar).
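a rough sketch of that routing idea, using hypothetical log_keys / progress_bar_keys attributes declared once on the module (these attribute names do not exist in Lightning; this is only an illustration):

import pytorch_lightning as pl
import torch
import torch.nn.functional as F

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(28 * 28, 10)
        # hypothetical attributes: declare once which result keys the trainer
        # should route to the logger and to the progress bar
        self.log_keys = ['train_loss', 'train_acc']
        self.progress_bar_keys = ['train_loss']

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.layer(x.view(x.size(0), -1))
        loss = F.cross_entropy(y_hat, y)
        acc = (y_hat.argmax(dim=1) == y).float().mean()
        # everything lives in one flat dict; the trainer would pick out the
        # declared keys instead of the user building separate
        # 'log' and 'progress_bar' dicts
        return {'loss': loss, 'train_loss': loss, 'train_acc': acc}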

@stale

stale bot commented Jun 30, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the won't fix (This will not be worked on) label Jun 30, 2020
@stale stale bot closed this as completed Jul 9, 2020