Better filtering of the model outputs in Trainer #8633
Conversation
This is a very welcome change imo, and the implementation is clean. Thank you for implementing the last test, I think it's great.
```diff
@@ -97,3 +97,4 @@ class MarianConfig(BartConfig):
     """

     model_type = "marian"
+    keys_to_ignore_at_inference = ["past_key_values"]
```
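To illustrate the idea behind this config attribute, here is a minimal sketch (not the actual `Trainer` code; the helper name `filter_outputs` is made up for this example) of how a key list like `keys_to_ignore_at_inference` could be used to drop entries such as `past_key_values` from a dict of model outputs before gathering predictions:

```python
def filter_outputs(outputs: dict, keys_to_ignore=("past_key_values",)) -> dict:
    """Keep only the output entries that should be gathered as predictions.

    `keys_to_ignore` plays the role of the config attribute added in this
    diff; entries like cached past states are dropped before gathering.
    """
    return {k: v for k, v in outputs.items() if k not in keys_to_ignore}


# Toy model outputs: logits we want, plus a cache entry we don't.
outputs = {"logits": [0.1, 0.9], "past_key_values": [("k", "v")]}
print(filter_outputs(outputs))  # {'logits': [0.1, 0.9]}
```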
It's a bit late now, but I'm not a huge fan of the name, to be honest: this seems to be very specific to training, but one might now think that `past_key_values` can never be passed during inference in general. Why not call it `keys_to_ignore_at_training`?
No, this is not for training, only for inference. During training we only get the loss in the outputs. And this is not about ignoring keys passed to the model, but about ignoring them because they are not part of the logits/scores/predictions we want to gather. Maybe `output_keys_to_ignore_at_inference` is clearer?
I see! Yeah, I think `output_keys_to_ignore_at_inference` would be a bit clearer to me :-)
See #8857
What does this PR do?
As discovered since merging #8530, sometimes (e.g. when using NVIDIA apex with the O2 optimization) the new model outputs lose their type and become regular dictionaries. This means we can't index into them with integers, and some rework in the internals of `Trainer` has become necessary; this PR does that rework.

It also takes advantage of the new dict outputs to better filter the outputs at inference. We had several issues recently when using models outputting past states (such as Reformer, XLNet, GPT-2) during evaluation in `Trainer`. This PR introduces a new API that looks at a possible key in the config of the model to get some attributes to ignore in the outputs during evaluation (those outputs are then discarded from the predictions returned by `Trainer.predict` or passed along to metric computation in `Trainer.evaluate`). Since a user might have use cases where they want to ignore more keys, or keep those keys, a new argument is added to both `Trainer.predict` and `Trainer.evaluate` to fully control the keys ignored in those dictionaries. If the model outputs a tuple, this is all ignored.
Fixes #8523 among others