
[Seq2Seq Trainer] Make sure padding is implemented for models without pad_token #8043

Merged

Conversation

patrickvonplaten
Contributor

@patrickvonplaten patrickvonplaten commented Oct 26, 2020

What does this PR do?

This PR adds padding for models without a padding token as well. The logic is the following:

  1. If the model can predict targets shorter than max_length, it has to have at least an eos_token_id. If the model has no config.pad_token_id defined, it simply uses config.eos_token_id for padding.

  2. If the model has no config.eos_token_id, it cannot generate predictions shorter than max_length, so padding will never happen.

@sshleifer @patil-suraj - you guys were right -> the Trainer requires padding in any case (even if the model has no padding token).
Could you review this PR and check whether these fixes to the Seq2Seq Trainer are OK with you?
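To make the two points above concrete, here is a minimal sketch of the fallback logic (the helper name and structure are illustrative only; the actual implementation in examples/seq2seq/seq2seq_trainer.py differs in its details):

def resolve_pad_token_id(config):
    """Illustrative only: choose a token id for padding generated ids up to max_length."""
    if config.pad_token_id is not None:
        return config.pad_token_id
    if config.eos_token_id is not None:
        # No dedicated pad token, but generation can stop early at EOS,
        # so the EOS id is reused for padding (point 1 above).
        return config.eos_token_id
    # Neither id is defined: the model cannot stop before max_length, so padding
    # is never needed (point 2 above). The PR itself raises a ValueError if padding
    # is still requested in this situation.
    return None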

@patil-suraj
Contributor

LGTM!

@patil-suraj
Contributor

patil-suraj commented Oct 26, 2020

@patrickvonplaten, @sshleifer
I am seeing a major slowdown on TPU v3-8:
last time (9e68d07), sshleifer/student_marian_en_ro_6_3 finished 1 epoch in ~6 minutes;
now on this branch it's showing ~1 hr 20 min.

@patrickvonplaten
Contributor Author

patrickvonplaten commented Oct 26, 2020

> @patrickvonplaten, @sshleifer
> I am seeing a major slowdown on TPU v3-8:
> last time (9e68d07), sshleifer/student_marian_en_ro_6_3 finished 1 epoch in ~6 minutes;
> now on this branch it's showing ~1 hr 20 min.

Ohoh :-/ Can you narrow down the commit that caused the slowdown? I took another look at https://github.com/huggingface/transformers/pull/7809/files, and this line I added could be problematic: inputs = copy.deepcopy(inputs).

@patrickvonplaten
Contributor Author

> > @patrickvonplaten, @sshleifer
> > I am seeing a major slowdown on TPU v3-8:
> > last time (9e68d07), sshleifer/student_marian_en_ro_6_3 finished 1 epoch in ~6 minutes;
> > now on this branch it's showing ~1 hr 20 min.
>
> Ohoh :-/ Can you narrow down the commit that caused the slowdown? I took another look at https://github.com/huggingface/transformers/pull/7809/files, and this line I added could be problematic: inputs = copy.deepcopy(inputs).

Yeah, this line is actually called at every step -> can you check whether removing the copy operation speeds the seq2seq trainer up again? I've been a bit sloppy there, I think :-/
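As an aside, a quick illustration of why the per-step deep copy was suspected: copy.deepcopy clones every tensor in the batch on every training step, while a shallow copy (or no copy at all) leaves the underlying tensors untouched. This is a hedged sketch for illustration only; compare_copies and the batch shapes are made up, and (as noted further down the thread) removing the copy did not turn out to fix the slowdown.

import copy
import torch

def compare_copies(batch):
    """Contrast the two copy strategies discussed above."""
    deep = copy.deepcopy(batch)   # clones every tensor on every call -> expensive per step
    shallow = dict(batch)         # new dict, same underlying tensors -> cheap
    return deep, shallow

batch = {
    "input_ids": torch.ones(8, 512, dtype=torch.long),
    "attention_mask": torch.ones(8, 512, dtype=torch.long),
}
deep, shallow = compare_copies(batch)
assert shallow["input_ids"] is batch["input_ids"]   # shares storage with the original
assert deep["input_ids"] is not batch["input_ids"]  # an independent clone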

@patil-suraj
Contributor

patil-suraj commented Oct 26, 2020

It's still very slow even after removing that line. I'll try to find the exact commit which is causing this slowdown.

Contributor

@sshleifer sshleifer left a comment

LGTM, would prefer stuff moved to init, but don't feel strongly.

if self.config.pad_token_id is None and self.config.eos_token_id is not None:
    logger.warn(
        f"The `config.pad_token_id` is `None`. Using `config.eos_token_id` = {self.config.eos_token_id} for padding.."
    )
Contributor

what if eos_token_id is None? Should we raise?

Contributor Author

Might be a bit too edge-casey, but eos_token_id could be None, in which case padding would never take place.

Contributor

should we raise early in that case?

Contributor Author

What I meant is that there are models, like openai-gpt or ctrl, that have neither an eos_token_id nor a pad_token_id. The way it is implemented now, these models can still make use of Seq2SeqTrainer because they never require padding (they never finish early). So I'd just leave it as it is - or, if you think that models without an EOS token should not use Seq2SeqTrainer, we could raise as well - up to you!

Contributor

I didn't understand that they always go to max_length; your implem makes total sense. Thanks for clarifying.
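To make the openai-gpt / ctrl example concrete, a small check along these lines could confirm it (assuming these checkpoint names still resolve on the Hub; per the discussion above, both ids are expected to be None):

from transformers import AutoConfig

# Per the discussion above, these models define neither a pad nor an EOS token id,
# so generation always runs to max_length and padding never kicks in.
for name in ("openai-gpt", "ctrl"):
    cfg = AutoConfig.from_pretrained(name)
    print(name, cfg.pad_token_id, cfg.eos_token_id)  # expected: None None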

examples/seq2seq/seq2seq_trainer.py
Comment on lines +219 to +224
pad_token_id = self.config.pad_token_id if self.config.pad_token_id is not None else self.config.eos_token_id

if pad_token_id is None:
    raise ValueError(
        f"Make sure that either `config.pad_token_id` or `config.eos_token_id` is defined if tensor has to be padded to `max_length`={max_length}"
    )
Contributor

Should we check in __init__ for faster failures?
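For readers following the thread: the snippet above appears to come from a helper that pads generated ids out to max_length. A self-contained sketch of how such a helper can use the resolved pad token id (the function name and example values here are assumptions, not the PR's exact code):

import torch

def pad_to_max_len(tensor, max_length, pad_token_id):
    """Right-pad generated ids of shape (batch, seq_len) out to (batch, max_length)."""
    if tensor.shape[-1] >= max_length:
        return tensor
    padded = tensor.new_full((tensor.shape[0], max_length), pad_token_id)
    padded[:, : tensor.shape[-1]] = tensor
    return padded

ids = torch.tensor([[51, 94, 2]])   # generation stopped early at eos_token_id = 2
print(pad_to_max_len(ids, 6, 2))    # tensor([[51, 94, 2, 2, 2, 2]])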

@patrickvonplaten patrickvonplaten merged commit 664c7ec into huggingface:master Oct 26, 2020
fabiocapsouza pushed a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020
… pad_token (huggingface#8043)

* make sure padding is implemented for non-padding tokens models as well

* add better error message

* add better warning

* remove results files

* Update examples/seq2seq/seq2seq_trainer.py

* remove unnecessary copy line

* correct usage of labels

* delete test files
fabiocapsouza added a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020