[Seq2Seq Trainer] Make sure padding is implemented for models without pad_token #8043
Conversation
LGTM!
@patrickvonplaten, @sshleifer
Ohoh :-/ can you narrow down the commit that caused the slow-down? I took a look again at https://github.com/huggingface/transformers/pull/7809/files and this line I added could be problematic
Yeah this line is actually called at every step -> can you check whether removing the line helps?
It's still very slow even after removing that line. I'll try to find the exact commit which is causing this slowdown.
LGTM, would prefer stuff moved to init, but don't feel strongly.
if self.config.pad_token_id is None and self.config.eos_token_id is not None:
    logger.warn(
        f"The `config.pad_token_id` is `None`. Using `config.eos_token_id` = {self.config.eos_token_id} for padding.."
    )
What if `eos_token_id` is `None`? Should we raise?
Might be a bit too edge-casy, but `eos_token_id` could be `None`, in which case padding would never take place.
should we raise early in that case?
What I meant is that there are models, like `openai-gpt` or `ctrl`, that have neither an `eos_token_id` nor a `pad_token_id` => the way it is implemented now, these models could still make use of the Seq2SeqTrainer because they would never require padding (they never finish early). So I'd just leave it as it is - or, if you think that models that don't have an EOS token should not use the Seq2SeqTrainer, we could raise as well - up to you!
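A small hedged illustration of that point (model choice and lengths are just for the example, not from the PR): a model with neither `eos_token_id` nor `pad_token_id` cannot stop early, so its generations always reach `max_length` and never need padding.

```python
from transformers import OpenAIGPTLMHeadModel, OpenAIGPTTokenizer

tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
model = OpenAIGPTLMHeadModel.from_pretrained("openai-gpt")

input_ids = tokenizer("Translate this sentence", return_tensors="pt").input_ids

# openai-gpt defines neither pad_token_id nor eos_token_id, so generation cannot
# stop early: the output always has exactly max_length tokens and the padding
# branch in the trainer is never reached.
generated = model.generate(input_ids, max_length=20)
assert generated.shape[-1] == 20
```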
Didn't understand that they always go to `max_length`; your implem makes total sense. Thanks for clarifying.
pad_token_id = self.config.pad_token_id if self.config.pad_token_id is not None else self.config.eos_token_id

if pad_token_id is None:
    raise ValueError(
        f"Make sure that either `config.pad_token_id` or `config.eos_token_id` is defined if tensor has to be padded to `max_length`={max_length}"
    )
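For context, a minimal sketch of the padding step this check guards; the function name and signature are illustrative, not necessarily the exact code in the PR.

```python
import torch

def pad_tensors_to_max_len(tensor: torch.Tensor, max_length: int, pad_token_id: int) -> torch.Tensor:
    # Right-pad the generated ids with pad_token_id (which, per the fallback above,
    # may actually be eos_token_id) so that every prediction has length max_length.
    padded_tensor = tensor.new_full((tensor.shape[0], max_length), pad_token_id)
    padded_tensor[:, : tensor.shape[-1]] = tensor
    return padded_tensor
```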
Should we check in `__init__` for faster failures?
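A rough sketch of what such an early check could look like; this is hypothetical, and whether it should warn or raise is exactly the open question above.

```python
import logging

from transformers import Trainer

logger = logging.getLogger(__name__)

class Seq2SeqTrainer(Trainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Surface the missing-token situation at construction time rather than
        # at the first prediction step; raising here would be the stricter option.
        if self.model.config.pad_token_id is None and self.model.config.eos_token_id is None:
            logger.warning(
                "Neither `config.pad_token_id` nor `config.eos_token_id` is set: "
                "generated sequences can never be padded and must always reach `max_length`."
            )
```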
… pad_token (huggingface#8043)
* make sure padding is implemented for non-padding tokens models as well
* add better error message
* add better warning
* remove results files
* Update examples/seq2seq/seq2seq_trainer.py
* remove unnecessary copy line
* correct usage of labels
* delete test files
… without pad_token (huggingface#8043)" This reverts commit e8fd514.
What does this PR do?
This PR adds padding for models without a padding token as well. The logic is the following:

If the model predicts targets shorter than `max_length`, the model has to have at least an `eos_token_id`. If the model has no `config.pad_token_id` defined, then the model simply uses the `config.eos_token_id` for padding.

If the model has no `config.eos_token_id`, the model cannot generate predictions shorter than `max_length`. In this case padding will never happen.

@sshleifer @patil-suraj - you guys were right -> the `Trainer` requires padding in any case (also if the model has no padding token). Could you guys review this PR and see if these fixes in the Seq2Seq Trainer are ok for you?