
[Seq2Seq Trainer] Make sure padding is implemented for models without pad_token #8043

Merged

Conversation

patrickvonplaten
Contributor

@patrickvonplaten patrickvonplaten commented Oct 26, 2020

What does this PR do?

This PR adds padding for models without a padding token as well. The logic is the following:

  1. If the model can predict targets shorter than max_length, it has to have at least an eos_token_id. If the model has no config.pad_token_id defined, it simply uses config.eos_token_id for padding.

  2. If the model has no config.eos_token_id, it cannot generate predictions shorter than max_length, so padding will never happen.

@sshleifer @patil-suraj - you guys were right -> the Trainer requires padding in any case (even if the model has no padding token).
Could you review this PR and check whether these fixes to the Seq2Seq Trainer are OK with you?
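To make the two points above concrete, here is a minimal sketch of the fallback logic (the helper name and structure are illustrative only; the actual implementation in examples/seq2seq/seq2seq_trainer.py differs in its details):

def resolve_pad_token_id(config):
    """Illustrative only: choose a token id for padding generated ids up to max_length."""
    if config.pad_token_id is not None:
        return config.pad_token_id
    if config.eos_token_id is not None:
        # No dedicated pad token, but generation can stop early at EOS,
        # so the EOS id is reused for padding (point 1 above).
        return config.eos_token_id
    # Neither id is defined: the model cannot stop before max_length, so padding
    # is never needed (point 2 above). The PR itself raises a ValueError if padding
    # is still requested in this situation.
    return None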

@patil-suraj
Contributor

LGTM!

@patil-suraj
Contributor

patil-suraj commented Oct 26, 2020

@patrickvonplaten, @sshleifer
I am seeing a major slowdown on TPU v3-8:
last time (9e68d07), sshleifer/student_marian_en_ro_6_3 finished 1 epoch in ~6 minutes;
now on this branch it's showing ~1 hr 20 min.

@patrickvonplaten
Contributor Author

patrickvonplaten commented Oct 26, 2020

> @patrickvonplaten, @sshleifer
> I am seeing a major slowdown on TPU v3-8:
> last time (9e68d07), sshleifer/student_marian_en_ro_6_3 finished 1 epoch in ~6 minutes;
> now on this branch it's showing ~1 hr 20 min.

Ohoh :-/ Can you narrow down the commit that caused the slowdown? I took another look at https://github.com/huggingface/transformers/pull/7809/files, and this line I added could be problematic: inputs = copy.deepcopy(inputs).

@patrickvonplaten
Contributor Author

> > @patrickvonplaten, @sshleifer
> > I am seeing a major slowdown on TPU v3-8:
> > last time (9e68d07), sshleifer/student_marian_en_ro_6_3 finished 1 epoch in ~6 minutes;
> > now on this branch it's showing ~1 hr 20 min.
>
> Ohoh :-/ Can you narrow down the commit that caused the slowdown? I took another look at https://github.com/huggingface/transformers/pull/7809/files, and this line I added could be problematic: inputs = copy.deepcopy(inputs).

Yeah, this line is actually called at every step -> can you check whether removing the copy operation speeds the seq2seq trainer up again? I've been a bit sloppy there, I think :-/
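As an aside, a quick illustration of why the per-step deep copy was suspected: copy.deepcopy clones every tensor in the batch on every training step, while a shallow copy (or no copy at all) leaves the underlying tensors untouched. This is a hedged sketch for illustration only; compare_copies and the batch shapes are made up, and (as noted further down the thread) removing the copy did not turn out to fix the slowdown.

import copy
import torch

def compare_copies(batch):
    """Contrast the two copy strategies discussed above."""
    deep = copy.deepcopy(batch)   # clones every tensor on every call -> expensive per step
    shallow = dict(batch)         # new dict, same underlying tensors -> cheap
    return deep, shallow

batch = {
    "input_ids": torch.ones(8, 512, dtype=torch.long),
    "attention_mask": torch.ones(8, 512, dtype=torch.long),
}
deep, shallow = compare_copies(batch)
assert shallow["input_ids"] is batch["input_ids"]   # shares storage with the original
assert deep["input_ids"] is not batch["input_ids"]  # an independent clone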

@patil-suraj
Contributor

patil-suraj commented Oct 26, 2020

It's still very slow even after removing that line. I'll try to find the exact commit which is causing this slowdown.

Contributor

@sshleifer sshleifer left a comment

LGTM, would prefer stuff moved to init, but don't feel strongly.

if self.config.pad_token_id is None and self.config.eos_token_id is not None:
    logger.warn(
        f"The `config.pad_token_id` is `None`. Using `config.eos_token_id` = {self.config.eos_token_id} for padding.."
    )
Contributor

what if eos_token_id is None? Should we raise?

Contributor Author

Might be a bit too edge-casey, but eos_token_id could be None, in which case padding would never take place.

Contributor

should we raise early in that case?

Contributor Author

What I meant is that there are models, like openai-gpt or ctrl, that have neither an eos_token_id nor a pad_token_id. The way it is implemented now, these models can still make use of Seq2SeqTrainer because they never require padding (they never finish early). So I'd just leave it as it is - or, if you think that models without an EOS token should not use Seq2SeqTrainer, we could raise as well - up to you!

Contributor

I didn't understand that they always go to max_length; your implem makes total sense. Thanks for clarifying.
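To make the openai-gpt / ctrl example concrete, a small check along these lines could confirm it (assuming these checkpoint names still resolve on the Hub; per the discussion above, both ids are expected to be None):

from transformers import AutoConfig

# Per the discussion above, these models define neither a pad nor an EOS token id,
# so generation always runs to max_length and padding never kicks in.
for name in ("openai-gpt", "ctrl"):
    cfg = AutoConfig.from_pretrained(name)
    print(name, cfg.pad_token_id, cfg.eos_token_id)  # expected: None None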

examples/seq2seq/seq2seq_trainer.py
Comment on lines +219 to +224
pad_token_id = self.config.pad_token_id if self.config.pad_token_id is not None else self.config.eos_token_id

if pad_token_id is None:
    raise ValueError(
        f"Make sure that either `config.pad_token_id` or `config.eos_token_id` is defined if tensor has to be padded to `max_length`={max_length}"
    )
Contributor

Should we check in __init__ for faster failures?
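For readers following the thread: the snippet above appears to come from a helper that pads generated ids out to max_length. A self-contained sketch of how such a helper can use the resolved pad token id (the function name and example values here are assumptions, not the PR's exact code):

import torch

def pad_to_max_len(tensor, max_length, pad_token_id):
    """Right-pad generated ids of shape (batch, seq_len) out to (batch, max_length)."""
    if tensor.shape[-1] >= max_length:
        return tensor
    padded = tensor.new_full((tensor.shape[0], max_length), pad_token_id)
    padded[:, : tensor.shape[-1]] = tensor
    return padded

ids = torch.tensor([[51, 94, 2]])   # generation stopped early at eos_token_id = 2
print(pad_to_max_len(ids, 6, 2))    # tensor([[51, 94, 2, 2, 2, 2]])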

@patrickvonplaten patrickvonplaten merged commit 664c7ec into huggingface:master Oct 26, 2020
fabiocapsouza pushed a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020
… pad_token (huggingface#8043)

* make sure padding is implemented for non-padding tokens models as well

* add better error message

* add better warning

* remove results files

* Update examples/seq2seq/seq2seq_trainer.py

* remove unnecessary copy line

* correct usage of labels

* delete test files
fabiocapsouza added a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020