
reflect max_new_tokens in Seq2SeqTrainer #18786

Merged (5 commits) into huggingface:main on Sep 1, 2022
Conversation

@kumapo (Contributor) commented Aug 27, 2022

What does this PR do?

In most cases, VisionEncoderDecoderModel's max_length is set implicitly, which causes a problem when the model generates predictions with max_new_tokens.

This PR makes Seq2SeqTrainer.prediction_step() handle max_new_tokens as expected in that case.

Fixes #18785

P.S. I can reproduce the issue on huggingface/transformers, but with this PR applied, the same reproduction code raises no exceptions.
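For reference, a minimal sketch of the failing call (the names are illustrative and assume a configured Seq2SeqTrainer wrapping a VisionEncoderDecoderModel):

# Sketch of the reproduction (illustrative names):
predictions = trainer.predict(
    test_dataset,        # hypothetical evaluation dataset
    max_new_tokens=32,   # forwarded to model.generate() via gen_kwargs
)
# Before this PR, prediction_step() still fell back to model.config.max_length,
# so generate() could raise a ValueError; with it, max_new_tokens is respected.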

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@HuggingFaceDocBuilderDev commented Aug 27, 2022

The documentation is not available anymore as the PR was closed or merged.

@LysandreJik (Member) commented:

@ydshieh, could you take a look at this when you have some time please? Thanks a lot!

@ydshieh self-assigned this on Aug 30, 2022
)
if gen_kwargs.get("max_length") is None and gen_kwargs.get("max_new_tokens") is None:
    gen_kwargs["max_length"] = self.model.config.max_length
prompt_seq_length = 1 if self.model.config.is_encoder_decoder else 0
Collaborator:

It seems to me Seq2SeqTrainer will only be used for encoder-decoder models. If this is true, we shouldn't need the else 0.

(If a decoder-only model is possible, we will have to get the actual length of the prompt from the inputs, instead of just setting it to 0.)
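A minimal sketch of that lookup (hypothetical, not part of this PR; assumes the batch dict inputs carries input_ids):

if self.model.config.is_encoder_decoder:
    prompt_seq_length = 1  # only the decoder start token precedes the new tokens
else:
    prompt_seq_length = inputs["input_ids"].shape[-1]  # generate() echoes the full prompt back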

Collaborator:

I would like to hear from @patrickvonplaten, @patil-suraj and @sgugger on this.

Contributor:

+1. I think we can be confident that the Seq2SeqTrainer only works for models that have self.model.config.is_encoder_decoder = True.

@ydshieh (Collaborator) commented Aug 30, 2022

@kumapo changed the title from "reflect max_new_tokens in gen_kwargs to trainer.generate()" to "reflect max_new_tokens in Seq2SeqTrainer" on Aug 31, 2022
@kumapo (Contributor, Author) commented Aug 31, 2022

@ydshieh, yes. At the same time, I believe Seq2SeqTrainer.evaluate() needs the same change; see the sketch below.
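Once evaluate() forwards gen_kwargs the same way, a call like this should work (a sketch; the dataset name and kwargs are illustrative):

metrics = trainer.evaluate(
    eval_dataset,
    max_new_tokens=32,  # instead of relying on model.config.max_length
    num_beams=4,
)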

@sgugger (Collaborator) left a comment:

Thanks for this PR! Left a comment and then we should be good to merge!

)
if gen_kwargs.get("max_length") is None and gen_kwargs.get("max_new_tokens") is None:
    gen_kwargs["max_length"] = self.model.config.max_length
prompt_seq_length = 1
Collaborator:

It's never modified, so let's use 1 below instead of adding a new variable?
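Roughly, the downstream padding logic then reads (a sketch, not the exact merged diff):

# Pad generated tokens up to the requested length; the +1 accounts for the
# decoder start token that generate() prepends for encoder-decoder models.
if gen_kwargs.get("max_length") is not None and generated_tokens.shape[-1] < gen_kwargs["max_length"]:
    generated_tokens = self._pad_tensors_to_max_len(generated_tokens, gen_kwargs["max_length"])
elif gen_kwargs.get("max_new_tokens") is not None and generated_tokens.shape[-1] < gen_kwargs["max_new_tokens"] + 1:
    generated_tokens = self._pad_tensors_to_max_len(generated_tokens, gen_kwargs["max_new_tokens"] + 1)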

@kumapo (Contributor, Author) commented Aug 31, 2022

@sgugger, thank you for your feedback. I've updated the PR.

@ydshieh (Collaborator) left a comment:

Thank you @kumapo for making the Seq2Seq trainer more robust!

@LysandreJik (Member) commented:

It seems there is an issue with your CircleCI permissions; the tests won't run.
Could you try refreshing your permissions as shown here?

@sgugger (Collaborator) left a comment:

Good for me barring the tests!

@kumapo (Contributor, Author) commented Sep 1, 2022

@LysandreJik, I've done all the steps to refresh the CircleCI permissions, but it seems nothing happens with the tests. Let me know if I missed something.

@sgugger (Collaborator) commented Sep 1, 2022

Can you try pushing an empty commit on your branch to re-trigger the tests?

git commit -m "Trigger CI" --allow-empty

@ydshieh (Collaborator) commented Sep 1, 2022

To pass the test, you can run

make style

and commit the change.

@sgugger merged commit ab663b2 into huggingface:main on Sep 1, 2022
oneraghavan pushed a commit to oneraghavan/transformers that referenced this pull request Sep 26, 2022
* reflect max_new_tokens in gen_kwargs to `trainer.generate()`

* reflect max_new_tokens in `Seq2SeqTrainer`

* remove unnecessary variable

* Trigger CI

* fix style

Successfully merging this pull request may close these issues.

Raise ValueError if given max_new_tokens to Seq2SeqTrainer.predict()
6 participants