Fix generation doctests #30263
Conversation
| token | token string | log probability | probability |
|------:|:-------------|----------------:|------------:|
|   618 | when         |          -2.009 |      13.41% |
|   356 | we           |          -1.859 |      15.58% |
|   460 | can          |          -2.508 |       8.14% |
|   262 | the          |          -1.415 |      24.28% |
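These rows have the shape of a transition-scores table (token id, token string, log probability, probability), and the two score columns are consistent with each other: the probability column is just `exp(log probability)`, e.g. exp(-2.009) ≈ 0.1341, i.e. 13.41%. A minimal sketch of that relationship, using hypothetical logits rather than the real model outputs behind the table:

```python
import math

# Hypothetical logits for a 4-token vocabulary; NOT the model outputs
# behind the table above, just an illustration of the relationship.
logits = [2.0, 1.0, 0.5, -1.0]

# Log-softmax gives the log probability of each token.
log_z = math.log(sum(math.exp(x) for x in logits))
log_probs = [x - log_z for x in logits]

# The probability column is exp(log probability), shown as a percentage.
percentages = [100 * math.exp(lp) for lp in log_probs]

print([round(lp, 3) for lp in log_probs])
print([round(p, 2) for p in percentages])
```

Across a full vocabulary the percentages always sum to 100, which is a quick sanity check when updating doctest numbers by hand.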
These were small numerical precision errors, so I just rewrote those numbers as they are returned.
hmm, that's weird. The CI tests are showing slightly different numerical scores, should I just copy the ones returned by the CI?
Yes, I'd say stick with the CI values, provided they're similar to these. If they're very different, some investigation might be needed, but there are so many small numerical differences that can creep in because of differences in hardware, setup, etc. that it's probably not worth chasing. We've done the same for some model integration tests.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
The failing
confirmed all passed, thanks a lot!
Let's not be bothered by `tests_pr_documentation_tests` here. It's likely a CPU vs GPU issue.
Thanks for working on fixing these tests!
The main question is why we're moving the `logits_processor`. It's entirely possible that a user defines their own logits processor which does something similar, and that wouldn't be properly handled in this case. It seems better to handle compatibility between the processors and the candidate generator's logits than to forcibly remove the processor.
```python
for processor in self.logits_processor:
    if type(processor) == MinLengthLogitsProcessor:
        self.main_model_min_length = getattr(processor, "min_length")
        self.logits_processor.remove(processor)
```
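As a side note (mine, not raised in the PR): besides the design concern, removing items from a list while iterating over it can silently skip elements, which makes this pattern fragile on its own. A minimal demonstration:

```python
items = ["a", "b", "b", "c"]

# Removing during iteration: after the first "b" is removed, the iterator's
# index advances past the second "b", which is therefore never inspected.
for item in items:
    if item == "b":
        items.remove(item)

print(items)  # the second "b" survives
```

With a single matching processor this happens to work, but iterating over a copy (`for processor in list(self.logits_processor):`) is the safe form of the pattern.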
hmmm - this seems pretty hacky. Why are we removing the logits processor like this?
This was causing errors because we are overriding the `min_length` and passing it as a kwarg into `generate` on every call. If we do not remove it, there will be two min lengths: one as a processor and one as a kwarg. I am not sure right now how to tackle this another way; I'll give it some thought. I did not think a user could rewrite `MinLengthLogitsProcessor` in their own way.
> I did not think user can rewrite MinLengthLogitsProcessor their own way
Why couldn't they define their own logits processor and pass that in? Do we only allow logits processors defined in transformers?
No, you are right, that should be possible. I'll check
After a bit of digging, I think that removing the `MinLengthLogitsProcessor` is the only solution, because we allow users to pass in the same thing either as a kwarg or as a transformers-provided `LogitsProcessor`. Regardless of how they pass it, the same generation is done, but using kwargs is the more straightforward way. Passing a `LogitsProcessorList` could be enabled only for custom processors, raising warnings/errors if a transformers-provided processor is passed.
@gante wdyt? This would be a breaking change but might ease maintenance in the long run.
I'd rather we just raised an exception if we detect that both the kwarg and the logits processor are set. Otherwise we can get unexpected behaviour where the logits processor's `min_length` is ignored.
This raises a more general question about the responsibilities of logits processors vs. kwargs: it seems there might be future cases where there's overlap.
For what it's worth, I'm not sure how much sense there is in having a min-length logits processor. I'd expect the processors to modify the scores, but not necessarily to control the generation logic.
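A minimal sketch of the raise-an-exception approach, using a stand-in class since this is not the actual transformers implementation:

```python
class MinLengthLogitsProcessor:
    """Stand-in for the transformers class of the same name."""
    def __init__(self, min_length, eos_token_id):
        self.min_length = min_length
        self.eos_token_id = eos_token_id

def validate_min_length_args(min_length_kwarg, logits_processor):
    """Raise if min_length arrives both as a kwarg and as a processor."""
    has_processor = any(
        isinstance(p, MinLengthLogitsProcessor) for p in logits_processor
    )
    if min_length_kwarg is not None and has_processor:
        raise ValueError(
            "min_length was passed both as a generate() kwarg and as a "
            "MinLengthLogitsProcessor instance; please use only one."
        )

# Passing only one of the two is fine:
validate_min_length_args(10, [])
validate_min_length_args(None, [MinLengthLogitsProcessor(10, eos_token_id=2)])
```

The function name and signature here are hypothetical; the point is only that detecting the conflict up front is a few lines, while silently reconciling the two settings is not.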
> For what it's worth - I'm not sure how much sense there is in having a min length logits processor? I'd expect the processors to modify the scores, but not necessarily to control the generation logic.

Agreed here, but I am not sure it's a stopping criterion either.

> This raises a more general question about the responsibilities of logit processors v.s. kwargs - it seems there might be future cases when there's overlap.

Hmm, overlap in the sense that both (kwargs and processors) perform the same change on the logits? I believe that using logits processors is for advanced users and custom cases, so I'm not sure about possible overlaps.
> Agreed here, but I am not sure it's a stopping criteria either.

I think it can be interpreted as a stopping criterion, i.e. if it hasn't been satisfied, the generation shouldn't stop.

> Hmm, overlap in a sense that both (kwargs and processors) perform same change in logits?

Overlap in the sense that they can both have parameters which control the same thing. I'm not sure about changing logits here - is that needed for specifying a min length?

> I believe that using logits processors is for advanced users and custom cases, so not sure about possible overlaps
It's more of a question of whether our API allows it - we should be wary of Hyrum's law!
@amyeroberts @zucchini-nlp Answering a question above
> For what it's worth - I'm not sure how much sense there is in having a min length logits processor? I'd expect the processors to modify the scores, but not necessarily to control the generation logic.

The min length processor does modify the scores: it sets the probability of EOS tokens to 0. Generation always stops when EOS is selected, so we have to modify the probabilities before selecting the token in order to enforce a minimum length. If we didn't modify the probabilities, the generation loop would have to become more complex (if EOS was selected and we had a minimum length, we would need to go back and pick a different token).
Remember also that there can be multiple stopping criteria in addition to the EOS token (maximum length, wall clock time, ...), and that generation stops when one of them is met -- preventing generation from stopping until `min_length` was reached would break this behavior 🤗
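The masking described above can be sketched at the score level (a simplified illustration, not the actual transformers implementation):

```python
import math

def mask_eos_before_min_length(scores, cur_len, min_length, eos_token_id):
    """Make EOS unselectable until min_length tokens have been generated,
    instead of second-guessing an already-selected EOS in the loop."""
    out = list(scores)
    if cur_len < min_length:
        out[eos_token_id] = -math.inf  # probability 0 after softmax
    return out

scores = [0.2, 1.3, 4.0]  # hypothetical logits; token 2 is EOS
masked = mask_eos_before_min_length(scores, cur_len=3, min_length=5, eos_token_id=2)
print(masked)  # EOS can no longer win argmax or be sampled
```

Once `cur_len >= min_length`, the function is a no-op and the other stopping criteria behave exactly as before.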
Regarding the code changes: `MinLengthLogitsProcessor` can only be present in `self.logits_processor` if a user manually passes a `MinLengthLogitsProcessor` instance to `generate` (as opposed to the documented `min_length` or `min_new_tokens` arguments). By being present as a custom processor, the candidate generation process generates more candidates than it should, hurting performance. Note that the overlap itself is okay (two minimum length processors are equivalent to only having the largest minimum length), but the custom `MinLengthLogitsProcessor` is not.
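The overlap claim (two minimum-length constraints are equivalent to keeping only the larger one) can be checked directly with a simplified masking function:

```python
import math

def apply_min_length(scores, cur_len, min_length, eos_token_id):
    # Simplified min-length masking: the EOS logit becomes -inf until
    # min_length tokens have been generated.
    out = list(scores)
    if cur_len < min_length:
        out[eos_token_id] = -math.inf
    return out

scores = [1.0, 2.0, 0.5]  # hypothetical logits; token 2 is EOS
for cur_len in range(12):
    both = apply_min_length(apply_min_length(scores, cur_len, 5, 2), cur_len, 9, 2)
    largest_only = apply_min_length(scores, cur_len, 9, 2)
    assert both == largest_only  # applying both == applying the larger one
```

Masking an already-masked logit is idempotent, so stacking the two constraints changes nothing beyond what the stricter one does alone.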
Having any form of logits processing in assisted generation is not a hard requirement, but rather a performance-enhancing operation -- it forces the assistant model to have the same logits processing as the main model. Sadly, Whisper relies on custom logits processors with assisted generation to secure the speedups announced in our DistilWhisper paper, so we can't simply remove support for them, which would have gotten rid of these (and future related) issues at the cost of a minor throughput hit.
As Amy wrote, exceptions > handling it ourselves. I would raise an exception if either `MinLengthLogitsProcessor` or `MinNewTokensLengthLogitsProcessor` is set as a custom logits processor.
> As Amy wrote, exceptions > handling ourselves. I would raise an exception if either MinLengthLogitsProcessor or MinNewTokensLengthLogitsProcessor are set as custom logits processors.

Yes, raising an error was the second option I had; it just seemed a bit unintuitive from the user's point of view to be able to pass a `LogitsProcessor` and then get an error. Okay, I will raise an exception and change the doctests then :)
@gante @amyeroberts Now we raise an error if a user tries to pass in a `LogitsProcessor` object instead of a kwarg for the assistant model. Tests are passing, except for the GPU vs CPU one, as confirmed by @ydshieh above.
Thanks for fixing and iterating on this!
I guess we still might hit issues if people add in their own logits processor which wouldn't pass the `isinstance` check - but there are infinitely many ways people can make their own classes, so we can't really prepare for all of them!
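The caveat in action: a processor that mimics `MinLengthLogitsProcessor` without subclassing it slips past an `isinstance` check (illustrative stand-in classes, not transformers code):

```python
class MinLengthLogitsProcessor:
    """Stand-in for the transformers class of the same name."""
    def __init__(self, min_length):
        self.min_length = min_length

class MyCustomMinLength:
    """Same attribute and same intent, but no shared base class."""
    def __init__(self, min_length):
        self.min_length = min_length

official = MinLengthLogitsProcessor(5)
custom = MyCustomMinLength(5)

print(isinstance(official, MinLengthLogitsProcessor))  # True: caught
print(isinstance(custom, MinLengthLogitsProcessor))    # False: slips past
```

Subclassing the official processor would be caught, but a duck-typed reimplementation like `MyCustomMinLength` can't be detected this way, which is exactly the residual risk being accepted here.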
Thank you for iterating 🤗
@ydshieh since the failing doctest is okay on your end, would you be able to merge? 🤗 (I don't have permissions to merge with red CI :D )
Yes, I will merge later today. Thank you for all the work/review you 3 have done ❤️
The run I triggered was 2 weeks ago, and since then there has been a commit changing something. Let's merge, as I am kind of confident it still works! (well, I only triggered the doctest) Thanks again!
What does this PR do?
Fixes some doctests that were failing in this run