
Fix generation doctests #30263

Merged · 6 commits · Apr 30, 2024
Conversation

zucchini-nlp (Member)

What does this PR do?

Fixes some doctests that were failing in this run

@zucchini-nlp zucchini-nlp requested a review from gante April 16, 2024 08:02
```
| 618 | when | -2.009 | 13.41%
| 356 | we   | -1.859 | 15.58%
| 460 | can  | -2.508 |  8.14%
| 262 | the  | -1.415 | 24.28%
```
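For context on these rows: the last column is just the exponential of the log-probability in the third column. Below is a minimal sketch of the kind of print loop that produces such rows (the token ids and strings are copied from the rows above; the exact format string is an assumption, not the real doctest):

```python
import numpy as np

# Illustrative values copied from the rows above; in the real doctest
# the token strings come from a tokenizer and the scores from the model.
tokens = [618, 356, 460, 262]
decoded = ["when", "we", "can", "the"]
log_probs = np.array([-2.009, -1.859, -2.508, -1.415])

for tok, text, lp in zip(tokens, decoded, log_probs):
    # probability = exp(log-probability)
    print(f"| {tok:5d} | {text:8s} | {lp:.3f} | {np.exp(lp):.2%}")
```

Tiny hardware- or version-dependent differences in the log-probabilities shift the last digits of these printed values, which is exactly what the doctest fix below is about.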
zucchini-nlp (Member Author)

These were small numerical precision errors, so I just rewrote those numbers as they are actually returned.

zucchini-nlp (Member Author)

Hmm, that's weird. The CI tests are showing slightly different numerical scores; should I just copy the ones returned by the CI?

Collaborator

Yes, I'd say stick with the CI values, provided they're similar to these. If they're very different, some investigation might be needed, but there are so many small numerical differences that can creep in because of differences in hardware, setup, etc. that it's probably not worth chasing. We've done the same for some model integration tests.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp (Member Author)

The failing `src/transformers/generation/tf_utils.py::transformers.generation.tf_utils.TFGenerationMixin.sample` doctest is passing for me locally 🤔

@ydshieh (Collaborator) commented Apr 16, 2024

confirmed all passed, thanks a lot!

@ydshieh (Collaborator) commented Apr 16, 2024

Let's not be bothered by `tests_pr_documentation_tests` here. It's likely a CPU vs. GPU issue.

@amyeroberts (Collaborator) left a comment

Thanks for working on fixing these tests!

My main question is why we're removing the logits_processor. It's entirely possible that a user defines their own logits processor that does something similar, which wouldn't be handled properly in this case. It seems better to handle compatibility between the processors and the candidate generator's logits than to forcibly remove them.

Comment on lines 158 to 161
```python
for processor in self.logits_processor:
    if type(processor) == MinLengthLogitsProcessor:
        self.main_model_min_length = getattr(processor, "min_length")
        self.logits_processor.remove(processor)
```
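A side note on the quoted snippet: calling `list.remove` inside a `for` loop over the same list can skip elements in Python. A sketch of the same extraction without mutating mid-iteration follows (the class stub is a stand-in for the transformers one, and `pop_min_length` is a hypothetical helper name, not code from this PR):

```python
# Stand-in for transformers' MinLengthLogitsProcessor, for illustration only.
class MinLengthLogitsProcessor:
    def __init__(self, min_length):
        self.min_length = min_length

def pop_min_length(processors):
    """Extract min_length and drop the processor, rebuilding the list
    instead of removing items while iterating over it."""
    min_length = None
    kept = []
    for p in processors:
        if isinstance(p, MinLengthLogitsProcessor):
            min_length = p.min_length
        else:
            kept.append(p)
    processors[:] = kept  # replace contents in place
    return min_length
```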
Collaborator

Hmmm, this seems pretty hacky. Why are we removing the logits processor like this?

zucchini-nlp (Member Author)

This was causing errors because we override `min_length` and pass it as a kwarg into `generate` on every call. If we don't remove the processor, there will be two min lengths: one as a processor and one as a kwarg.

I'm not sure right now how to tackle this another way; I'll give it some thought. I didn't think a user could rewrite MinLengthLogitsProcessor their own way.

Collaborator

> I didn't think a user could rewrite MinLengthLogitsProcessor their own way.

Why couldn't they define their own logits processor and pass that in? Do we only allow logits processors defined in transformers?

zucchini-nlp (Member Author)

No, you are right, that should be possible. I'll check

zucchini-nlp (Member Author)

After a bit of digging, I think removing the MinLengthLogitsProcessor is the only solution, because we allow users to pass the same setting either as a kwarg or as one of the LogitsProcessors that already exist in transformers. Regardless of how they pass it, the same generation is done, but the kwarg is the more straightforward way. Passing a LogitsProcessorList could be enabled only for custom processors, raising warnings/errors if a processor that already exists in transformers is passed.

@gante wdyt? This would be a breaking change but might ease maintenance in the long run.

Collaborator

I'd rather we just raised an exception if we detect that both the kwarg and the logits processor are set. Otherwise we can get unexpected behaviour where the logits processor's `min_length` is ignored.

This raises a more general question about the responsibilities of logits processors vs. kwargs; it seems there might be future cases where they overlap.

For what it's worth, I'm not sure how much sense there is in having a min-length logits processor. I'd expect processors to modify the scores, but not necessarily to control the generation logic.

zucchini-nlp (Member Author)

> For what it's worth, I'm not sure how much sense there is in having a min-length logits processor. I'd expect processors to modify the scores, but not necessarily to control the generation logic.

Agreed here, but I'm not sure it's a stopping criterion either.

> This raises a more general question about the responsibilities of logits processors vs. kwargs; it seems there might be future cases where they overlap.

Hmm, overlap in the sense that both (kwargs and processors) perform the same change to the logits? I believe using logits processors is for advanced users and custom cases, so I'm not sure about possible overlaps.

Collaborator

> Agreed here, but I'm not sure it's a stopping criterion either.

I think it can be interpreted as a stopping criterion, i.e. if it hasn't been satisfied, generation shouldn't stop.

> Hmm, overlap in the sense that both (kwargs and processors) perform the same change to the logits?

Overlap in the sense that they can both have parameters which control the same thing. I'm not sure about changing logits here; is that needed for specifying a min length?

> I believe using logits processors is for advanced users and custom cases, so I'm not sure about possible overlaps.

It's more a question of whether our API allows it; we should be wary of Hyrum's law!

@gante (Member) commented Apr 29, 2024

@amyeroberts @zucchini-nlp Answering a question above:

> For what it's worth, I'm not sure how much sense there is in having a min-length logits processor. I'd expect processors to modify the scores, but not necessarily to control the generation logic.

The min length processor does modify the scores: it sets the probability of the EOS token to 0. Generation always stops when EOS is selected, so we have to modify the probabilities before selecting the token in order to enforce a minimum length. If we didn't modify the probabilities, the generation loop would have to become more complex (if EOS was selected and we had a minimum length, we would need to go back and pick a different token).

Remember also that there can be multiple stopping criteria in addition to the EOS token (maximum length, wall-clock time, ...), and that generation stops when any one of them is met; preventing generation from stopping until `min_length` is reached would break this behavior 🤗
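The mechanism described above can be sketched in a few lines of illustrative numpy code (a simplified stand-in to show the idea, not the actual transformers implementation):

```python
import numpy as np

def mask_eos_before_min_length(scores, cur_len, min_length, eos_token_id):
    """While cur_len < min_length, set the EOS logit to -inf so its
    softmax probability is exactly 0 and generation cannot stop early."""
    scores = scores.copy()
    if cur_len < min_length:
        scores[eos_token_id] = -np.inf
    return scores

# Example: vocab of 4 tokens, EOS id 3, 2 tokens generated, min_length 5.
logits = np.array([1.0, 0.5, 0.2, 3.0])
masked = mask_eos_before_min_length(logits, cur_len=2, min_length=5, eos_token_id=3)
probs = np.exp(masked) / np.exp(masked).sum()
# probs[3] is 0, so EOS cannot be sampled until min_length is reached
```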


Regarding the code changes:

MinLengthLogitsProcessor can only be present in `self.logits_processor` if a user manually passes a MinLengthLogitsProcessor instance to `generate` (as opposed to using the documented `min_length` or `min_new_tokens` arguments). When it is present as a custom processor, the candidate generation process generates more candidates than it should, hurting performance. Note that the overlap itself is okay (two minimum-length processors are equivalent to only having the larger minimum length), but the custom MinLengthLogitsProcessor is not.

Having any form of logits processing in assisted generation is not a hard requirement, but rather a performance-enhancing operation: it forces the assistant model to have the same logits processing as the main model. Sadly, Whisper relies on custom logits processors with assisted generation to secure the speedups announced in our Distil-Whisper paper, so we can't simply remove support for them, which would get rid of these (and future related) issues at the cost of a minor throughput hit.

As Amy wrote, exceptions > handling it ourselves. I would raise an exception if either MinLengthLogitsProcessor or MinNewTokensLengthLogitsProcessor is set as a custom logits processor.
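The suggested check might look roughly like the following sketch (the class stubs and the function name are assumptions for illustration, not the code merged in this PR):

```python
# Stubs standing in for the transformers processor classes.
class MinLengthLogitsProcessor: ...
class MinNewTokensLengthLogitsProcessor: ...

def validate_assistant_processors(logits_processor):
    """Reject min-length processors passed manually to assisted generation."""
    for p in logits_processor:
        if isinstance(p, (MinLengthLogitsProcessor, MinNewTokensLengthLogitsProcessor)):
            raise ValueError(
                "Passing `MinLengthLogitsProcessor` or `MinNewTokensLengthLogitsProcessor` "
                "is not supported with assisted generation; use the `min_length` / "
                "`min_new_tokens` arguments instead."
            )
```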

zucchini-nlp (Member Author)

> As Amy wrote, exceptions > handling it ourselves. I would raise an exception if either MinLengthLogitsProcessor or MinNewTokensLengthLogitsProcessor is set as a custom logits processor.

Yes, raising an error was my second option; it just seemed a bit unintuitive from the user's point of view to be able to pass a LogitsProcessor and then get an error. Okay, I will raise the exception and change the doctests then :)

@zucchini-nlp (Member Author)

@gante @amyeroberts Now we raise an error if a user tries to pass a LogitsProcessor object instead of a kwarg for the assistant model. Tests are passing, except for the GPU vs. CPU one, as confirmed by @ydshieh above.

@amyeroberts (Collaborator) left a comment

Thanks for fixing and iterating on this!

I guess we might still hit issues if people add their own logits processor which wouldn't pass the isinstance check, but there are infinitely many ways people can write their own classes, so we can't really prepare for all of them!

@gante (Member) left a comment

Thank you for iterating 🤗

@gante (Member) commented Apr 30, 2024

@ydshieh since the failing doctest is okay on your end, would you be able to merge? 🤗 (I don't have permissions to merge with a red CI :D)

@ydshieh (Collaborator) commented Apr 30, 2024

Yes, I will merge later today. Thank you for all the work/review the three of you have done ❤️

@ydshieh ydshieh self-assigned this Apr 30, 2024
@ydshieh (Collaborator) commented Apr 30, 2024

The run I triggered was 2 weeks ago, and since then there has been a commit changing something.

Let's merge, as I am kind of confident it still works! (Well, I only triggered the doctests.)

Thanks again!

@ydshieh ydshieh merged commit b8ac4d0 into huggingface:main Apr 30, 2024
20 of 22 checks passed