Fix single letter stop strings #31448
@Rocketknight1 Could you explain this a bit more? Why does this result in no stop strings having any |
LGTM 👍
@amyeroberts Sorry for the confusion! The way this works is that we precompute two things for every The |
Thanks so much for quickly addressing!
Code changes look good, but I cannot test due to related bug:
E ValueError: There are one or more stop strings, either in the arguments to `generate` or in the model's generation config, but we could not locate a tokenizer. When generating with stop strings, you must pass the model's tokenizer to the `tokenizer` argument of `generate`.
Can we get rid of the second `tokenizer = kwargs.pop("tokenizer", None)` as well in this PR?
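To illustrate why a second pop would produce the error above, here is a minimal sketch. `generate_sketch` is a hypothetical stand-in, not the real `generate`, and it assumes the tokenizer arrives as a keyword argument that is popped twice from the same `kwargs` dict:

```python
def generate_sketch(**kwargs):
    # Hypothetical stand-in for illustration only, not the library's generate().
    first = kwargs.pop("tokenizer", None)   # removes the key and returns its value
    second = kwargs.pop("tokenizer", None)  # key is already gone, so the default is returned
    return first, second


first, second = generate_sketch(tokenizer="my-tokenizer")
print(first)   # my-tokenizer
print(second)  # None
```

Under that assumption, any code that relies on the second result sees `None` and reports that no tokenizer was passed, even though one was.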
Great job @Rocketknight1 LGTM
Thanks so much for the tokenizer popping fix @gante
Thanks for fixing!
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
fe4bcc9 to 2c27f3d
* Fix single letter stop strings
* Change the 0 to a 1 to avoid potential empty vector headaches later
* Restructure for clarity
* Update tests/generation/test_stopping_criteria.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add the unsqueeze

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Our `StopStringCriteria` has a bug when all stop strings are one letter long. This results in a case where no stop strings have any `valid_positions`, as this variable only tracks internal positions, and this causes a `max()` to fail later.

This code fixes the issue by ensuring that `max()` takes a minimum value of `1`. It also adds a test for single-letter stop strings!

Fixes #31435
cc @amyeroberts @zucchini-nlp @gante
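
For readers who want to see the failure mode in isolation, here is a simplified sketch, not the actual `StopStringCriteria` code. The nested dict layout (stop string, then token, then list of positions), the helper name `widest_position_list`, and the toy data are all assumptions made for illustration:

```python
def widest_position_list(token_valid_positions):
    """Length of the longest position list, never less than 1.

    Assumed layout: {stop_string: {token: [positions...]}}.
    """
    lengths = [
        len(positions)
        for per_token in token_valid_positions.values()
        for positions in per_token.values()
    ]
    # max() raises ValueError on an empty sequence, so fall back to 1.
    return max(lengths) if lengths else 1


# Single-letter stop strings: no internal positions are recorded at all.
single_letter = {"a": {}, "b": {}}
# Toy data for a longer stop string (the position values are made up).
multi_letter = {"stop": {"to": [1], "o": [1, 2]}}

print(widest_position_list(single_letter))  # 1
print(widest_position_list(multi_letter))   # 2
```

Without the fallback, the single-letter case would call `max()` on an empty list and raise `ValueError`. Returning `1` rather than `0` matches the commit note about avoiding empty vectors later.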