Generate: assisted decoding now uses `generate` for the assistant #28030
Conversation
Force-pushed from 2033a7b to 132d428
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@amyeroberts the failing test is also failing in the daily CI (i.e. unrelated to this PR, as it doesn't depend on assisted generation), and I can't reproduce it on my end 🤔
Thanks for the refactor and running the slow tests! Looks a lot cleaner ❤️
@amyeroberts I can't merge due to the failing test (which is also failing on `main`)
(the test is flaky 👉 #28035)
@gante In this case we can merge :) Edit: this was discussed offline; the cause of the failing tests was identified and confirmed to be independent of this PR.
What does this PR do?
A subset of the original changes in #27979:
"Reworks assisted candidate generation to call .generate(), instead of having its own custom generation loop. For most models this is nothing more than a nice abstraction. However, for models with a custom generate() function, this means the assistant model will now make use of it! (🤔 does this mean that DistilWhisper gets better numbers with this refactor?)"
The following tests were run locally and are passing:
```
RUN_SLOW=1 py.test tests/models/whisper/ -k speculative
py.test tests/ -k test_assisted
```