[WhisperForCausalLM] Add WhisperForCausalLM for speculative decoding #27195
Conversation
The documentation is not available anymore as the PR was closed or merged.
The failing tests seem to be unrelated:
Thanks for adding this!
Just some nits on formatting, docstrings, and tests. Would like to have a 👍 from @gante before merging to check the changes to the generation logic.
```python
# PT-only test: TF doesn't support assisted decoding yet.
# Bart subclass with a kwarg that distorts the output
class FakeBart(BartForConditionalGeneration):
    def forward(self, input_ids, foo=False, **kwargs):
```
Can we use a more descriptive name than `foo` here? Or maybe just clarify in the comment that `foo` is said kwarg, e.g. `# Bart subclass with kwarg 'foo' that distorts the output`.
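For context, a runnable sketch of what such a distorting subclass could look like (the body below is illustrative; the PR's actual test code may differ):

```python
import torch
from transformers import BartForConditionalGeneration


# Illustrative only: a Bart subclass whose extra kwarg `foo` deliberately
# distorts the logits, so a test can check that model kwargs are forwarded
# correctly during assisted decoding.
class FakeBart(BartForConditionalGeneration):
    def forward(self, input_ids, foo=False, **kwargs):
        outputs = super().forward(input_ids, **kwargs)
        if foo:
            # Zero the logits so the generated text visibly changes whenever
            # `foo=True` is actually passed through to the model.
            outputs.logits = torch.zeros_like(outputs.logits)
        return outputs
```

A test can then compare `generate(...)` outputs with and without `foo=True` to confirm the kwarg reaches the model.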
Impressive PR, it's very well contained!
The skeleton and API look good to me; I'm out of my depth for the generation logic.
```diff
@@ -945,6 +946,8 @@ class WhisperDecoder(WhisperPreTrainedModel):
     config: WhisperConfig
     """

+    main_input_name = "input_ids"
```
Was this an oversight in the implementation?
`WhisperDecoder` was never tested as a stand-alone model before. It also doesn't really make sense to use it alone, because one always needs the encoded audio features.
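To make this concrete, a minimal sketch (assuming the `openai/whisper-tiny` checkpoint and random features) of why the decoder is only useful together with encoder output:

```python
import torch
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# Whisper expects 30s of log-mel features: (batch, num_mel_bins, 3000 frames).
input_features = torch.randn(1, model.config.num_mel_bins, 3000)
encoder_outputs = model.model.encoder(input_features)

# The stand-alone decoder needs the encoded audio features as
# cross-attention context; without them it has nothing to transcribe.
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
decoder_outputs = model.model.decoder(
    input_ids=decoder_input_ids,
    encoder_hidden_states=encoder_outputs.last_hidden_state,
)
print(decoder_outputs.last_hidden_state.shape)  # (1, 1, hidden_size)
```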
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
…/transformers into whisper_decoder_only
Merging, as I think Joao is off today and the changes to assisted generation are quite minimal IMO. @gante, it would be great if you could nevertheless take a look once you're back :-)
@patrickvonplaten All good on the generation front 👍
Cool strategy for handling the case of a shared encoder. I hope people realize that encoder-decoder LLMs may be viable (and faster) for input-grounded tasks.
…uggingface#27195)

* finish
* add tests
* fix all tests
* [Assistant Decoding] Add test
* fix more
* better
* finish
* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* finish

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
What does this PR do?
This PR enables speculative decoding for all cases where the assistant model is stripped of its encoder weights as they are shared with the teacher model. For now, Distil-Whisper is the main use case here.
In addition, a `WhisperForCausalLM` class is added, as it didn't exist yet for Distil-Whisper. The following code should therefore be enabled:
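A minimal sketch of the usage this enables (assuming the `openai/whisper-large-v2` teacher and `distil-whisper/distil-large-v2` assistant checkpoints, and the standard `assistant_model` argument of `generate`; the exact snippet may differ from the PR description):

```python
from datasets import load_dataset
from transformers import AutoProcessor, WhisperForConditionalGeneration, WhisperForCausalLM

processor = AutoProcessor.from_pretrained("openai/whisper-large-v2")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")

# Decoder-only assistant: its encoder weights are shared with the teacher,
# so only the decoder needs to be loaded.
assistant_model = WhisperForCausalLM.from_pretrained("distil-whisper/distil-large-v2")

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]
input_features = processor(
    sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt"
).input_features

# Speculative decoding: the assistant drafts candidate tokens,
# the teacher verifies them in a single forward pass.
generated_ids = model.generate(input_features, assistant_model=assistant_model)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```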