
[WhisperForCausalLM] Add WhisperForCausalLM for speculative decoding #27195

Merged: 10 commits merged into main on Nov 1, 2023

Conversation

patrickvonplaten (Contributor) commented Oct 31, 2023

What does this PR do?

This PR enables speculative decoding for all cases where the assistant model is stripped of its encoder weights because they are shared with the teacher model. For now, Distil-Whisper is the main use case.
In addition, a WhisperForCausalLM class is added, as it didn't exist yet and is needed for Distil-Whisper.

The following code should therefore be enabled:

from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq, AutoModelForCausalLM
from datasets import load_dataset
import torch
import time

# load models and processor
processor = AutoProcessor.from_pretrained("openai/whisper-large-v2")
model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large-v2", torch_dtype=torch.float16, low_cpu_mem_usage=True)
model.cuda()
assistant_model = AutoModelForCausalLM.from_pretrained("patrickvonplaten/whisper-large-v2-32-2", torch_dtype=torch.float16, low_cpu_mem_usage=True)
assistant_model.cuda()

print(f"Assistant num params compared to teacher: {100 * assistant_model.num_parameters() / model.num_parameters():.1f} %.")

# load audio file
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]
input_features = processor(sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt").input_features 
input_features = input_features.to(dtype=torch.float16, device="cuda")

# warm-up
_ = model.generate(input_features)

# generate token ids with teacher
start_time = time.time()
predicted_ids = model.generate(input_features)

print("Time normal", time.time() - start_time)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
print(20 * "-")

start_time = time.time()
predicted_ids = model.generate(input_features, assistant_model=assistant_model)

print("Time speculative decoding", time.time() - start_time)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
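The speed-up comes from the assisted-generation loop inside `generate`. The greedy variant can be sketched in pure Python (toy next-token callables stand in for real models; the helper names are hypothetical, not the transformers implementation): the assistant drafts a few tokens cheaply, the main model verifies them, and the longest agreeing prefix plus one corrected token is accepted.

```python
# Toy sketch of greedy assisted (speculative) decoding. `main_next` and
# `assistant_next` are callables mapping a token list to the next token id.
def assisted_greedy_generate(main_next, assistant_next, prompt, max_new_tokens=8, k=3):
    tokens = list(prompt)
    produced = 0
    while produced < max_new_tokens:
        # 1) the assistant drafts up to k candidate tokens
        draft = []
        for _ in range(k):
            draft.append(assistant_next(tokens + draft))
        # 2) the main model verifies the draft (a single forward pass in practice)
        new = []
        for candidate in draft:
            expected = main_next(tokens + new)
            new.append(expected)       # always keep the main model's token
            if expected != candidate:  # first mismatch ends the accepted run
                break
        else:
            # whole draft accepted: the main model adds one extra token for free
            new.append(main_next(tokens + new))
        new = new[:max_new_tokens - produced]
        tokens.extend(new)
        produced += len(new)
    return tokens


def plain_greedy(main_next, prompt, max_new_tokens=8):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(main_next(tokens))
    return tokens
```

Because every accepted token is re-checked by the main model, the output is identical to plain greedy decoding with the main model alone; a good assistant only changes how fast you get there.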

@patrickvonplaten patrickvonplaten changed the title finish [WhisperForCausalLM] Add WhisperForCausalLM for speculative decoding Oct 31, 2023
HuggingFaceDocBuilderDev commented Oct 31, 2023

The documentation is not available anymore as the PR was closed or merged.

patrickvonplaten (Contributor Author) commented:

The failing tests seem to be unrelated:

FAILED tests/trainer/test_trainer.py::TrainerIntegrationWithHubTester::test_push_to_hub_with_saves_each_n_steps - Failed: Timeout >120.0s
UNEXPECTED EXCEPTION: ChunkedEncodingError(ProtocolError('Connection broken: IncompleteRead(3430997229 bytes read, 2742372923 more expected)', IncompleteRead(3430997229 bytes read, 2742372923 more expected)))
FAILED tests/models/marian/test_modeling_marian.py::MarianModelTest::test_save_load_keys_to_ignore_on_save - FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpvv597zd7/pytorch_model.bin'
FAILED tests/models/prophetnet/test_modeling_prophetnet.py::ProphetNetModelTest::test_causal_lm_from_pretrained - AssertionError: False is not true
FAILED tests/models/seamless_m4t/test_modeling_seamless_m4t.py::SeamlessM4TGenerationTest::test_speech_generation - Failed: Timeout >120.0s
FAILED tests/models/seamless_m4t/test_modeling_seamless_m4t.py::SeamlessM4TGenerationTest::test_text_generation - AssertionError: Lists differ: [3, 4, 8, 3, 3, 4, 3, 0] != [3, 4, 8, 7, 1, 11, 7, 18, 18, 3, 0, 0, 0, 0, 0, 0,[74 chars]8, 6]


patrickvonplaten commented Nov 1, 2023

Not exactly sure what's going on with the docs. They appear just fine with the doc-builder for me:

[Screenshot from 2023-11-01 11-36-05]

amyeroberts (Collaborator) left a comment:

Thanks for adding this!

Just some nits on formatting, docstrings and tests. Would like to have 👍 from @gante before merging to check the changes to generation logic.

Review comment on tests/models/whisper/test_modeling_whisper.py:
# PT-only test: TF doesn't support assisted decoding yet.
# Bart subclass with a kwarg that distorts the output
class FakeBart(BartForConditionalGeneration):
def forward(self, input_ids, foo=False, **kwargs):
Collaborator comment:

Can we use a more descriptive name than foo here? Or maybe just clarify in the comment that foo is said kwarg e.g. # Bart subclass with kwarg 'foo' that distorts the output
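The reviewer's naming suggestion amounts to something like the following toy (illustrative only, not the actual `FakeBart` test class): a `forward` with an explicitly named kwarg whose effect on the output is observable, so a test can verify that `generate()` forwards extra kwargs to the model.

```python
# Toy stand-in for the test subclass: `distort_logits` (a hypothetical name,
# more descriptive than `foo`) visibly alters the output when passed through.
class ToyModel:
    def forward(self, input_ids, distort_logits=False, **kwargs):
        logits = [float(t) for t in input_ids]
        if distort_logits:
            # flipping the sign makes the kwarg's effect observable in a test
            logits = [-x for x in logits]
        return logits
```

A test can then assert that calling with and without the kwarg produces different, predictable outputs.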

(Additional resolved review threads on tests/models/whisper/test_modeling_whisper.py and src/transformers/models/whisper/modeling_whisper.py.)
LysandreJik (Member) left a comment:

Impressive PR, it's very well contained!

The skeleton and API look good to me, I'm out of my depth for the generation logic.

@@ -945,6 +946,8 @@ class WhisperDecoder(WhisperPreTrainedModel):
config: WhisperConfig
"""

main_input_name = "input_ids"
Member comment:

Was this an oversight in the implementation?

patrickvonplaten (Contributor Author):

WhisperDecoder was never tested as a stand-alone model before. It also doesn't really make sense to use it alone, because one always needs the encoded audio features.

patrickvonplaten and others added 3 commits November 1, 2023 13:57
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
patrickvonplaten (Contributor Author):

Merging as I think Joao is off today and changes in assisted generation are quite minimal IMO. @gante would be great if you could nevertheless take a look once back :-)

@patrickvonplaten patrickvonplaten merged commit 391d14e into main Nov 1, 2023
19 of 22 checks passed
@patrickvonplaten patrickvonplaten deleted the whisper_decoder_only branch November 1, 2023 15:01
gante (Member) left a comment:

@patrickvonplaten All good on the generation front 👍

Cool strategy for handling the case of a shared encoder. I hope people realize that encoder-decoder LLMs may be viable (and faster) for input-grounded tasks.

EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023
[WhisperForCausalLM] Add WhisperForCausalLM for speculative decoding (huggingface#27195)

* finish

* add tests

* fix all tests

* [Assistant Decoding] Add test

* fix more

* better

* finish

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* finish

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Labels: none · Projects: none · 5 participants