
Make whisper-event checkpoints compliant to support return_timestamp #21878

Closed · 2 of 4 tasks
Vaibhavs10 opened this issue Mar 1, 2023 · 15 comments · Fixed by #21965

@Vaibhavs10 (Member) commented Mar 1, 2023

System Info

  • transformers version: 4.27.0.dev0
  • Platform: Linux-5.10.147+-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • Huggingface_hub version: 0.12.1
  • PyTorch version (GPU?): 1.13.1+cu116 (False)
  • Tensorflow version (GPU?): 2.11.0 (False)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: NO
  • Using distributed or parallel set-up in script?: NO

Who can help?

@sanchit-gandhi @ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Running inference with a Whisper checkpoint fine-tuned before the timestamp processor was introduced into transformers fails with a rather uninformative error message: AttributeError: 'GenerationConfig' object has no attribute 'no_timestamps_token_id'.

Minimum steps to reproduce this:

from transformers import pipeline
from datasets import load_dataset

# Stream the Hindi test split of Common Voice 11
cv11 = load_dataset("mozilla-foundation/common_voice_11_0", "hi", split="test", streaming=True)

# Checkpoint fine-tuned during the Whisper event, with timestamps requested
pipe = pipeline(model="sanchit-gandhi/whisper-small-hi", return_timestamps=True)

# Take the first sample and wrap it in the format the ASR pipeline expects
audio = next(iter(cv11))["audio"]
test_sample = {"raw": audio["array"], "sampling_rate": audio["sampling_rate"]}

pipe(test_sample)

Colab/notebook: here

The above snippet throws the error mentioned above. This problem affects the majority (727) of the checkpoints fine-tuned during the Whisper Event.

P.S. This has been reported by multiple community members, so not just me.

Expected behavior

We should ideally make the return_timestamp functionality backwards compatible or throw a more informative error message.

Sorry if there already is a way to do this and I am just misinformed.
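
For the error-message side of things, roughly the following check inside generate would already help. This is only a sketch of the idea (the wording and the exact placement of the check are made up); generation_config.no_timestamps_token_id is the attribute that is actually missing on these checkpoints:

# Sketch only: fail loudly and helpfully when timestamps are requested but the
# checkpoint's generation config predates timestamp support.
if return_timestamps and getattr(generation_config, "no_timestamps_token_id", None) is None:
    raise ValueError(
        "return_timestamps=True requires `no_timestamps_token_id` in the generation config. "
        "This checkpoint predates timestamp support; load the generation config from the "
        "corresponding openai/whisper-* checkpoint and set it via `model.generation_config`."
    )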

@ArthurZucker (Collaborator)

Well, if you are using return_timestamps=True you are asking for it 😅
This functionality was introduced afterwards. Let's tell our users that they have to set it in the generation config (when we pop it). Otherwise, the generate function should be able to set a default value depending on whether the model is multilingual or not.
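
As a rough sketch of that fallback (not what is implemented): since the event checkpoints kept the original Whisper tokenizer, the missing id could in principle be recovered from the tokenizer's special token, assuming <|notimestamps|> is present in the saved tokenizer:

from transformers import WhisperTokenizer

# Rough fallback sketch: recover the missing id from the tokenizer's special token
tokenizer = WhisperTokenizer.from_pretrained("sanchit-gandhi/whisper-small-hi")
no_timestamps_token_id = tokenizer.convert_tokens_to_ids("<|notimestamps|>")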

@Vaibhavs10 (Member, Author)

Hey hey! Sorry, I did not do a good job of explaining the intent. For a typical developer who has no clue how these checkpoints were fine-tuned and just wants to use a checkpoint from the Hub for downstream inference, this poses a challenge.

They'd typically just take a checkpoint, throw it into the pipeline, and expect it to do its magic: transcribe and provide the timestamps.

So my ask here is the following:

  1. Is there a way to make the checkpoints trained during the Whisper event compliant with the most recent changes?
  2. Can we add a more informative Error message so that an average developer knows what to do next?

IMO point 1 is really important, as our library of fine-tuned models is one of our distinguishing factors. It'd be less than ideal if we asked the community to fine-tune their checkpoints again just to get timestamps.

Hope this makes more sense!

@ArthurZucker (Collaborator)

For 1, I think we can open a PR on all of the Whisper models from the event to add the required generation config, WDYT?
For 2, this can of course be done either in Whisper's generate or in the logits processor!

Makes a lot of sense thanks for reporting! 👍🏻

@bayartsogt-ya (Contributor) commented Mar 1, 2023

  1. I think we can open a PR on all of the Whisper models from the event to add the required generation config, WDYT?

Just to be clear, if I add the no_timestamps_token_id to the config, will it work with timestamps without re-finetuning?

@ArthurZucker (Collaborator) commented Mar 1, 2023

The model should already be able to produce timestamps without fine-tuning (as this is knowledge from the pretrained model), but they might not be as good as those from the original pretrained model.
You need more than just no_timestamps_token_id: you have to use the full generation_config that is available on the OpenAI checkpoints.
This is required as it is a new behaviour.
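
To see what that brings along, you can load the config from the pretrained checkpoint and check that the missing attribute is there (quick sketch; openai/whisper-base is just an example multilingual checkpoint):

from transformers import GenerationConfig

# Generation config shipped with the pretrained multilingual checkpoint
gen_config = GenerationConfig.from_pretrained("openai/whisper-base")

# The attribute the timestamp logic needs, missing from the event checkpoints
print(gen_config.no_timestamps_token_id)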

@Vaibhavs10 (Member, Author)

Hey @ArthurZucker -> can you maybe provide the steps one needs to take to make the checkpoints compatible? We can then potentially run an auto-PR on all the Whisper checkpoints produced during the whisper-event.

@ArthurZucker (Collaborator)

You can just do something like

from transformers import GenerationConfig, WhisperForConditionalGeneration
model = WhisperForConditionalGeneration.from_pretrained("your_pretrained_checkpoint")
generation_config = GenerationConfig.from_pretrained("openai/whisper-base") # if you are using a multilingual model
model.generation_config = generation_config
model.push_to_hub("your_pretrained_checkpoint", use_auth_token = "your_token_if_not_logged_in", create_pr = True)
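
If the fine-tuned checkpoint is English-only rather than multilingual, the generation config should come from the corresponding .en checkpoint instead (sketch, assuming the English-only OpenAI checkpoints ship their generation config the same way):

from transformers import GenerationConfig

# For English-only fine-tunes, pull the config from the ".en" variant instead
generation_config = GenerationConfig.from_pretrained("openai/whisper-base.en")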

@sanchit-gandhi (Contributor)

Would it not be easier to make changes in the codebase so that it is robust to the changes we made to generate (switching to the generation config and adding timestamp prediction)? What we have currently is backwards-breaking 🚨 and something we want to avoid.

@Vaibhavs10 (Member, Author)

That makes sense, then I'll refrain from the Auto-PR and wait for these changes to be merged into main. Thank you @sanchit-gandhi & @ArthurZucker <3

@ArthurZucker (Collaborator)

The main issue is that generation_config.no_timestamps_token_id is tied to the model (English-only or not). We are lucky that all the models are multilingual, but we can't pick a default between two values, so a breaking change it is; we kind of have to.

@ArthurZucker (Collaborator)

I will add it to the Whisper config; that will be easier to deal with!

@ArthurZucker (Collaborator)

Edit: I think opening PRs to the relevant repositories will help (it is easier to get the generation_config in place that way). Also, this is not a problem for backward compatibility, as timestamps are a new feature and not part of any release yet. However, #21937 is indeed a problem and will be fixed by #21965. In the meantime, I will also add a warning in case return_timestamps is used when the generation config is not properly set up, referring to the solution I shared here!

@ghost commented Jul 19, 2024

please stop "fixing" things

@forfrt commented Nov 28, 2024

You can just do something like

from transformers import GenerationConfig, WhisperForConditionalGeneration
model = WhisperForConditionalGeneration.from_pretrained("your_pretrained_checkpoint")
generation_config = GenerationConfig.from_pretrained("openai/whisper-base") # if you are using a multilingual model
model.generation_config = generation_config
model.push_to_hub("your_pretrained_checkpoint", use_auth_token = "your_token_if_not_logged_in", create_pr = True)

Still cannot generate timestamps with this setting; I also checked PR #21334. Is return_timestamps supported now? How can I use it properly?

@hassanzadeh

I'm also facing the same problem, any ideas?
