
Make whisper-event checkpoints compliant to support return_timestamp #21878

Closed · 2 of 4 tasks
Vaibhavs10 opened this issue Mar 1, 2023 · 15 comments · Fixed by #21965

@Vaibhavs10 (Member) commented Mar 1, 2023

System Info

  • transformers version: 4.27.0.dev0
  • Platform: Linux-5.10.147+-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • Huggingface_hub version: 0.12.1
  • PyTorch version (GPU?): 1.13.1+cu116 (False)
  • Tensorflow version (GPU?): 2.11.0 (False)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: NO
  • Using distributed or parallel set-up in script?: NO

Who can help?

@sanchit-gandhi @ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Running inference with a Whisper checkpoint fine-tuned before the timestamp processor was introduced into transformers fails with a rather uninformative error message: AttributeError: 'GenerationConfig' object has no attribute 'no_timestamps_token_id'.

Minimum steps to reproduce this:

from transformers import pipeline
from datasets import load_dataset

# Stream the Hindi test split of Common Voice 11
cv11 = load_dataset("mozilla-foundation/common_voice_11_0", "hi", split="test", streaming=True)

# Checkpoint fine-tuned during the Whisper event, with timestamps requested
pipe = pipeline(model="sanchit-gandhi/whisper-small-hi", return_timestamps=True)

# Take the first sample and wrap it in the format the ASR pipeline expects
audio = next(iter(cv11))["audio"]
test_sample = {"raw": audio["array"], "sampling_rate": audio["sampling_rate"]}

pipe(test_sample)

Colab/notebook: here

The above snippet throws the error mentioned above. This problem affects the majority (727) of the checkpoints fine-tuned during the Whisper Event.

P.S. This has been reported by multiple community members, so not just me.

Expected behavior

We should ideally make the return_timestamp functionality backwards compatible or throw a more informative error message.

Sorry if there already is a way to do this and I am just misinformed.
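
For the error-message side of things, roughly the following check inside generate would already help. This is only a sketch of the idea (the wording and the exact placement of the check are made up); generation_config.no_timestamps_token_id is the attribute that is actually missing on these checkpoints:

# Sketch only: fail loudly and helpfully when timestamps are requested but the
# checkpoint's generation config predates timestamp support.
if return_timestamps and getattr(generation_config, "no_timestamps_token_id", None) is None:
    raise ValueError(
        "return_timestamps=True requires `no_timestamps_token_id` in the generation config. "
        "This checkpoint predates timestamp support; load the generation config from the "
        "corresponding openai/whisper-* checkpoint and set it via `model.generation_config`."
    )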

@ArthurZucker (Collaborator)

Well, if you are using return_timestamps=True you are asking for it 😅
This functionality was introduced afterwards. Let's tell our users that they have to set it in the generation config (when we pop it). Otherwise, the generate function should be able to set a default value depending on whether the model is multilingual or not.
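
As a rough sketch of that fallback (not what is implemented): since the event checkpoints kept the original Whisper tokenizer, the missing id could in principle be recovered from the tokenizer's special token, assuming <|notimestamps|> is present in the saved tokenizer:

from transformers import WhisperTokenizer

# Rough fallback sketch: recover the missing id from the tokenizer's special token
tokenizer = WhisperTokenizer.from_pretrained("sanchit-gandhi/whisper-small-hi")
no_timestamps_token_id = tokenizer.convert_tokens_to_ids("<|notimestamps|>")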

@Vaibhavs10 (Member, Author)

Hey hey! Sorry, I did not do a good job of explaining the intent. For a typical developer who has no clue how these checkpoints were fine-tuned and just wants to use a checkpoint from the Hub for downstream inference, this poses a challenge.

They'd typically just take a checkpoint, throw it into the pipeline, and expect it to do its magic: transcribe and provide the timestamps.

So my ask here is the following:

  1. Is there a way to make the checkpoints trained during the Whisper event compliant with the most recent changes?
  2. Can we add a more informative Error message so that an average developer knows what to do next?

IMO point 1 is really important, as our library of fine-tuned models is one of our distinguishing factors. It'd be less than ideal if we asked the community to fine-tune their checkpoints again just to get timestamps.

Hope this makes more sense!

@ArthurZucker (Collaborator)

For 1, I think we can open a PR on all of the Whisper models from the event to add the required generation config, WDYT?
For 2, this can of course be done either in Whisper's generate or in the logits processor!

Makes a lot of sense thanks for reporting! 👍🏻

@bayartsogt-ya (Contributor) commented Mar 1, 2023

  1. I think we can open a PR on all of the Whisper models from the event to add the required generation config, WDYT?

Just to be clear, if I add the no_timestamps_token_id to the config, will it work with timestamps without re-finetuning?

@ArthurZucker (Collaborator) commented Mar 1, 2023

The model should already be able to produce timestamps without fine-tuning (as this is knowledge from the pretrained model), but they might not be as good as those from the original pretrained model.
You need more than just no_timestamps_token_id: you have to use the full generation_config that is available on the OpenAI checkpoints.
This is required as it is a new behaviour.
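
To see what that brings along, you can load the config from the pretrained checkpoint and check that the missing attribute is there (quick sketch; openai/whisper-base is just an example multilingual checkpoint):

from transformers import GenerationConfig

# Generation config shipped with the pretrained multilingual checkpoint
gen_config = GenerationConfig.from_pretrained("openai/whisper-base")

# The attribute the timestamp logic needs, missing from the event checkpoints
print(gen_config.no_timestamps_token_id)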

@Vaibhavs10 (Member, Author)

Hey @ArthurZucker -> can you maybe provide the steps one needs to take to make the checkpoints compatible? We can then potentially run an auto-PR on all the Whisper checkpoints produced during the whisper-event.

@ArthurZucker (Collaborator)

You can just do something like

from transformers import GenerationConfig, WhisperForConditionalGeneration
model = WhisperForConditionalGeneration.from_pretrained("your_pretrained_checkpoint")
generation_config = GenerationConfig.from_pretrained("openai/whisper-base") # if you are using a multilingual model
model.generation_config = generation_config
model.push_to_hub("your_pretrained_checkpoint", use_auth_token = "your_token_if_not_logged_in", create_pr = True)
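
If the fine-tuned checkpoint is English-only rather than multilingual, the generation config should come from the corresponding .en checkpoint instead (sketch, assuming the English-only OpenAI checkpoints ship their generation config the same way):

from transformers import GenerationConfig

# For English-only fine-tunes, pull the config from the ".en" variant instead
generation_config = GenerationConfig.from_pretrained("openai/whisper-base.en")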

@sanchit-gandhi (Contributor)

Would it not be easier to make changes in the codebase so that it is robust to the changes we made to generate (switching to the generation config and adding timestamp prediction)? What we have currently is backwards-breaking 🚨 and something we want to avoid.

@Vaibhavs10 (Member, Author)

That makes sense, then I'll refrain from the Auto-PR and wait for these changes to be merged into main. Thank you @sanchit-gandhi & @ArthurZucker <3

@ArthurZucker (Collaborator)

The main issue is that generation_config.no_timestamps_token_id is tied to the model (English-only or not). We are lucky that all the models are multilingual, but we can't pick a default between two values, so a breaking change it is; we kind of have to.

@ArthurZucker (Collaborator)

I will add it to the Whisper config; that will be easier to deal with!

@ArthurZucker (Collaborator)

Edit: I think opening PRs to the relevant repositories will help (it is easier to get the generation_config in place that way). Also, this is not a problem for backward compatibility, as timestamps are a new feature and not part of any release yet. However, #21937 is indeed a problem and will be fixed by #21965. In the meantime, I will also add a warning in case return_timestamps is used when the generation config is not properly set up, referring to the solution I shared here!

@ghost commented Jul 19, 2024

please stop "fixing" things

@forfrt commented Nov 28, 2024

You can just do something like

from transformers import GenerationConfig, WhisperForConditionalGeneration
model = WhisperForConditionalGeneration.from_pretrained("your_pretrained_checkpoint")
generation_config = GenerationConfig.from_pretrained("openai/whisper-base") # if you are using a multilingual model
model.generation_config = generation_config
model.push_to_hub("your_pretrained_checkpoint", use_auth_token = "your_token_if_not_logged_in", create_pr = True)

Still cannot generate timestamps with this setting; I also checked PR #21334. Is return_timestamps supported now? How can I use it properly?

@hassanzadeh

I'm also facing the same problem, any ideas?
