Improve --model argument handling and help message #1764

Open
wants to merge 1 commit into
base: main
from

Conversation

spartanhaden

This PR introduces the following updates to the whisper/transcribe.py script:

  • Enhancement of the --model argument handling and help message: The --model argument now provides a list of available model choices along with the default option when the --help flag is used. This enhances user experience by providing immediate visibility of the available options.

    • Previous message: --model MODEL name of the Whisper model to use (default: small)
    • Updated message: --model MODEL name of the Whisper model to use. Available models are: tiny.en, tiny, base.en, base, small.en, small, medium.en, medium, large-v1, large-v2, large-v3, large. You can also specify a path to a model checkpoint. (default: small)
    • Note: choices=available_models() was deliberately not used, because it would reject paths to custom model checkpoints.
  • Improved error message for incorrect model names: If a nonexistent model name is given, the error message now states what went wrong and lists the valid model names.

    • Previous message: whisper: error: argument --model: invalid valid_model_name value: 'some_incorrect_model_name'
    • Updated message: whisper: error: argument --model: model should be one of ['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large-v1', 'large-v2', 'large-v3', 'large'] or path to a model checkpoint
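
The behavior described above can be sketched with a custom argparse type function. This is a minimal illustration, not the PR's actual diff: the hard-coded AVAILABLE_MODELS list is a stand-in copied from the help text above (the real script presumably gets it from available_models()), and build_parser is a hypothetical helper name. The "Previous message" has the shape argparse produces when a type callable raises ValueError; raising argparse.ArgumentTypeError instead makes argparse print the custom text.

```python
import argparse
import os

# Stand-in for the model list shown in the help text above (assumption:
# the real script would query the library instead of hard-coding it).
AVAILABLE_MODELS = [
    "tiny.en", "tiny", "base.en", "base", "small.en", "small",
    "medium.en", "medium", "large-v1", "large-v2", "large-v3", "large",
]


def valid_model_name(name: str) -> str:
    """Accept a known model name or a path that exists on disk."""
    if name in AVAILABLE_MODELS or os.path.exists(name):
        return name
    # argparse shows an ArgumentTypeError message as-is; a ValueError would
    # produce the generic "invalid valid_model_name value: ..." text instead.
    raise argparse.ArgumentTypeError(
        f"model should be one of {AVAILABLE_MODELS} or path to a model checkpoint"
    )


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper that builds a parser with only the --model option."""
    parser = argparse.ArgumentParser(
        prog="whisper",
        # Appends "(default: small)" to the help text automatically.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument(
        "--model",
        default="small",
        type=valid_model_name,
        help="name of the Whisper model to use. Available models are: "
        + ", ".join(AVAILABLE_MODELS)
        + ". You can also specify a path to a model checkpoint.",
    )
    return parser
```

Using a type function rather than choices= keeps checkpoint paths valid while still rejecting unknown names with a readable message.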

@MohamedAliRashad

large-v3 is not working, it's giving me this error.

RuntimeError: Given groups=1, weight of size [1280, 128, 3], expected input[1, 80, 3000] to have 128 channels, but got 80 channels instead

@FurkanGozukara

large-v3 is not working, it's giving me this error.

RuntimeError: Given groups=1, weight of size [1280, 128, 3], expected input[1, 80, 3000] to have 128 channels, but got 80 channels instead

I tested it yesterday and it worked very well.

I am using Python 3.10.11.

@MohamedAliRashad

@FurkanGozukara
It turned out that log_mel_spectrogram needs n_mels set to 128, because large-v3 expects 128 mel channels rather than the 80 used by large-v2.
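
To make the error message easier to read, here is a small pure-Python sketch of the shape check behind it. It assumes the usual 1-D convolution layout with groups=1: the weight has shape (out_channels, in_channels, kernel_size) and the input has shape (batch, channels, frames). The function names are hypothetical and nothing here calls Whisper or PyTorch.

```python
def expected_mel_channels(conv_weight_shape):
    """Read in_channels from a weight shaped (out_channels, in_channels, kernel_size)."""
    _out_channels, in_channels, _kernel_size = conv_weight_shape
    return in_channels


def describe_mismatch(conv_weight_shape, input_shape):
    """Return a short description of a channel mismatch, or None if shapes agree."""
    _batch, channels, _frames = input_shape
    expected = expected_mel_channels(conv_weight_shape)
    if channels != expected:
        return f"expected {expected} channels, but got {channels}"
    return None


# Shapes taken from the error reported in this thread.
weight_shape = (1280, 128, 3)   # first convolution of the failing model
mel_shape = (1, 80, 3000)       # spectrogram computed with 80 mel bins

print(describe_mismatch(weight_shape, mel_shape))
```

Read this way, the middle number of the weight shape (128) is the number of mel bins the model wants, and the middle number of the input shape (80) is what it was given.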

@FurkanGozukara

log_mel_spectrogram

what does it do?

@ihmily

ihmily commented Nov 18, 2023

log_mel_spectrogram

what does it do?

I also encountered this issue, and the error message is Given groups=1, weight of size [1280, 128, 3], expected input[1, 80, 3000] to have 128 channels, but got 80 channels instead.

The solution is to modify the following line of code:

mel = whisper.log_mel_spectrogram(audio).to(model.device)

to:

mel = whisper.log_mel_spectrogram(audio, n_mels=128).to(model.device)

Explicitly setting n_mels=128 should resolve the error for large-v3. If the same channel-mismatch error then appears with a different model, change n_mels back to 80.

Note that n_mels is not a free quality setting: it has to match the number of mel channels the checkpoint was trained with, which is 128 for large-v3 and 80 for large-v2. A higher n_mels does provide more mel filters and therefore finer frequency detail in the spectrogram, but only a model trained on that input can use it. The loaded model exposes the expected value as model.dims.n_mels.
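
Rather than hard-coding 80 or 128, the value can be read from the loaded model. The sketch below assumes the model object exposes dims.n_mels, as Whisper models do; n_mels_for is a hypothetical helper, and the stub objects only stand in for real models so the snippet runs without downloading anything.

```python
from types import SimpleNamespace


def n_mels_for(model, default=80):
    """Return the mel-bin count the model expects, falling back to `default`."""
    dims = getattr(model, "dims", None)
    return getattr(dims, "n_mels", default)


# Stub objects standing in for loaded models (hypothetical, for illustration).
large_v3_like = SimpleNamespace(dims=SimpleNamespace(n_mels=128))
large_v2_like = SimpleNamespace(dims=SimpleNamespace(n_mels=80))

print(n_mels_for(large_v3_like), n_mels_for(large_v2_like))

# With a real model, the call from this thread would become:
# mel = whisper.log_mel_spectrogram(audio, n_mels=n_mels_for(model)).to(model.device)
```

This keeps the same script working across checkpoints with different input sizes.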

4 participants