
Support generating with fallback for short form audio in Whisper #29508

Closed
Kimahriman opened this issue Mar 7, 2024 · 3 comments

@Kimahriman

Feature request

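Generation with temperature fallback was added to Whisper as part of long-form generation. If a segment's output fails certain quality checks (compression ratio, average log-probability, no-speech probability), it is decoded again at a higher temperature. We should be able to apply the same fallback criteria to short-form audio. See the discussion here.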
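For reference, the fallback loop can be sketched roughly as below. This is a minimal, framework-free sketch, not the transformers or OpenAI code. `decode` is a hypothetical callable that decodes one segment at a given temperature, and `DecodeResult` is a hypothetical container. The temperature schedule and thresholds (compression ratio 2.4, average log-probability -1.0, no-speech probability 0.6) mirror the upstream OpenAI `transcribe()` defaults and should be treated as assumptions. The exact "silence" rule upstream may differ between versions.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class DecodeResult:
    # Hypothetical container for one decoded segment.
    text: str
    avg_logprob: float
    no_speech_prob: float


def compression_ratio(text: str) -> float:
    # Repetitive output (e.g. a looping phrase) compresses very well,
    # so a high ratio signals degenerate decoding.
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def decode_with_fallback(
    decode: Callable[[float], DecodeResult],
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: Optional[float] = 2.4,
    logprob_threshold: Optional[float] = -1.0,
    no_speech_threshold: Optional[float] = 0.6,
) -> Optional[DecodeResult]:
    result = None
    for t in temperatures:
        result = decode(t)
        needs_fallback = False
        if (compression_ratio_threshold is not None
                and compression_ratio(result.text) > compression_ratio_threshold):
            needs_fallback = True  # too repetitive
        if logprob_threshold is not None and result.avg_logprob < logprob_threshold:
            needs_fallback = True  # model not confident enough
        if (no_speech_threshold is not None
                and result.no_speech_prob > no_speech_threshold
                and logprob_threshold is not None
                and result.avg_logprob < logprob_threshold):
            needs_fallback = False  # treat as silence instead of retrying
        if not needs_fallback:
            break
    # If every temperature fails, the last attempt is returned.
    return result
```

Nothing in this loop depends on the length of the audio, which is why the same criteria could apply unchanged to a single short segment.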

Motivation

The upstream OpenAI implementation applies fallback to all audio. In fact, it makes no distinction between "short" and "long" audio: everything is essentially treated as "long" audio, and if there is only one segment to transcribe, that single segment is all that gets processed.

See https://github.com/openai/whisper/blob/main/whisper/transcribe.py#L178
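To illustrate the point about short audio, here is a simplified sketch assuming fixed 30-second windows at 16 kHz. The upstream loop actually advances through the audio based on decoded timestamps rather than fixed strides, and `transcribe` / `transcribe_window` are hypothetical names. The only point shown is that audio of 30 seconds or less yields a single window and goes through exactly the same per-window path (including any fallback) as long audio.

```python
from typing import Callable, List, Sequence

SAMPLE_RATE = 16_000              # Whisper models expect 16 kHz audio
CHUNK_SAMPLES = 30 * SAMPLE_RATE  # one 30-second input window


def transcribe(audio: Sequence[float],
               transcribe_window: Callable[[Sequence[float]], str]) -> List[str]:
    """Run the same per-window routine over every window of the input.

    Short audio (<= 30 s) simply produces one window; there is no
    separate short-form code path.
    """
    texts = []
    for start in range(0, len(audio), CHUNK_SAMPLES):
        texts.append(transcribe_window(audio[start:start + CHUNK_SAMPLES]))
    return texts
```

For example, 10 seconds of audio produces one call to `transcribe_window`, while 70 seconds produces three (30 s, 30 s, 10 s).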

Your contribution

I probably cannot address this myself.

@amyeroberts
Collaborator

cc @sanchit-gandhi @ylacombe

@ylacombe
Contributor

ylacombe commented Apr 1, 2024

cc @sanchit-gandhi, it seems there are a few requests to make long-form audio features compatible with short-form audio. Do you have time to look into this?

@sanchit-gandhi
Contributor

This is a very valid request, and we should indeed refactor generation_whisper.py so that it makes no distinction between short-form and long-form generation (as in the original codebase). Would you like to have a go at this, @kamilakesbi? Happy to help with reviews and questions!
