cc @sanchit-gandhi, it seems there are a few requests to make long-form audio features compatible with short-form audio. Do you have time to look into this?
This is a very valid request, and we should indeed refactor generation_whisper.py to make no distinction between short-form and long-form generation (i.e. as in the original codebase). Would you like to have a go at this @kamilakesbi? Happy to help with reviews and questions!
Feature request
Generating with temperature fallback based on certain criteria was added to Whisper as part of the long-form generation. We should be able to apply the same fallback criteria to short-form audio. See the discussion here.
Motivation
The upstream OpenAI implementation applies fallback to all audio. In fact, it does not distinguish between "short" and "long" audio: everything is essentially treated as "long audio", and if there is only one segment to transcribe, only that segment is processed.
See https://github.com/openai/whisper/blob/main/whisper/transcribe.py#L178
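For illustration, here is a minimal sketch of what a temperature fallback loop looks like when applied to a single segment. It is not the upstream code or the transformers implementation: `DecodeResult`, `decode_with_fallback`, and the `decode` callable are hypothetical names, and the default thresholds (2.4 and -1.0) are assumed to match the upstream defaults. The no-speech check is left out for brevity.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class DecodeResult:
    """Hypothetical container for the output of decoding one segment."""
    text: str
    avg_logprob: float
    temperature: float


def compression_ratio(text: str) -> float:
    """Ratio of raw to zlib-compressed size; repetitive text scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def decode_with_fallback(
    decode: Callable[[float], DecodeResult],
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: float = 2.4,  # assumed default
    logprob_threshold: float = -1.0,  # assumed default
) -> DecodeResult:
    """Retry decoding at increasing temperatures until the output passes.

    `decode` is a hypothetical callable that transcribes one segment at the
    given temperature. The segment may be the only one in the audio.
    """
    if not temperatures:
        raise ValueError("at least one temperature is required")

    result = None
    for temperature in temperatures:
        result = decode(temperature)
        # Highly compressible text usually means the model is looping.
        too_repetitive = compression_ratio(result.text) > compression_ratio_threshold
        # A low average log-probability means the model is not confident.
        too_unlikely = result.avg_logprob < logprob_threshold
        if not (too_repetitive or too_unlikely):
            break
    # If every temperature failed, the last attempt is returned.
    return result
```

Nothing in this loop depends on how many segments the audio contains, which is why the same criteria could apply to short-form audio as well.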
Your contribution
I probably cannot address this myself.