Whisper options in speech-to-text job #1432

Closed
edsu opened this issue Dec 6, 2024 · 5 comments · Fixed by #1434
@edsu
Contributor

edsu commented Dec 6, 2024

In the Whisper Pilot we did some testing to try to determine the best options to run Whisper with. The results were summarized in this doc.

We need to ensure that these options are sent to the speech-to-text service, since it does whatever it is told to do, and otherwise uses default values:

model: large (n.b., not large-v3)
condition_on_previous_text: False
no_speech_threshold: 0.6
logprob_threshold: -1.0
compression_ratio_threshold: 2.4
patience: 1.0
beam_size: 5
best_of: 5
word_timestamps: True

These options are currently set in https://github.com/sul-dlss/common-accessioning/blob/main/config/settings.yml#L71-L76

We may want to only specify the options that diverge from the default options. Or we could explicitly set all of them.

@alundgard
Member

@edsu @peetucket The options that diverge from the default are the following.

    options:
       model: 'large'
       word_timestamps: True
       condition_on_previous_text: False

Our option documentation can be found here.
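
As a rough cross-check of the list above, here is a minimal Python sketch that filters the proposed options down to those that differ from a set of defaults. The `ASSUMED_DEFAULTS` values are assumptions inferred from this thread, not read from the speech-to-text service or from Whisper itself, and the `model` default is a placeholder because the thread does not state it. The helper name `diverging_options` is hypothetical.

```python
# Options proposed in the issue description.
PROPOSED = {
    "model": "large",
    "condition_on_previous_text": False,
    "no_speech_threshold": 0.6,
    "logprob_threshold": -1.0,
    "compression_ratio_threshold": 2.4,
    "patience": 1.0,
    "beam_size": 5,
    "best_of": 5,
    "word_timestamps": True,
}

# ASSUMPTION: defaults inferred from this thread only. Options not listed as
# diverging are taken to equal the proposed values; the two booleans are
# flipped; the "model" default is a placeholder, not a real model name.
ASSUMED_DEFAULTS = {
    "model": "<service default>",
    "condition_on_previous_text": True,
    "no_speech_threshold": 0.6,
    "logprob_threshold": -1.0,
    "compression_ratio_threshold": 2.4,
    "patience": 1.0,
    "beam_size": 5,
    "best_of": 5,
    "word_timestamps": False,
}


def diverging_options(proposed, defaults):
    """Return only the proposed options whose value differs from the default."""
    return {
        name: value
        for name, value in proposed.items()
        if name not in defaults or defaults[name] != value
    }


if __name__ == "__main__":
    for name, value in diverging_options(PROPOSED, ASSUMED_DEFAULTS).items():
        print(f"{name}: {value}")
```

Under those assumed defaults, only `model`, `condition_on_previous_text`, and `word_timestamps` remain, which matches the three options listed above.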

@peetucket
Member

Yeah, we could just alter the parameters that are different from the default, or just specify as much as we want (even if default) to be sure it stays as is if the defaults change.
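
To make the trade-off concrete, here is a small Python sketch of how the two approaches would behave if a default changed. It assumes the service layers the options it is sent on top of its own defaults; the changed `beam_size` default is invented purely for illustration, and `effective_options` is a hypothetical helper, not part of the service.

```python
def effective_options(defaults, overrides):
    """Overlay explicitly sent options on top of the service defaults."""
    return {**defaults, **overrides}


# HYPOTHETICAL defaults before and after an upgrade (the change is made up).
OLD_DEFAULTS = {"beam_size": 5, "word_timestamps": False}
NEW_DEFAULTS = {"beam_size": 8, "word_timestamps": False}

# Approach 1: send only what diverges from the defaults.
MINIMAL = {"word_timestamps": True}
# Approach 2: send everything explicitly, even values equal to the defaults.
PINNED = {"beam_size": 5, "word_timestamps": True}

if __name__ == "__main__":
    # Minimal overrides follow the new default for beam_size.
    print(effective_options(NEW_DEFAULTS, MINIMAL))
    # Pinned options keep the old beam_size regardless of the upgrade.
    print(effective_options(NEW_DEFAULTS, PINNED))
```

With minimal overrides the job picks up whatever the new defaults are; with pinned options it keeps the values that were tested.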

@edsu
Contributor Author

edsu commented Dec 9, 2024

I'm not sure we want our settings to stay the same if the defaults change. I say this because the options' significance is very often opaque, and we are largely relying on OpenAI to decide what the best settings are. Hopefully we can get to a place where updating the version of Whisper can be adequately tested, so we aren't taking one step forward and two steps back when it comes to the quality of results, cf. #23

@peetucket
Member

OK, I'll update so that only the settings that deviate from the default are set.

