Whisper options in speech-to-text job #1432

edsu · 2024-12-06T19:19:42Z

In the Whisper Pilot we tested different settings to determine the best options for running Whisper. The results were summarized in this doc.

We need to ensure that these options are sent to the speech-to-text service, which does exactly what it is told and falls back to default values for anything not specified:

    model: large (n.b., not large-v3)
    condition_on_previous_text: False
    no_speech_threshold: 0.6
    logprob_threshold: -1.0
    compression_ratio_threshold: 2.4
    patience: 1.0
    beam_size: 5
    best_of: 5
    word_timestamps: True

These options are currently set in https://github.com/sul-dlss/common-accessioning/blob/main/config/settings.yml#L71-L76

We may want to only specify the options that diverge from the default options. Or we could explicitly set all of them.


alundgard · 2024-12-07T13:04:52Z

@edsu @peetucket The options that diverge from the default are the following.

    options:
       model: 'large'
       word_timestamps: True
       condition_on_previous_text: False

Our option documentation can be found here.
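
As a rough sketch of how that comparison could be checked, the snippet below filters the full option set from this issue down to the entries that differ from a defaults table. The `ASSUMED_DEFAULTS` values are assumptions inferred from this thread (only three options are said to diverge) and should be verified against the option documentation. The `model` default in particular is a placeholder, and `diverging_options` is a hypothetical helper, not part of the speech-to-text service.

```python
# Full option set proposed at the top of this issue.
REQUESTED = {
    "model": "large",
    "condition_on_previous_text": False,
    "no_speech_threshold": 0.6,
    "logprob_threshold": -1.0,
    "compression_ratio_threshold": 2.4,
    "patience": 1.0,
    "beam_size": 5,
    "best_of": 5,
    "word_timestamps": True,
}

# ASSUMPTION: defaults inferred from this thread, not read from the service.
ASSUMED_DEFAULTS = {
    "model": "small",  # placeholder: the thread only implies it is not 'large'
    "condition_on_previous_text": True,
    "no_speech_threshold": 0.6,
    "logprob_threshold": -1.0,
    "compression_ratio_threshold": 2.4,
    "patience": 1.0,
    "beam_size": 5,
    "best_of": 5,
    "word_timestamps": False,
}


def diverging_options(requested, defaults):
    """Return only the requested options whose value differs from the default."""
    return {
        name: value
        for name, value in requested.items()
        if name not in defaults or defaults[name] != value
    }


print(diverging_options(REQUESTED, ASSUMED_DEFAULTS))
# {'model': 'large', 'condition_on_previous_text': False, 'word_timestamps': True}
```

Options missing from the defaults table are treated as diverging, so an unknown option is sent explicitly rather than silently dropped.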

peetucket · 2024-12-07T16:16:26Z

Yeah, we could set only the parameters that differ from the defaults, or specify as many as we want (even those that match the defaults) so our settings stay the same if the defaults change.

edsu · 2024-12-09T16:17:55Z

I'm not sure we want our settings to stay the same if the defaults change. I say this because the options' significance is very often opaque, and we are largely relying on OpenAI to decide what the best settings are. Hopefully we can get to a place where updating the version of whisper can be adequately tested so we aren't taking one step forward and two steps back when it comes to the quality of results, cf. #23
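
To make the trade-off concrete, here is a minimal sketch of how the two approaches behave if the defaults change. `effective_options` is a hypothetical helper, and both defaults tables are invented for illustration (the changed `beam_size` is not a real or announced Whisper change).

```python
def effective_options(overrides, defaults):
    """Merge explicit overrides on top of whatever the defaults currently are."""
    return {**defaults, **overrides}


# HYPOTHETICAL defaults before and after an upgrade, for illustration only.
OLD_DEFAULTS = {"beam_size": 5, "word_timestamps": False}
NEW_DEFAULTS = {"beam_size": 8, "word_timestamps": False}

# Approach 1: set only what diverges from the defaults.
MINIMAL = {"word_timestamps": True}
# Approach 2: pin every option explicitly, even those matching the defaults.
PINNED = {"beam_size": 5, "word_timestamps": True}

minimal_after = effective_options(MINIMAL, NEW_DEFAULTS)
pinned_after = effective_options(PINNED, NEW_DEFAULTS)

print(minimal_after["beam_size"])  # 8: follows the new default
print(pinned_after["beam_size"])   # 5: stays at the pinned value
```

Both approaches give the same result under `OLD_DEFAULTS`; they only diverge once a default changes.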

peetucket · 2024-12-09T16:22:35Z

OK, I'll update so that only the settings that deviate from the defaults are set.

peetucket self-assigned this Dec 6, 2024

peetucket mentioned this issue Dec 6, 2024

add specific whisper params #1434

Merged

jmartin-sul closed this as completed in #1434 Dec 9, 2024
