i'm not sure if this is expected, but with medium.en-q5_0, i'm seeing that speaker turns are pretty reliably marked with >>. i'm not using the --diarize or --tdrz flags.
i wasn't seeing this behavior with large-v2, large-v3, or large-v3-q5_0. any thoughts on why that would be happening?
[00:00:00.000 --> 00:00:07.000] SC Okay Houston, we've had a problem here.
[00:00:07.000 --> 00:00:12.000] CAPCOM This is Houston. Say again please.
[00:00:12.000 --> 00:00:15.000] SC Houston, we've had a problem. We've had a main B plus 100 volts.
[00:00:15.000 --> 00:00:20.000] CAPCOM Roger. Main B, 100 volts. Okay, standby 13. We're looking at it.
[00:00:20.000 --> 00:00:29.000] SC Okay. Right now, Houston, the voltage is looking good. And we had a pretty large bang
[00:00:29.000 --> 00:00:36.000] associated with the caution and warning amp. And as I recall, main B was the one that had
[00:00:36.000 --> 00:00:39.000] a amp spike on it once before.
[00:00:39.000 --> 00:00:42.000] CAPCOM Roger, Fred.
[00:00:42.000 --> 00:00:48.000] SC And the interim air, we're starting to go ahead and button up the tunnel again.
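For anyone wanting to check other outputs for the same behavior, here is a minimal sketch of a parser for transcript lines in the format shown above. The label heuristic (a leading all-caps token such as `SC` or `CAPCOM`, or a leading `>>`) is an assumption made for illustration, not something whisper.cpp defines, and it will misfire on segments that legitimately start with an acronym.

```python
import re

# Matches lines of the form "[HH:MM:SS.mmm --> HH:MM:SS.mmm] text".
LINE_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})\]\s*(.*)$"
)
# Assumed heuristic: a speaker label is a leading ">>" or an all-caps token
# of two or more letters, followed by whitespace.
LABEL_RE = re.compile(r"^(>>|[A-Z]{2,})\s+(.*)$")


def parse_segments(transcript):
    """Return a list of dicts with start, end, speaker (or None), and text."""
    segments = []
    for raw in transcript.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip anything that is not a timestamped segment
        start, end, text = match.groups()
        speaker = None
        label_match = LABEL_RE.match(text)
        if label_match:
            speaker, text = label_match.groups()
        segments.append(
            {"start": start, "end": end, "speaker": speaker, "text": text}
        )
    return segments


def has_speaker_labels(transcript):
    """True if at least one segment begins with something that looks like a label."""
    return any(seg["speaker"] is not None for seg in parse_segments(transcript))
```

Segments that continue a previous speaker's sentence (like the one starting with "associated with the caution") come back with `speaker` set to `None`, so a turn is only counted where a label actually appears.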
I found this:
They may exhibit additional capabilities, particularly if fine-tuned on certain tasks like voice activity detection, speaker classification, or speaker diarization but have not been robustly evaluated in these areas.
... So maybe this is just a weird case, perhaps the medium.en model was trained on that audio sample + a transcript? Wouldn't be too surprising. There are a number of transcripts that use the same speaker identifiers (SC = Spacecraft & CAPCOM = Capsule Communicator), e.g. https://nssdc.gsfc.nasa.gov/planetary/lunar/apollo13.pdf
Mostly creating this just to have a placeholder for the topic, as I haven't encountered other discussions. I do recall reading something about how Whisper is trained to suppress this sort of thing... Oh yeah, here we go: openai/whisper#854
This is probably an upstream "issue", and it's not a problem per se, more just something unexpected.
@khimaros commented on Dec 1, 2023:
I was curious and tried reproducing this, using the `a13.wav` sample obtained via `make samples` from https://upload.wikimedia.org/wikipedia/commons/transcoded/6/6f/Apollo13-wehaveaproblem.ogg/Apollo13-wehaveaproblem.ogg.mp3 (https://commons.wikimedia.org/wiki/File:Apollo13-wehaveaproblem.ogg).

No diarization using: `tiny`, `tiny.en`, `base`, `base.en`, `small`, `small.en`, `medium`, `large-v1`, `large-v2`, `large-v2-q5_0`, `large-v3-q5_0`.

Diarization using: `medium.en`, `medium.en-q5_0.bin`.

Using the latest from master, `1cf679d`. M1 macOS.

The model card passage quoted earlier in this thread is from https://github.com/openai/whisper/blob/main/model-card.md#evaluated-use
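The model sweep described above could be scripted roughly as follows. This is only a sketch under stated assumptions: the binary path `./main`, the `models/ggml-<name>.bin` file layout, and the `samples/a13.wav` location are assumed defaults for a local whisper.cpp checkout and may differ on your machine; only the `-m` (model) and `-f` (input file) options are used.

```python
import subprocess

# Model names taken from the lists in this thread.
MODELS = [
    "tiny", "tiny.en", "base", "base.en", "small", "small.en",
    "medium", "medium.en", "medium.en-q5_0",
    "large-v1", "large-v2", "large-v2-q5_0", "large-v3-q5_0",
]


def build_command(model, binary="./main", models_dir="models",
                  audio="samples/a13.wav"):
    """Build the argument list for one run (paths are assumed defaults)."""
    model_path = f"{models_dir}/ggml-{model}.bin"
    return [binary, "-m", model_path, "-f", audio]


def run_model(model):
    """Run one model and return its stdout; raises if the process fails."""
    result = subprocess.run(
        build_command(model), capture_output=True, text=True, check=True
    )
    return result.stdout


def sweep(models=MODELS):
    """Map each model name to its transcript text (requires local models)."""
    return {model: run_model(model) for model in models}
```

Calling `sweep()` needs every listed model file to be present locally; `build_command` on its own is handy for printing the commands before running anything.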