Will it be possible to use the large-v3 model? #544
Comments
Guillaume started a job as Machine Learning Engineer at Apple last month (which he absolutely deserved to get), so I honestly don't think he'll have the time to continue his work on faster-whisper :(
I tried to do this, but I think it can only be done once OpenAI uploads the model to Hugging Face. (I couldn't find large-v3 on Hugging Face yet.)
The weights are open-source, so it should be possible to upload them? https://github.com/openai/whisper/pull/1761/files
I think this is not only a conversion problem. The new large-v3 model uses 128 mel frequency bins instead of 80, which is hardcoded in faster-whisper now.
Change the |
Could you submit that as a PR?
I kind of got it working by converting the .pt with the OpenAI-to-HF converter script and then running the CT2 converter on that, plus the tokenizer.json copied from large-v2.
Then I tried copying over the config files from large-v2 (everything except the model files) and adjusting as necessary ("num_mel_bins": 128, "vocab_size": 51866). I didn't change any of the token ids.
I also did the second method with large-v2.pt and it works perfectly. Just gotta wait for the official HF release, but if you really want to get it working now, play around with tokenizer.json and the token ids in config.json.
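A minimal sketch of the config tweak described above, assuming you have already copied large-v2's config.json into the new model directory (the function name and paths are hypothetical; the two field values come straight from the comment):

```python
import json
from pathlib import Path

def patch_v3_config(config_path: str) -> dict:
    """Bump the two config fields that changed between large-v2 and large-v3."""
    cfg = json.loads(Path(config_path).read_text())
    cfg["num_mel_bins"] = 128   # large-v3 uses 128 mel bins (v2 used 80)
    cfg["vocab_size"] = 51866   # one extra language token vs. large-v2
    Path(config_path).write_text(json.dumps(cfg, indent=2))
    return cfg
```

As the commenter notes, this alone leaves the token ids untouched, which is where the remaining breakage comes from.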
Thanks!
Was there any confirmation that OpenAI will upload the model to huggingface?
Can you share the converted v3 model (put it in some net drive, like Google Drive) along with the related modified files, so anyone who wants to use it can just copy it? Thanks.
According to this comment, it is being converted now.
Alright, let's go!
Hello. I wrote to Guillaume to see if he is willing to accept help to maintain the project. I have an old email address for Guillaume. If somebody has a recent one that works, please send it to me: jmas@softcatala.org
This PR should work: #548
It doesn't; I just tested it, and the provided CT2 conversion is the same as my method 1 above.
Alignment doesn't work either.
Just gotta wait for the HF release to do a proper conversion.
Hmm, you're right. It returned correct results on the very short segments I tested but produces nonsense on longer segments. Weird, I wonder why that is.
I think it's the tokenizer copied from large-v2: depending on where they put in the new Cantonese token, a lot of the token ids could be offset. FWIW, turning the temperature down to 0 has given me reproducible output across all the conversions I have tried so far; previously it was random, frequently non-English text, which made me suspect the language switching, but it's probably (hopefully) just a side effect of the tokens being off.
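The offset problem described here is easy to see with a toy example. The token strings below are real Whisper-style markers, but the surrounding vocabulary and ids are made up purely for illustration, not taken from the actual tokenizer:

```python
def build_ids(tokens):
    """Assign sequential ids, the way special tokens get numbered in order."""
    return {tok: i for i, tok in enumerate(tokens)}

# Toy special-token tables: v3 inserts the Cantonese token among the languages.
v2_specials = ["<|en|>", "<|zh|>", "<|transcribe|>", "<|translate|>"]
v3_specials = ["<|en|>", "<|zh|>", "<|yue|>", "<|transcribe|>", "<|translate|>"]

v2_ids = build_ids(v2_specials)
v3_ids = build_ids(v3_specials)
# "<|transcribe|>" shifts from id 2 to id 3 once "<|yue|>" is inserted, so a
# tokenizer.json copied from large-v2 decodes every token after the insertion
# point to the wrong string.
```

This is consistent with the symptoms in the thread: short outputs can look fine by luck, while longer ones degrade into garbage once offset tokens accumulate.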
The model is available now at https://huggingface.co/openai/whisper-large-v3 thanks to @sanchit-gandhi !
The huggingface repo does indeed only have |
@thomasmol There is no tokenizer.json, only the tokenizer_config.json. Renaming that didn't work, but I wrote a quick script to save the tokenizer and copy the files over,
and it seems to be working. Uploading to HF now.
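The exact script isn't shown, but the "copy the files over" half might look something like this rough sketch (file names and directory layout are assumptions, not taken from the thread): copy every non-weight file from an existing conversion directory into the new one, leaving the model weights behind.

```python
import shutil
from pathlib import Path

# Hypothetical set of weight-file names to skip when copying auxiliary files.
WEIGHT_FILES = {"model.bin", "pytorch_model.bin", "model.safetensors"}

def copy_aux_files(src_dir: str, dst_dir: str) -> list:
    """Copy configs, tokenizer files, etc. from src_dir to dst_dir, skipping weights."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in src.iterdir():
        if f.is_file() and f.name not in WEIGHT_FILES:
            shutil.copy(f, dst / f.name)
            copied.append(f.name)
    return sorted(copied)
```

The tokenizer-export half would additionally need something like the transformers library to load the tokenizer from tokenizer_config.json and save it back out as tokenizer.json; that part is omitted here since the thread doesn't show it.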
bababababooey/faster-whisper-large-v3
Hey, sorry to jump in at the last minute. What do I have to do to use this now? bababababooey/faster-whisper-large-v3
@User1231300 My fork https://github.com/bungerr/faster-whisper-3 should work in the meantime while we work on getting #548 merged.
Thanks a lot for the effort; I've been waiting for this and will try it later.
@thomasmol Check out this repo. It has the pytorch_model.bin file.
Thanks to everyone for their contributions for whisper-v3! I found some mismatches between v2 and v3 in whisper.c in CTranslate2, so I fixed it.
Can you please give more info on how I can do this?
@circuluspibo Want to make a PR to upstream? It feels like that would resolve a lot of issues. (Oops, missed that it was in the other PR too!)
There was already OpenNMT/CTranslate2#1530 fixing that issue (among others).
Pursuant to the conversation I started HERE, they graciously uploaded the float32 version, and I believe the .bin files are up there now. However, they need to be combined before trying to convert, is that correct? Here's an example regarding a different model: Windows
Linux/Mac
Assuming that we have the .bin, as far as converting goes (either float32/float16), the CTranslate2 repository is working on it right now, and I think they're close to a solution if not complete. See HERE. I'm no expert, but maybe wait to see how the converter is ultimately modified in CTranslate2, since faster-whisper relies on it? Interested in helping any way I can. Thanks!
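The combining step alluded to above (the Windows `copy /b` / Linux `cat` style) is plain byte-wise concatenation; a stdlib sketch, with hypothetical shard names, is below. Note as a caveat that Hugging Face sharded checkpoints (`pytorch_model-0000X-of-0000Y.bin` plus an index JSON) are normally loaded shard-by-shard by the library, not concatenated like this, so whether concatenation is the right move depends on how the files were split in the first place.

```python
from pathlib import Path

def join_shards(shard_paths, out_path):
    """Byte-wise concatenation of split files, like `copy /b` or `cat a b > out`."""
    with open(out_path, "wb") as out:
        for shard in shard_paths:
            out.write(Path(shard).read_bytes())
```

Usage would be `join_shards(["part1.bin", "part2.bin"], "model.bin")` with the shards listed in order.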
Standalone Faster-Whisper
Nice, I'll take a look. Does it use the float32 or float16 models, or both?
It uses an int8_float32 model by default; if you want the float16 model, then type
Cool, I'll check the --help, but thanks for the tip. How did you implement large-v3 so quickly, or is it a trade secret? I know the people at the CTranslate2 GitHub have been working on it; maybe they solved it and you implemented it? I'd like to use large-v3 in a Python script, not the CLI, but if you did it in a proprietary way, I can respect that...
Cannot find any executable files here.
Hey, thank you very much for this. I noticed it's English only. How can I make it work for other languages too? I need Italian.
Executables are in Releases, on the right side of the page.
#578 has implemented v3