Will it be possible to use the large-v3 model? #544
Comments
Guillaume started a job as Machine Learning Engineer at Apple last month (which he absolutely deserved to get), so I honestly don't think he'll have the time to continue his work on faster-whisper :(
I tried to do this, but I think it can only be done once OpenAI uploads the model to Hugging Face. (I couldn't find large-v3 on Hugging Face yet.)
The weights are open-source, so it should be possible to upload them? https://github.com/openai/whisper/pull/1761/files
I think this is not only a conversion problem. The new large-v3 model uses 128 mel frequency bins instead of 80, which is hardcoded in faster-whisper now.
Change the |
Could you submit that as a PR?
I kind of got it working by converting the .pt with the OpenAI-to-HF converter script and then running the CT2 converter on that, plus the tokenizer.json copied from large-v2.
Then I tried copying over the config files from large-v2 (everything except the model files) and adjusting as necessary ("num_mel_bins": 128, "vocab_size": 51866). I didn't change any of the token ids.
I also did the second method with large-v2.pt and it works perfectly. Just gotta wait for the official HF release, but if you really want to get it working now, play around with tokenizer.json and the token ids in config.json.
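A minimal sketch of the config tweak described above, assuming you have already copied large-v2's config.json into the new model directory (the function name and paths are hypothetical; the two field values come straight from the comment):

```python
import json
from pathlib import Path

def patch_v3_config(config_path: str) -> dict:
    """Bump the two config fields that changed between large-v2 and large-v3."""
    cfg = json.loads(Path(config_path).read_text())
    cfg["num_mel_bins"] = 128   # large-v3 uses 128 mel bins (v2 used 80)
    cfg["vocab_size"] = 51866   # one extra language token vs. large-v2
    Path(config_path).write_text(json.dumps(cfg, indent=2))
    return cfg
```

As the commenter notes, this alone leaves the token ids untouched, which is where the remaining breakage comes from.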
Thanks!
Was there any confirmation that OpenAI will upload the model to huggingface?
Can you share the converted v3 model (put it in some net drive, like Google Drive) along with the related modified files, so anyone who wants to use it can just copy it? Thanks.
According to this comment, it is being converted now.
Alright, let's go!
Hello. I wrote to Guillaume to see if he is willing to accept help to maintain the project. I have an old email address for Guillaume. If somebody has a recent one that works, please send it to me: jmas@softcatala.org
This PR should work: #548
It doesn't; I just tested it, and the provided CT2 conversion is the same as my method 1 above.
Alignment doesn't work either.
Just gotta wait for the HF release to do a proper conversion.
Hmm, you're right. It returned correct results on the very short segments I tested but produces nonsense on longer segments. Weird, I wonder why that is.
I think it's the tokenizer copied from large-v2: depending on where they put in the new Cantonese token, a lot of the token ids could be offset. FWIW, turning the temperature down to 0 has given me reproducible output across all the conversions I have tried so far; previously it was random, frequently non-English text, which made me suspect the language switching, but it's probably (hopefully) just a side effect of the tokens being off.
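The offset problem described here is easy to see with a toy example. The token strings below are real Whisper-style markers, but the surrounding vocabulary and ids are made up purely for illustration, not taken from the actual tokenizer:

```python
def build_ids(tokens):
    """Assign sequential ids, the way special tokens get numbered in order."""
    return {tok: i for i, tok in enumerate(tokens)}

# Toy special-token tables: v3 inserts the Cantonese token among the languages.
v2_specials = ["<|en|>", "<|zh|>", "<|transcribe|>", "<|translate|>"]
v3_specials = ["<|en|>", "<|zh|>", "<|yue|>", "<|transcribe|>", "<|translate|>"]

v2_ids = build_ids(v2_specials)
v3_ids = build_ids(v3_specials)
# "<|transcribe|>" shifts from id 2 to id 3 once "<|yue|>" is inserted, so a
# tokenizer.json copied from large-v2 decodes every token after the insertion
# point to the wrong string.
```

This is consistent with the symptoms in the thread: short outputs can look fine by luck, while longer ones degrade into garbage once offset tokens accumulate.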
The model is available now at https://huggingface.co/openai/whisper-large-v3 thanks to @sanchit-gandhi !
The huggingface repo does indeed only have |
@thomasmol There is no tokenizer.json, only the tokenizer_config.json. Renaming that didn't work, but I wrote a quick script to save the tokenizer and copy the files over,
and it seems to be working. Uploading to HF now.
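The exact script isn't shown, but the "copy the files over" half might look something like this rough sketch (file names and directory layout are assumptions, not taken from the thread): copy every non-weight file from an existing conversion directory into the new one, leaving the model weights behind.

```python
import shutil
from pathlib import Path

# Hypothetical set of weight-file names to skip when copying auxiliary files.
WEIGHT_FILES = {"model.bin", "pytorch_model.bin", "model.safetensors"}

def copy_aux_files(src_dir: str, dst_dir: str) -> list:
    """Copy configs, tokenizer files, etc. from src_dir to dst_dir, skipping weights."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in src.iterdir():
        if f.is_file() and f.name not in WEIGHT_FILES:
            shutil.copy(f, dst / f.name)
            copied.append(f.name)
    return sorted(copied)
```

The tokenizer-export half would additionally need something like the transformers library to load the tokenizer from tokenizer_config.json and save it back out as tokenizer.json; that part is omitted here since the thread doesn't show it.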
bababababooey/faster-whisper-large-v3
Hey, sorry to jump in at the last minute. What do I have to do to use this now? bababababooey/faster-whisper-large-v3
@User1231300 My fork https://github.com/bungerr/faster-whisper-3 should work in the meantime while we work on getting #548 merged.
Thanks a lot for the effort; I've been waiting for this and will try it later.
@thomasmol Check out this repo. It has the pytorch_model.bin file.
Thanks to everyone for their contributions for whisper-v3! I found some mismatches between v2 and v3 in whisper.c in CTranslate2, so I fixed it.
Can you please give more info on how I can do this?
@circuluspibo Want to make a PR to upstream? It feels like that would resolve a lot of issues. (Oops, missed that it was in the other PR too!)
There was already OpenNMT/CTranslate2#1530 fixing that issue (among others).
Pursuant to the conversation I started HERE, they graciously uploaded the float32 version, and I believe the .bin files are up there now. However, they need to be combined before trying to convert, is that correct? Here's an example regarding a different model: Windows
Linux/Mac
Assuming that we have the .bin, as far as converting goes (either float32/float16), the CTranslate2 repository is working on it right now, and I think they're close to a solution if not complete. See HERE. I'm no expert, but maybe wait to see how the converter is ultimately modified in CTranslate2, since faster-whisper relies on it? Interested in helping any way I can. Thanks!
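The combining step alluded to above (the Windows `copy /b` / Linux `cat` style) is plain byte-wise concatenation; a stdlib sketch, with hypothetical shard names, is below. Note as a caveat that Hugging Face sharded checkpoints (`pytorch_model-0000X-of-0000Y.bin` plus an index JSON) are normally loaded shard-by-shard by the library, not concatenated like this, so whether concatenation is the right move depends on how the files were split in the first place.

```python
from pathlib import Path

def join_shards(shard_paths, out_path):
    """Byte-wise concatenation of split files, like `copy /b` or `cat a b > out`."""
    with open(out_path, "wb") as out:
        for shard in shard_paths:
            out.write(Path(shard).read_bytes())
```

Usage would be `join_shards(["part1.bin", "part2.bin"], "model.bin")` with the shards listed in order.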
Standalone Faster-Whisper
Nice, I'll take a look. Does it use the float32 or float16 models, or both?
It uses an int8_float32 model by default; if you want the float16 model, then type
Cool, I'll check the --help, but thanks for the tip. How did you implement large-v3 so quickly, or is it a trade secret? I know the people at the CTranslate2 GitHub have been working on it; maybe they solved it and you implemented it? I'd like to use large-v3 in a Python script, not the CLI, but if you did it in a proprietary way, I can respect that...
Cannot find any executable files here.
Hey, thank you very much for this. I noticed it's English only. How can I make it work for other languages too? I need Italian.
Executables are in Releases, on the right side of the page.
#578 has implemented v3