Add V3 Support #578
Conversation
@nguyendc-systran, it would be great if you could have a look.
@@ -76,6 +77,7 @@ def download_model(
     allow_patterns = [
         "config.json",
+        "preprocessor_config.json",
@Oscaarjs, hello. We have generated the Systran Whisper large-v3 conversion model, which includes the new preprocessor_config.json file, with the HF-to-ct2 script. Could you please also update this info in the README.md file of your PR?
Example: ct2-transformers-converter --model openai/whisper-large-v3 --output_dir whisper-large-v3-ct2 --copy_files tokenizer.json preprocessor_config.json --quantization float16
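For anyone following along, a minimal sketch of loading the converted model with faster-whisper; the directory name matches the --output_dir above, and audio.wav is a placeholder:

```python
from faster_whisper import WhisperModel

# Load the CTranslate2 conversion produced by the command above (local path assumed).
model = WhisperModel("whisper-large-v3-ct2", device="cuda", compute_type="float16")

# Transcribe a placeholder audio file and print the segments.
segments, info = model.transcribe("audio.wav")
print("Detected language:", info.language)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```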
@trungkienbkhn Ah yes! I've made the changes, please check whether I made them correctly.
Sounds good to me. Hopefully there will be some benchmark tests for faster-whisper large-v3 soon.
@Oscaarjs I'm guessing this still doesn't allow for batch transcription (which is built into large-v3)?
What do you mean by built in? Afaik nothing changed between v3 and v2 that affects that part of the model. Or are you referring to HF's pipeline implementation of it (which does support batching)?
Your question was off topic in the other PR, and it is still off topic in this PR.
This is a continuation of #548 so credit to its contributors and author @stillmatic.
I've tried to address some of the comments received on that PR.
Potential TODOs: