Steps to transcribe in French #1
Comments
Hi, you can try: This wav2vec2.0 model fine-tuned on French. Let me know how it goes. I can start setting default models per language.
Thanks for your reply. I used
Ah sorry, there was a bug in the argument parser. git pull and try again now :)
OK, thanks for the update! Yet I couldn't test further because it then failed during "Alignment". I think it has something to do with the apostrophe. Consequently, so far I cannot tell you if it brings improvements in French over stock Whisper :wink:!
Hi, I can lend a hand with Spanish. I started yesterday with normalization and spanish.py. I will continue during next week.
@Ca-ressemble-a-du-fake thanks for pointing this out, I see the culprit was this line
which hard-coded the cleaning of the Whisper transcript to contain only a-z and apostrophes (which are in the English wav2vec2.0 dictionary). @RaulKite keep me updated on Spanish!
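To illustrate the problem described above, here is a minimal sketch of cleaning a transcript against the alignment model's own dictionary instead of a hard-coded a-z set. This is not the whisperX code: `clean_transcript`, both example dictionaries, and the use of `|` as the word-boundary token are assumptions made for illustration.

```python
def clean_transcript(text, dictionary):
    """Keep only the characters that the alignment model's dictionary contains."""
    kept = []
    for ch in text:
        if ch == " ":
            kept.append("|")  # assumption: "|" marks word boundaries
        elif ch in dictionary:
            kept.append(ch)
        # any other character is dropped because the model cannot align it
    return "".join(kept)


# Hypothetical dictionaries mapping characters to token ids.
english_dictionary = {c: i for i, c in enumerate("|'abcdefghijklmnopqrstuvwxyz")}
french_dictionary = {
    c: i for i, c in enumerate("|'abcdefghijklmnopqrstuvwxyzàâçéèêëîïôùûü")
}

# With the English-only set the accented letters are silently lost;
# with a French dictionary they survive.
english_result = clean_transcript("l'été arrive", english_dictionary)
french_result = clean_transcript("l'été arrive", french_dictionary)
```

Driving the filter from the dictionary means a new language only needs a new model, not a new cleaning rule.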
Thanks for the update! Why is the text being converted to upper case? Because of that it shows a KeyError with the third character ("E"). If I print the model_dictionary it shows lower-case letters. So lower-casing the characters from Then in utils, while writing the transcript to disk in ASS format, With these quick and dirty workarounds, the output, even with the large Whisper model, looks far more accurate!
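The KeyError above comes from looking up upper-case characters in a lower-case dictionary. A hedged sketch of one way to avoid it is to follow whatever case the dictionary itself uses. `match_dictionary_case` and the sample dictionaries are hypothetical and not part of whisperX.

```python
def match_dictionary_case(text, dictionary):
    """Return text in the letter case used by the dictionary keys."""
    letters = [c for c in dictionary if c.isalpha()]
    if letters and all(c.islower() for c in letters):
        return text.lower()
    if letters and all(c.isupper() for c in letters):
        return text.upper()
    return text  # mixed-case or letterless dictionary: leave the text alone


# Hypothetical dictionaries, one lower-case (French model), one upper-case.
lower_dictionary = {"|": 0, "é": 1, "t": 2}
upper_dictionary = {"|": 0, "E": 1, "T": 2}

lowered = match_dictionary_case("ÉTÉ", lower_dictionary)
uppered = match_dictionary_case("ete", upper_dictionary)

# Every character now has an id, so the lookup no longer raises KeyError.
token_ids = [lower_dictionary[c] for c in lowered]
```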
Thanks for this, I have pushed a big fix and it seems to be working for French now (with varying degrees of accuracy). Please see examples in the README.
Thanks for this fix! It works well now! Punctuation is missing, but I am not sure this is specific to French, is it?
@Ca-ressemble-a-du-fake I think this is a Whisper issue. In some examples it does output punctuation: https://github.com/m-bain/whisperX#french From my experience, it seems non-English Whisper tends to transcribe without punctuation sometimes, sorry.
It is usually fixed by giving an initial prompt containing punctuation marks. For example: "and now, we continue with next video."
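A minimal sketch of that suggestion using the openai-whisper Python package, assuming it is installed. The helper names, the audio path, the model size, and the French prompt text are all illustrative assumptions. The `whisper` import is deferred so the option builder can be used on its own.

```python
def build_transcribe_options(language, initial_prompt):
    """Collect keyword arguments for whisper's transcribe() call."""
    return {
        "language": language,
        # A punctuated prompt nudges the model toward punctuated output.
        "initial_prompt": initial_prompt,
    }


def transcribe_with_prompt(audio_path, options, model_name="medium"):
    """Run stock Whisper with the given options (requires openai-whisper)."""
    import whisper  # imported lazily; only needed when actually transcribing

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path, **options)


# Hypothetical French prompt; any short, well-punctuated sentence should do.
options = build_transcribe_options(
    "fr", "Bonjour, et maintenant, nous continuons avec la vidéo suivante."
)
# result = transcribe_with_prompt("audio.wav", options)  # "audio.wav" is a placeholder
```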
@m-bain I only tried with medium, and the examples use the large model (which shows much worse timestamps). By the way, Whisper alone does output punctuation (all models). Furthermore, I noticed that starting timestamps are often in the middle of the starting words. Is there a way to make them start earlier? Moreover, would you mind explaining very briefly what the wav2vec2 model is used for? I mean, what does your algorithm do (I can read the code but I still don't understand it)? Thanks @RaulKite, I'll try with initial-prompt also!
@Ca-ressemble-a-du-fake yes, I know Whisper outputs punctuation, but sometimes the model gets stuck in a kind of "non-punctuation" mode. This is a Whisper error, not an error with the alignment algorithm. You can see this by printing the Whisper output before alignment here: whisperX/whisperx/transcribe.py Line 433 in cbaeb85
One thing that might be causing a discrepancy is that this uses Whisper with --condition_on_previous_text False, otherwise Whisper timestamps can be off by 5 seconds or something crazy like that.
Anyway, Raul's solution should help though (you might want to write that prompt in French). Re: starting timestamps being in the middle of the word: Re: the algorithm: E.g. Then we just look for the most likely monotonically increasing path of the following letters only See for more
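The "most likely monotonically increasing path" idea can be sketched with a small dynamic program. This is a simplified illustration, not the whisperX implementation: it assumes the wav2vec2 model yields a log-probability for each character at each audio frame, it assigns every frame to exactly one transcript character, and it leaves out the CTC blank token that a real aligner has to handle. `align` and the toy emission values are hypothetical.

```python
import math


def align(log_probs, targets):
    """Assign each frame to a transcript position along the best monotonic path.

    log_probs[t][c] is the log-probability of character id c at frame t.
    targets is the transcript as a list of character ids.
    Returns a list giving, for each frame, the index into targets.
    """
    num_frames, num_targets = len(log_probs), len(targets)
    if num_targets == 0 or num_frames < num_targets:
        raise ValueError("need at least one frame per transcript character")

    neg_inf = -math.inf
    score = [[neg_inf] * num_targets for _ in range(num_frames)]
    back = [[0] * num_targets for _ in range(num_frames)]
    score[0][0] = log_probs[0][targets[0]]

    for t in range(1, num_frames):
        for j in range(num_targets):
            stay = score[t - 1][j]  # remain on the same character
            advance = score[t - 1][j - 1] if j > 0 else neg_inf  # next character
            if advance > stay:
                best, back[t][j] = advance, j - 1
            else:
                best, back[t][j] = stay, j
            if best > neg_inf:
                score[t][j] = best + log_probs[t][targets[j]]

    # Walk back from the last frame, which must sit on the last character.
    path = [0] * num_frames
    j = num_targets - 1
    for t in range(num_frames - 1, -1, -1):
        path[t] = j
        j = back[t][j]
    return path


# Toy example: two characters (ids 0 and 1) over four frames.
emissions = [
    [math.log(0.9), math.log(0.1)],
    [math.log(0.8), math.log(0.2)],
    [math.log(0.1), math.log(0.9)],
    [math.log(0.2), math.log(0.8)],
]
frame_to_char = align(emissions, [0, 1])
```

The first frame assigned to a character gives its start time once multiplied by the frame duration, which is how a path like this can turn into timestamps.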
Awesome! Thanks a ton @m-bain, I can look at your code again and learn how you did that :smile:!
Hi,
Thanks for sharing this work. You wrote that it still needs testing... can I test it in French 😉?
I am not sure what I should change. I saw that the wav2vec2 model could be passed in as a parameter (see the README), but in the code there are some hardcoded pipelines referring to the English model. For French there is a wav2vec2 model (never tested, since I was relying on Whisper only).
Looking forward to testing this in French!