Segment timestamps are buggy in BatchedInferencePipeline #919
Did some digging around; I think the word timestamps are the problem. The last word's end timestamp is used as the segment's end timestamp, which points to the error being in faster-whisper/faster_whisper/transcribe.py, Line 1820 at eb83902.
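To illustrate the failure mode being described (this is a hypothetical sketch, not the actual faster-whisper code at that line): if the segment end is taken from the last word's end timestamp, any offset error in the word timestamps leaks straight into the segment boundaries.

```python
# Illustrative sketch only; names and shapes here are hypothetical.
def finalize_segment(segment: dict, words: list) -> dict:
    """Set the segment end from word-level timestamps."""
    if words:
        # If the batched pipeline computes word timestamps against the wrong
        # offset, that error propagates directly into the segment end here.
        segment["end"] = words[-1]["end"]
    return segment

# Words whose offsets were computed against the wrong chunk start:
words = [{"word": "hello", "start": 18.3, "end": 18.7},
         {"word": "world", "start": 18.8, "end": 23.07}]
segment = {"start": 0.0, "end": 3.0, "text": "hello world"}
print(finalize_segment(segment, words))  # end becomes 23.07 for a ~3 s clip
```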
This can be confirmed by the fact that if I don't send
Should be fixed in #920
Hey @MahmoudAshraf97, thank you for your quick response and fix! I have been able to recreate it with your code, but for a specific case. Colab: https://colab.research.google.com/drive/1ie7uMFW_LJUvxGHW3KkT5iG8uUZiTVwU?usp=sharing In this, it works if I use
However, if I pass the same segments into the normal transcribe, using `segments, info = model.transcribe(arr, word_timestamps=True, clip_timestamps=[0.832, 7.024000000000001], vad_filter=False)`, it works correctly:
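For reference, a rough sketch of that sequential call (the model size and the use of `decode_audio` to load batch.wav are assumptions here; the actual setup lives in the colab):

```python
from faster_whisper import WhisperModel, decode_audio

# Assumptions for illustration: model size and audio path are not stated
# in the comment above.
model = WhisperModel("large-v3")
arr = decode_audio("batch.wav")  # float32 mono audio at 16 kHz

# Sequential (non-batched) transcribe call reported as correct: transcribe
# only the clip between 0.832 s and ~7.024 s with word-level timestamps.
segments, info = model.transcribe(
    arr,
    word_timestamps=True,
    clip_timestamps=[0.832, 7.024000000000001],
    vad_filter=False,
)
for seg in segments:
    print(seg.start, seg.end, seg.text)
```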
I think it's some edge condition :/
Because the input is not the same
Oh ok, my bad, I fixed that, and it is working now!
In the recent code #856, `BatchedInferencePipeline` sometimes returns wrong segment timestamps. I recorded an audio clip: https://drive.google.com/file/d/1cbDbiXi12SIsd0hIDfs61VdtgI78Fg_p/view?usp=sharing and for this clip `BatchedInferencePipeline` gives `Segment(id=1, seek=2307, start=18.71, end=23.07...` even though the clip is only 3 seconds long.
Here is a colab recreating this issue, just upload batch.wav from the link above: https://colab.research.google.com/drive/1ie7uMFW_LJUvxGHW3KkT5iG8uUZiTVwU?usp=sharing
output:
correct output:
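A minimal reproduction sketch along the lines of the issue description, for anyone without the colab (the model size is an assumption, and `batch.wav` is the clip linked above):

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

# Model size is an assumption; the issue only says batch.wav is a ~3 s clip.
model = WhisperModel("large-v3")
batched_model = BatchedInferencePipeline(model=model)

segments, info = batched_model.transcribe("batch.wav", word_timestamps=True)
for seg in segments:
    # For a ~3 s clip the boundaries should stay within roughly 0-3 s;
    # the issue instead reports Segment(..., start=18.71, end=23.07, ...).
    print(seg.id, seg.start, seg.end, seg.text)
```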
Thank you for all your work!