Remove the usage of `transformers.pipeline` from `BatchedInferencePipeline` and fix word timestamps for batched inference #921

Conversation
@@ -621,10 +509,27 @@ def transcribe(
            all_language_probs=all_language_probs,
        )

        audio_segments, segments_metadata = self.audio_split(
            audio, vad_segments, sampling_rate
Can you make the check for shorter audio the first condition after `if not vad_segments` on line 439? This is to avoid VAD errors such as m-bain/whisperX#844 by first checking the condition:

    if duration < self.chunk_length:
        vad_segments = [
            {"start": 0.0, "end": duration, "segments": [(0.0, duration)]}
        ]
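The guard suggested above can be sketched as a small helper (`plan_vad_segments` is a hypothetical name for illustration, not the actual faster-whisper API):

```python
def plan_vad_segments(duration, chunk_length, run_vad):
    """Return VAD segments, skipping the VAD model for short audio.

    When the audio is shorter than one chunk, treat the whole file as a
    single voiced segment instead of calling the VAD model, which can
    fail on short clips (cf. m-bain/whisperX#844).
    """
    if duration < chunk_length:
        return [{"start": 0.0, "end": duration, "segments": [(0.0, duration)]}]
    return run_vad()  # fall back to the real VAD for longer audio
```

The VAD callable is only invoked on the long-audio path, so short files never touch the model at all.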
I think the problem originates from the VAD itself. Doing this will solve it for short audio files, but what about long audio files where the issue occurs in a later window? I'm waiting for the user who reported the problem to upload the test file so I can confirm.
The issue is that the VAD is not able to detect it as a voiced region (it considers it noise). We could make `vad_offset` and `vad_onset` user-configurable to aid it. The audio file, for example, is: https://litter.catbox.moe/kyu2q8.wav
The file is 404; I've asked the user to re-upload it. I'm going to see if Silero detects it correctly, and then decide how to move forward.
As I expected, the problem is in the VAD model.

This is the sequential output:

Hey, Cardage, do you know Candace? If you don't know who I am, tell me! BOOM! I GOT DANGER!

This is the batched output using Silero VAD instead of pyannote:

Cardage, do you know Candace?

The batched version using pyannote finds no speech in the audio.

If you listen carefully to the audio, you'll find that only the first sentence makes sense, while the rest is so noisy that the model is just hallucinating.

This is the ground truth:

Hey Carnage, do you know Candice? YOUR MOTHER HUNG HERSELF, I got that adrenaline momentum
We could address it in detail in the Silero VAD PR. For the time being, we can skip VAD computation for shorter files with `duration < self.chunk_length`.
The VAD refactor PR is almost ready, so I suggest we merge this and address the issue in the next PR if it's still a concern.
To enhance flexibility, I think we could add a new argument to specify the VAD model (pyannote/silero). This argument would apply in both sequential and batched modes.

There has also been some recent feedback that Silero v5 is worse than v4 for some languages, so we could also let users choose a specific version of the VAD model by directly specifying its path.
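One way this proposal could look as a sketch (the `get_vad_model` name, backend keys, and default paths are illustrative assumptions, not the existing faster-whisper API):

```python
def get_vad_model(backend: str = "pyannote", model_path: str = None):
    """Hypothetical dispatcher for a user-selectable VAD backend.

    `backend` picks the VAD family, and `model_path` optionally pins a
    specific checkpoint (e.g. Silero v4 instead of v5); the default
    paths below are placeholders.
    """
    defaults = {
        "pyannote": "<default-pyannote-checkpoint>",
        "silero": "<default-silero-checkpoint>",
    }
    if backend not in defaults:
        raise ValueError(f"unknown VAD backend: {backend!r}")
    return backend, model_path or defaults[backend]
```

Passing an explicit `model_path` would override the bundled checkpoint, which covers the "pin Silero v4" use case.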
I have tested the newer version for speed. It is slightly slower on the YouTube-Commons subset but has similar results on an internal benchmarking dataset. Have a look at the comment on the VAD issue.
Force-pushed from 3ab66c4 to 16e85e5
@trungkienbkhn can I get a final review and merge if no changes are needed?
@MahmoudAshraf97, LGTM. Thanks for your contribution.
…rencePipeline` and fix word timestamps for batched inference (SYSTRAN#921)" This reverts commit d57c5b4.
…eline` and fix word timestamps for batched inference (SYSTRAN#921) * fix word timestamps for batched inference * remove hf pipeline
This PR removes the usage of `transformers.pipeline` from `BatchedInferencePipeline` because there is no need to use it in the first place; this simplifies the code and removes a dependency.

It also fixes #919, which was caused by a wrong `num_frames` argument when finding the alignments: it was assumed that inferring it from the encoder output size was sufficient, but this turned out to cause issues such as #919 when the actual segment size is much smaller than the inferred size.
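The fix can be illustrated with a sketch, assuming Whisper's standard 16 kHz sample rate and 160-sample hop (100 mel frames per second); `segment_num_frames` is an illustrative helper, not the PR's actual code:

```python
SAMPLE_RATE = 16000
HOP_LENGTH = 160  # one mel frame per 10 ms in Whisper's feature extractor

def segment_num_frames(num_samples: int) -> int:
    """Derive num_frames from the segment's real audio length.

    Inferring num_frames from the padded encoder output instead would
    report the full 30 s window (3000 frames) even for much shorter
    segments, skewing the word-timestamp alignment (#919).
    """
    return num_samples // HOP_LENGTH
```

For a 2-second segment this yields 200 frames rather than the 3000 the padded window would suggest, which is why the alignment drifted before the fix.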