batch execution transcribe in faster-whisper #59
Hi. The model implemented in CTranslate2 supports batch execution (with some caveats), but faster-whisper currently implements the same transcription logic as openai/whisper, which only processes a single audio file. We could add a batch mode in the future. Note that there is already a way to increase throughput for CPU execution.
Thanks, I was able to figure it out with threading.
Hey @eschmidbauer, could you elaborate on how you accomplished the threading?
Here is an example: pass the faster-whisper model to your Process/thread.
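The original snippet is not reproduced here; below is a minimal sketch of that kind of setup, assuming the standard faster-whisper API (model size, file paths, and worker counts are placeholders). A single WhisperModel is shared across a small thread pool, with num_workers raised so concurrent transcriptions can run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

from faster_whisper import WhisperModel

# Placeholder model size and files; tune cpu_threads/num_workers for your machine.
model = WhisperModel("base", device="cpu", cpu_threads=4, num_workers=2)
audio_files = ["a.wav", "b.wav", "c.wav"]

def transcribe_one(path):
    segments, info = model.transcribe(path)
    # The segments object is a lazy generator; consuming it runs the transcription.
    return path, " ".join(segment.text for segment in segments)

with ThreadPoolExecutor(max_workers=2) as executor:
    for path, text in executor.map(transcribe_one, audio_files):
        print(path, text)
```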
Let's keep this issue open. It could be interesting to have actual batch execution, especially on GPU.
+1 here
mark
FYI: There is a fork with batch inference for the OpenAI implementation: openai/whisper#662
WhisperX pushed an experimental branch implementing batch execution with faster-whisper.
An implementation note: it would be great to be able both to segment large audio files (as WhisperX does) and to have the option to pass in a bunch of independent audio files and run those as a batch.
@guillaumekln, the faster-whisper transcribe implementation is still faster than the batch request option proposed by WhisperX.
@guillaumekln Any thoughts on the above?
How did you find that? The code looks correct to me, but you should try simplifying the usage.
See also my comment in the WhisperX issue about the current limitations of batch execution in CTranslate2: m-bain/whisperX#159 (comment)
I timed each function execution, and I'm still playing with it. Otherwise, thanks for the nits/modifications suggested.
@guillaumekln Great news, I was wrong! After profiling the batching process, it appears that the problem doesn't come from the batch process implementation itself, but from elsewhere. Here is a profile of the transcription process. It looks like it's blazing fast for the CPU part (the GPU part is not in the profiler). Now I need to deal with that remaining part. Honestly, this could lead to an implementation directly in faster-whisper if it works. I will keep investigating the process.
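For reference, one simple way to get this kind of per-function timing is the standard-library profiler; the snippet below is illustrative (the model size and file are placeholders), not the exact tooling used above.

```python
import cProfile
import pstats

from faster_whisper import WhisperModel

model = WhisperModel("base")  # placeholder model size

with cProfile.Profile() as profiler:
    segments, info = model.transcribe("audio.wav")  # placeholder file
    list(segments)  # consume the lazy generator so the work is actually profiled

pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```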
@guillaumekln So I was trying to enable batching support. Here are some points and blockers I see. Would love to hear your thoughts.
In openai/whisper the attention mask is an explicit additive tensor (0 for allowed positions, -inf for masked ones), whereas CTranslate2 only tracks a valid length per example. So modifying the OpenAI mask for padded tokens is simple enough: just change the values of the mask. For CTranslate2, however, the mask gets created on the fly inside the softmax kernel by zeroing out out-of-range positions, so modifying it for padding seems like a fairly breaking change. Below is the softmax kernel implementation: https://github.com/OpenNMT/CTranslate2/blob/master/src/cpu/kernels.cc
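For illustration, here is a small NumPy sketch (not code from either project) of the two mask styles for a batch of two sequences where the second one has one left-padded position:

```python
import numpy as np

seq_len = 4
lengths = np.array([4, 3])  # CTranslate2-style: only valid lengths are tracked

# OpenAI-style additive mask: 0 where attention is allowed, -inf where it is not.
causal = np.triu(np.full((seq_len, seq_len), -np.inf), k=1)
oai_mask = np.stack([causal, causal.copy()])
# Handling left padding only requires writing -inf into the padded column.
oai_mask[1, :, 0] = -np.inf

# In CTranslate2 there is no such tensor to edit: out-of-range positions are
# skipped on the fly inside the softmax kernel based on the length alone,
# which is why supporting padding means changing the kernel itself.
print(oai_mask[1])
print(lengths)
```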
Thank you for looking into that. The mask is not the only change to make in the model. When inputs are padded on the left, each example has a different offset when applying the positional embeddings, which can no longer be applied with a simple addition. Instead there should be something that tracks the offset of each example and gathers the corresponding positions of the positional embeddings (something like position IDs). This change is a bit more complex, especially if we want to make it compatible with models using different position encoding techniques (rotary embeddings, relative positions, etc.).
@guillaumekln Thanks for responding. I am aware of the position_ids; these can be created on the fly via prefix sums on the mask (see the sketch below).
However, I completely agree that introducing these seemingly simple changes in CTranslate2 takes a bit of effort, because things are tangled enough that the changes need to happen at a much lower level (ops, kernels.cc, etc.).
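A rough PyTorch sketch of that prefix-sum idea (tensor shapes and names are illustrative, not CTranslate2 internals):

```python
import torch

# 1 = real token, 0 = left padding (the second example is padded by one position).
mask = torch.tensor([[1, 1, 1, 1],
                     [0, 1, 1, 1]])

# Prefix sums over the mask give each token its offset within its own example,
# regardless of how much left padding precedes it.
position_ids = mask.cumsum(dim=-1) - 1
position_ids = position_ids.clamp(min=0)  # padded slots harmlessly point at position 0

# Gather per-example rows of the positional embedding table instead of adding
# one shared slice, which is what breaks once examples have different offsets.
max_positions, d_model = 448, 8  # illustrative sizes
positional_embeddings = torch.randn(max_positions, d_model)
gathered = positional_embeddings[position_ids]  # shape: (batch, seq_len, d_model)
# hidden_states = token_embeddings + gathered * mask.unsqueeze(-1)
```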
Could someone give a little summary of the batching feature? It's usable, but the initial prompts for the submitted batch must be the same? And word_timestamps cannot be used? Is the timing of the subs otherwise good?

I'd like to process long audio files (TV programs, audiobooks, podcasts). Currently I break them up into 6-minute chunks, staggered with a 1-minute overlap, run transcription for the chunks in parallel on faster-whisper instances (separate Python processes with faster-whisper wrapped with FastAPI, regular non-batched transcribe) on several GPUs, then merge the transcriptions by finding the least offensive 'switch' point in the overlapping sections. It seems to work well. I'd like to try batch processing (to get more throughput by sending multiple chunks to each faster-whisper instance), but I don't want to sacrifice the quality of the timings. I don't have a need for word-level timings; this suggests it would be better to leave it off?

EDIT: Guillaume suggests a way in issue #100. Found this in issue #133. Also the issue: "There is the small draw-back, that whisper feeds the transcription of the previous 30s window as prompt to the next window, to get a continuous transcription." https://github.com/m-bain/whisperX/blob/main/whisperx/asr.py

If someone has any pointers for what I'm trying to do, I would appreciate it.

PS: Something I figured out a few days ago: if you're chopping up an audio file into chunks with ffmpeg, put the -ss (start time) argument before the -i (file path); it's much faster. Otherwise ffmpeg parses the whole file (or something) and gets slower the further in your clip is in the file. You can also feed audio data in/out with pipes rather than files (see the sketch below).
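As a concrete version of that ffmpeg tip, here is a small sketch (file names and chunk boundaries are placeholders): seeking with -ss before -i and reading raw PCM from a pipe, which faster-whisper can consume directly as a float32 array.

```python
import subprocess

import numpy as np

def load_chunk(path, start_s, duration_s, sample_rate=16000):
    # -ss before -i makes ffmpeg seek in the input before decoding,
    # instead of decoding everything up to the start time.
    cmd = [
        "ffmpeg", "-nostdin",
        "-ss", str(start_s), "-t", str(duration_s),
        "-i", path,
        "-f", "s16le", "-ac", "1", "-ar", str(sample_rate),
        "pipe:1",
    ]
    raw = subprocess.run(cmd, capture_output=True, check=True).stdout
    return np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0

# A 6-minute chunk starting at 5 minutes, ready to pass to WhisperModel.transcribe().
chunk = load_chunk("episode.mp3", start_s=300, duration_s=360)
```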
Currently there is no batching mechanism in faster-whisper, just like there is no batching mechanism in openai-whisper. In this issue we discuss possible ways to integrate batching in faster-whisper. The underlying implementation in CTranslate2 does support batching, but with the limitations discussed above. The main limitation is that it does not support left padding in the input tokens, which is mostly required to keep the same transcription logic. This limitation could be addressed at some point. WhisperX chose not to pass the previous tokens and so worked around this limitation. However, the internal methods used to compute the word timestamps already support batch inputs.
Ignore this, mostly nonsense reasoning based on misunderstanding, see next comment. 🥇 💯
@guillaumekln CTranslate2 supports batch execution (with some caveats), but I haven't found relevant usage instructions. Could you provide a more specific tutorial on how to utilize batch execution with CTranslate2? Thank you.
See the methods documentation in CTranslate2: https://opennmt.net/CTranslate2/python/ctranslate2.models.Whisper.html. Note that all methods take batch inputs. This test case is a possible example of how to build batch inputs for CTranslate2.
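A hedged sketch of what batch inputs can look like with the documented CTranslate2 API (the converted model path is a placeholder, and the zero arrays stand in for real 80-bin log-Mel features; in faster-whisper these come from its own feature extractor):

```python
import ctranslate2
import numpy as np

model = ctranslate2.models.Whisper("whisper-base-ct2")  # path to a converted model

# Two 30-second windows of log-Mel features stacked into one batch.
mel_a = np.zeros((80, 3000), dtype=np.float32)
mel_b = np.zeros((80, 3000), dtype=np.float32)
features = ctranslate2.StorageView.from_array(np.stack([mel_a, mel_b]))

# One prompt (as token strings) per batch entry.
prompt = ["<|startoftranscript|>", "<|en|>", "<|transcribe|>", "<|notimestamps|>"]
results = model.generate(features, [prompt, prompt])

for result in results:
    print(result.sequences_ids[0])  # decode with the Whisper tokenizer to get text
```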
I pushed an experimental branch in CTranslate2 to support variable-length text inputs for the Whisper model. This could allow running the Whisper transcription in batch mode even when conditioning on the previous text. It seems that multiple people in this thread tried to implement some form of batching in faster-whisper. It would be great if you could install the corresponding CTranslate2 development build, use the experimental branch, and see how far you can go with your batch implementation (and share performance numbers!).
Hi @guillaumekln, I'm checking this for batching support and trying to make sure I understand the approach well.
Batching over one file is typically not possible with the default settings, since each window is conditioned on the previous one. Batching over independent segments is what WhisperX does.
Thanks for the WhisperX link. I already checked this implementation and reproduced it (by simplifying the part with the HF transformers pipeline to a PyTorch pipeline). Still, this implementation is prone to problems (hallucinations, some words disappearing, wrong transcriptions...). I will experiment with audio file batching closer to how the transcription pipeline works. Thanks a lot.
@guillaumekln Oh, I have been watching the ctranslate2 and faster-whisper commits for this update, but missed these messages. Great. I'll give it a try in the next few days.
Okay, I got a bit of free time and a machine to play on. @guillaumekln I followed the special build instructions; it looks okay. So what is needed is something like WhisperModel.transcribe, but rather a WhisperModel.transcribe_batch. I'll take a shot; are there any points/things to look out for?
Never mind, I managed to code around it and keep the logic the same... about 75% done.
Hey, sorry, I'm kind of a noob, but if I understand correctly you are now working on a pull request to implement batching in faster-whisper, and the batching will work by utilizing more VRAM? :)
@hobodrifterdavid any updates on this? Really looking forward to batch transcriptions...
I didn't get to finish it yet, but here is a modified transcribe.py with added _batch functions. You can diff it against the one in the current repo to see the changes (https://github.com/guillaumekln/faster-whisper/blob/master/faster_whisper/transcribe.py). There's not a lot of work left to finish it, but I was hoping @guillaumekln would be able to comment on the approach. It looks like there were a couple of minor changes to transcribe.py since the version I was editing. I saw in the comments that Guillaume is moving on, but hopefully faster-whisper will get its batch mode. :)
I took another look over the weekend. I wasn't satisfied with the approach of making duplicate batch functions; those functions are less readable than the originals. Control passes: transcribe() => generate_segments() => generate_with_fallback() => model.generate(). Instead of making batch versions of all these functions, it could work better to have a wrapper around transcribe that passes in a special implementation of generate_with_fallback (generate_with_fallback_cukoo), which uses async/await to cede control back to the wrapper mid-way through execution, so that a batch function can be called. Something like the sketch below (it's a sketch, not working code).
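Since the original sketch is not reproduced above, here is one possible reconstruction of the idea (the generate_with_fallback_cukoo name comes from the comment itself; the Batcher class and its methods are hypothetical, not faster-whisper APIs):

```python
import asyncio

class Batcher:
    """Collects decode requests from suspended coroutines and runs them in one batch."""

    def __init__(self, model, batch_size=8):
        self.model = model
        self.batch_size = batch_size
        self.pending = []  # list of ((encoder_output, prompt, options), future) pairs

    def submit(self, encoder_output, prompt, options):
        future = asyncio.get_running_loop().create_future()
        self.pending.append(((encoder_output, prompt, options), future))
        if len(self.pending) >= self.batch_size:
            self.flush()
        return future

    def flush(self):
        if not self.pending:
            return
        requests, futures = zip(*self.pending)
        self.pending = []
        # Hypothetical batched call: one model.generate() over all collected windows.
        for result, future in zip(self.run_batched(requests), futures):
            future.set_result(result)

    def run_batched(self, requests):
        raise NotImplementedError  # would wrap a single batched model.generate()

async def generate_with_fallback_cukoo(encoder_output, prompt, options, batcher):
    # Instead of calling model.generate() directly, hand the request to the shared
    # batcher and suspend until the batched result for this window is available.
    return await batcher.submit(encoder_output, prompt, options)

# A transcribe_batch() wrapper would then run one generate_segments_async() coroutine
# per audio file with asyncio.gather(), flushing the batcher whenever it fills up or
# every remaining coroutine is suspended waiting on it.
```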
One thing: I don't think it's possible to do this without making transcribe and generate_segments async. I'm not familiar enough with Python async/await to say 100%. But you could make transcribe_async() and generate_segments_async(), and wrap them with simple non-async wrappers transcribe() and generate_segments(). If there are both transcribe_async and transcribe, etc., they should have the same arguments; it's a bit clumsy to maintain two sets of the long argument lists, so it would probably be a bit better to pass in a class with options as an argument, but then the API changes.
Hello, I'm wondering if there's any progress on this issue? I wish I could help, but unfortunately I'm not an expert. However, I have reviewed newer repos like the relatively new https://github.com/Vaibhavs10/insanely-fast-whisper and WhisperX's implementation, but I was wondering if anything similar, like batching for a single file, is going to be implemented directly in faster-whisper? I know there have been several issues created about this across a couple of repositories...
mark
Created this PR last week that integrates batching and additional improvements into Faster Whisper.
Hello,
Is batch execution of faster-whisper's transcribe possible? We've seen in this thread that batch execution should increase throughput, but it's not clear how to perform batching using faster-whisper, if it's possible at all. Thanks!