Need ability to send multiple files in one go #915
Since #856 got merged, I was wondering if we could have support for sending multiple files in one go into faster-whisper, something like a transcribe call that accepts a list of files.

This would help use cases where you have a lot of small files. I have a use case where I want to transcribe multiple files of up to 30 s of audio each (they will never be longer than 30 s), so I was wondering if I could stitch them together and pass them in as one file to BatchedInferencePipeline. In my limited tests this seems to work, but will each segment always be exactly 30 s? Basically, if I pad my audio to exactly 30 s, can I be guaranteed that each segment corresponds to one audio file and that no segment will contain any transcription from two different audios? Thank you for all your work!
@Jiltseb
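A minimal sketch of the stitch-and-pad idea described above. decode_audio, WhisperModel, and BatchedInferencePipeline are real faster-whisper names, but the padding logic and the clip-to-segment mapping are assumptions on my part, not guaranteed behavior:

```python
# Hypothetical sketch: zero-pad each clip to exactly 30 s and stitch them
# into one array, so clip i occupies the window [i * 30 s, (i + 1) * 30 s).
import numpy as np
from faster_whisper import WhisperModel, BatchedInferencePipeline
from faster_whisper.audio import decode_audio

SAMPLE_RATE = 16000
WINDOW = 30 * SAMPLE_RATE  # pad every clip to exactly 30 s of samples

def stitch(paths):
    """Zero-pad each clip to 30 s and concatenate into one array."""
    padded = []
    for path in paths:
        audio = decode_audio(path, sampling_rate=SAMPLE_RATE)
        if len(audio) > WINDOW:
            raise ValueError(f"{path} is longer than 30 s")
        padded.append(np.pad(audio, (0, WINDOW - len(audio))))
    return np.concatenate(padded)

model = WhisperModel("large-v3", device="cuda")
pipeline = BatchedInferencePipeline(model=model)

audio = stitch(["a.wav", "b.wav", "c.wav"])
segments, info = pipeline.transcribe(audio, batch_size=8)
for segment in segments:
    # Assumption: since each clip sits in its own 30 s slot, the source
    # clip can be recovered from the segment's start time.
    clip_index = int(segment.start // 30)
    print(clip_index, segment.start, segment.end, segment.text)
```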
Yes, I agree that the ability to send multiple files at once would be awesome, and it's on the TODO list. Basically, we need some additional bookkeeping. For example: in the example above, the second and third entries together are less than 30 sec, but they are split across two dictionaries, making sure each is processed separately in parallel.
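A hypothetical illustration of the bookkeeping being described, using the 13 s and 16 s segment lengths mentioned later in the thread; the dict shape mirrors faster-whisper's VAD output, but the exact format is version dependent:

```python
# Illustration only: vad segments for three stitched 30 s slots.
# Times are in seconds, offset by each clip's slot in the stitched file.
vad_segments = [
    {"start": 0.0, "end": 28.0},   # file 1
    {"start": 30.0, "end": 43.0},  # file 2: 13 s of speech
    {"start": 60.0, "end": 76.0},  # file 3: 16 s of speech
]
# 13 s + 16 s < 30 s, so a naive packer could merge files 2 and 3 into one
# decoding window; keeping them in separate dictionaries is the bookkeeping
# that guarantees each file is decoded as its own batch item.
```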
Oh great! Thank you for your reply!
In this example the second segment is 13 s and the third is 16 s. So if I provide the vad segments myself, I am guessing VAD will not run? That means I can't just combine the audio chunks; I have to run them through VAD, get the speech segments, and then send those in, right? My point is that if the second segment is 13 s but contains 10 s of silence at the end, it can cause Whisper to hallucinate, and since I am manually sending VAD segments, VAD will be skipped in faster-whisper? So my flow should be: run VAD on each chunk myself, collect the speech segments, and then pass both the stitched audio and those segments in.
Right? Thank you for all your help!
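A sketch of that flow, assuming faster_whisper.vad's get_speech_timestamps and VadOptions. Their output is in samples, and the exact vad segments format the batched pipeline expects is version dependent, so treat this as pseudocode to adapt rather than a drop-in recipe:

```python
# Run Silero VAD on each clip first, then shift the resulting speech
# segments by the clip's offset inside the stitched file, so trailing
# silence never reaches the decoder.
from faster_whisper.audio import decode_audio
from faster_whisper.vad import VadOptions, get_speech_timestamps

SAMPLE_RATE = 16000

def speech_segments(path, offset_s):
    """Return speech (start, end) times in seconds, shifted by the clip's
    offset inside the stitched file."""
    audio = decode_audio(path, sampling_rate=SAMPLE_RATE)
    timestamps = get_speech_timestamps(audio, VadOptions())
    return [
        {
            "start": offset_s + ts["start"] / SAMPLE_RATE,
            "end": offset_s + ts["end"] / SAMPLE_RATE,
        }
        for ts in timestamps
    ]

# Each input clip was padded to a 30 s slot, so clip i starts at i * 30 s.
all_segments = []
for i, path in enumerate(["a.wav", "b.wav", "c.wav"]):
    all_segments.extend(speech_segments(path, offset_s=i * 30))
```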
If you already provide the vad segments, VAD will not run; in that case the segments used will be exactly the ones you pass in. It is a bit of a hacky implementation that doesn't utilize the GPU fully, but once we have multiple files as input it should be easier for you.
Hey @Jiltseb, thank you for the detailed reply! Also, batched_model with batch_size = 1 seems to give much more consistent performance than model.transcribe. Why is that? model.transcribe sometimes spikes to 1 s to process 30 s of audio on my L40S, while batched_model with batch_size = 1 always takes around 270 ms. I am curious: are there other performance improvements in batched_model?
It looks like #919 is related to this. There are several reasons for it.
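For anyone wanting to reproduce the comparison above, a rough timing harness; this is my own construction, not from the thread. Note that both transcribe calls return lazy generators, so they must be exhausted for decoding to actually run:

```python
# Time model.transcribe vs. BatchedInferencePipeline with batch_size=1
# on the same 30 s clip, repeating to surface the spikes described above.
import time
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("large-v3", device="cuda")
batched = BatchedInferencePipeline(model=model)

def timed(fn):
    start = time.perf_counter()
    segments, _ = fn()
    list(segments)  # exhaust the generator so decoding actually runs
    return time.perf_counter() - start

for _ in range(5):
    t_seq = timed(lambda: model.transcribe("clip_30s.wav"))
    t_bat = timed(lambda: batched.transcribe("clip_30s.wav", batch_size=1))
    print(f"sequential: {t_seq * 1000:.0f} ms  batched: {t_bat * 1000:.0f} ms")
```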
Hey @Jiltseb, if I were to try to open a PR to add the ability to send multiple files, how would I go about it? Can you give me a rough guide?
Have a look at WhisperS2T: https://github.com/shashikg/WhisperS2T. It provides support for multiple files.
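For reference, WhisperS2T's multi-file call looks roughly like this; argument names are taken from its README and may have changed, so verify against the current version:

```python
# Rough shape of WhisperS2T's multi-file API: one entry per input file
# for languages, tasks, and prompts, and one result per file in order.
import whisper_s2t

model = whisper_s2t.load_model(
    model_identifier="large-v2", backend="CTranslate2"
)

files = ["a.wav", "b.wav", "c.wav"]
out = model.transcribe_with_vad(
    files,
    lang_codes=["en"] * len(files),
    tasks=["transcribe"] * len(files),
    initial_prompts=[None] * len(files),
    batch_size=16,
)
```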