This is very cool, but push to even higher gpu usage? #66
Comments
That's a good approach. Did you also increase …?
Yes, I did. I think the bottleneck may be more in the Python code; we're doing a blocking wait on `self.model.generate`.
Another observation: using 2 threads in the pool seems to work better than using more.
Are you using …?
Yes. Is this slow?
Yes, it's slower than the default transcription mode (see #45), and some operations are indeed running on the CPU in this mode, which explains the lower GPU usage. There could be further improvements in the future.
I see, thanks! Is the CPU part mostly in faster-whisper, or in CTranslate2?
It's probably a contribution of both, but I don't know exactly. Taking the OpenAI implementation as a reference, the following lines are run on the CPU in CTranslate2: https://github.com/openai/whisper/blob/v20230314/whisper/timing.py#L208-L214. These steps could benefit from a GPU implementation, but I would need some time to come up with an efficient one. My first attempt had worse performance than the CPU version!
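To illustrate why that step is CPU-bound: the word-timing pass involves a dynamic time warping (DTW) alignment over an attention-derived cost matrix, which is an inherently sequential dynamic program. The sketch below is a minimal, illustrative NumPy DTW — it is not the actual whisper or CTranslate2 code, just the shape of the computation being discussed.

```python
import numpy as np

def dtw_path(cost: np.ndarray):
    """Return the minimum-cost monotonic alignment path through `cost`.

    Each cell acc[i, j] depends on its three upper-left neighbors, so the
    fill is sequential -- this data dependency is what makes a naive GPU
    port slower than the CPU loop.
    """
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # advance row index only
                acc[i, j - 1],      # advance column index only
                acc[i - 1, j - 1],  # advance both (diagonal)
            )
    # Backtrace the cheapest path from the bottom-right corner.
    i, j, path = n, m, []
    while i > 0 or j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j): acc[i - 1, j] if i > 0 else np.inf,
            (i, j - 1): acc[i, j - 1] if j > 0 else np.inf,
            (i - 1, j - 1): acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
        }
        i, j = min(steps, key=steps.get)
    return path[::-1]
```

A parallel GPU formulation would have to process anti-diagonals in waves rather than cells independently, which is why a straightforward port can underperform the CPU version.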
Higher GPU usage would probably come from some form of batch execution. This is discussed in #59.
First, thank you for this awesome work — it really does improve transcription time a lot!
But I'm wondering if it's possible to push to even higher GPU usage so it can be even faster.
From my testing, transcribing a few audio files between 2 and 15 minutes long, GPU usage jumps between 70-90% and occasionally drops quite low. I tried instantiating WhisperModel with higher cpu_threads and num_workers, but it doesn't seem to help.
I guess there is some non-trivial blocking CPU computation, so the GPU is not fully utilized. I tried using a thread pool in Python to submit jobs for the audio files; it helps a bit — the peak GPU usage can go higher — but on average it didn't increase much.
Any ideas? Thanks!
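The thread-pool approach described above can be sketched as below. `transcribe_file` is a hypothetical wrapper standing in for a call to `WhisperModel.transcribe` plus consuming the segment generator; the two-worker pool size follows the observation in the comments that more threads did not help further.

```python
from concurrent.futures import ThreadPoolExecutor

def transcribe_file(path: str) -> str:
    # Placeholder for the real work, which would be something like:
    #   segments, info = model.transcribe(path)
    #   return " ".join(segment.text for segment in segments)
    return f"transcript of {path}"

audio_files = ["a.wav", "b.wav", "c.wav"]

# Two workers: enough to overlap CPU-side work for one file with GPU
# decoding for another, per the discussion above.
with ThreadPoolExecutor(max_workers=2) as pool:
    transcripts = list(pool.map(transcribe_file, audio_files))
```

`pool.map` preserves input order, so `transcripts[i]` corresponds to `audio_files[i]` even though files finish out of order.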