
This is very cool, but push to even higher gpu usage? #66

Closed
junchen6072 opened this issue Mar 22, 2023 · 9 comments

Comments

@junchen6072

First, thank you for this awesome work; it really does improve transcription time a lot!
But I'm wondering if it's possible to push GPU usage even higher so it can be even faster.
In my testing, transcribing a few audio files between 2 and 15 minutes long, GPU usage jumps between 70-90% and occasionally drops quite low. I tried instantiating WhisperModel with higher cpu_threads and num_workers, but it doesn't seem to help.
I guess there is some non-trivial blocking CPU computation, so the GPU is not fully utilized. I tried using a thread pool in Python to submit jobs for the audio files; it helps a bit and the peak GPU usage can go higher, but on average it didn't increase much.

Any ideas? Thanks!
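(For context, a minimal sketch of the setup being described, not taken from the issue itself; the model name, file path, and thread counts are placeholder values.)

```python
from faster_whisper import WhisperModel

# Placeholder configuration; the issue does not state the exact values used.
model = WhisperModel(
    "large-v2",
    device="cuda",
    compute_type="float16",
    cpu_threads=8,   # extra threads for the CPU-side work
    num_workers=2,   # allow concurrent transcribe() calls
)

segments, info = model.transcribe("audio.mp3", word_timestamps=True)
for segment in segments:  # segments is a generator; iterating runs the decode
    print(segment.start, segment.end, segment.text)
```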

@guillaumekln
Contributor

I tried using a thread pool in Python to submit jobs for the audio files;

That's a good approach. Did you also increase num_workers when doing that? Normally this should overlap kernel executions on the GPU and increase the usage.
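(A hedged sketch of that suggestion: a small Python thread pool submitting transcribe jobs against a model created with num_workers greater than one, so several calls can overlap on the GPU. The file names are placeholders.)

```python
from concurrent.futures import ThreadPoolExecutor
from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16", num_workers=2)

def transcribe_file(path):
    segments, info = model.transcribe(path, word_timestamps=True)
    # The generator must be consumed for the transcription to actually run.
    return [(s.start, s.end, s.text) for s in segments]

files = ["a.mp3", "b.mp3", "c.mp3"]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(transcribe_file, files))
```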

@junchen6072
Author

I tried using a thread pool in Python to submit jobs for the audio files;

That's a good approach. Did you also increase num_workers when doing that? Normally this should overlap kernel executions on the GPU and increase the usage.

Yes, I did. I think the bottleneck may be more in the Python code; we're blocking while waiting on self.model.generate.

@junchen6072
Author

Another observation: using 2 threads in the pool seems to work better than using more.

@guillaumekln
Contributor

Are you using word_timestamps=True?

@junchen6072
Author

Yes. Is this slow?

@guillaumekln
Contributor

Yes, it's slower than the default transcription mode (see #45).

And some operations are indeed running on the CPU in this mode, which explains the lower GPU usage. There could be further improvements in the future.
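(One way to measure this overhead on your own files is to time a run with and without word timestamps. This is an illustrative sketch only; the numbers depend on hardware and audio length.)

```python
import time

def timed_transcribe(model, path, **kwargs):
    start = time.perf_counter()
    segments, _ = model.transcribe(path, **kwargs)
    list(segments)  # exhaust the generator so the work actually happens
    return time.perf_counter() - start

# t_default = timed_transcribe(model, "audio.mp3")
# t_words = timed_transcribe(model, "audio.mp3", word_timestamps=True)
# print(f"default: {t_default:.1f}s, word_timestamps: {t_words:.1f}s")
```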

@junchen6072
Author

I see, thanks! Is the CPU part mostly in faster-whisper, or in CTranslate2?

@guillaumekln
Contributor

It's probably a combination of both, but I don't know exactly.

Taking the OpenAI implementation as a reference, the following lines are run on CPU in CTranslate2:

https://github.com/openai/whisper/blob/v20230314/whisper/timing.py#L208-L214

These steps could benefit from a GPU implementation, but I would need some time to come up with an efficient one. My first attempt performed worse than the CPU version!
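(For intuition only, here is a simplified NumPy sketch of the kind of token-to-frame dynamic time warping alignment those lines perform. It is not the OpenAI or CTranslate2 implementation, and the cost-matrix input is hypothetical.)

```python
import numpy as np

def dtw_path(cost: np.ndarray):
    """Monotonic alignment path minimizing cumulative cost over a
    (num_tokens, num_frames) matrix, computed entirely on the CPU."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the bottom-right corner to recover the path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# cost = -attention_weights  # hypothetical (num_tokens, num_frames) input
# alignment = dtw_path(cost)
```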

@guillaumekln
Contributor

Higher GPU usage would probably come from some form of batch execution. This is discussed in #59.
