
This is very cool, but push to even higher gpu usage? #66

Closed
junchen6072 opened this issue Mar 22, 2023 · 9 comments

Comments

@junchen6072

First, thank you for this awesome work; it really does improve transcription time a lot!
But I'm wondering if it's possible to push GPU usage even higher so it can be even faster.
In my testing, transcribing a few audio files between 2 and 15 minutes long, GPU usage jumps between 70-90% and occasionally drops quite low. I tried instantiating WhisperModel with higher cpu_threads and num_workers, but it doesn't seem to help.
I guess there is some non-trivial blocking CPU computation, so the GPU is not fully utilized. I tried using a thread pool in Python to submit jobs for the audio files; it helps a bit and the peak GPU usage can go higher, but on average it didn't increase much.

Any ideas? Thanks!
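(For context, a minimal sketch of the setup being described, not taken from the issue itself; the model name, file path, and thread counts are placeholder values.)

```python
from faster_whisper import WhisperModel

# Placeholder configuration; the issue does not state the exact values used.
model = WhisperModel(
    "large-v2",
    device="cuda",
    compute_type="float16",
    cpu_threads=8,   # extra threads for the CPU-side work
    num_workers=2,   # allow concurrent transcribe() calls
)

segments, info = model.transcribe("audio.mp3", word_timestamps=True)
for segment in segments:  # segments is a generator; iterating runs the decode
    print(segment.start, segment.end, segment.text)
```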

@guillaumekln
Contributor

I tried using a thread pool in Python to submit jobs for the audio files;

That's a good approach. Did you also increase num_workers when doing that? Normally this should overlap kernel executions on the GPU and increase the usage.
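(A hedged sketch of that suggestion: a small Python thread pool submitting transcribe jobs against a model created with num_workers greater than one, so several calls can overlap on the GPU. The file names are placeholders.)

```python
from concurrent.futures import ThreadPoolExecutor
from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16", num_workers=2)

def transcribe_file(path):
    segments, info = model.transcribe(path, word_timestamps=True)
    # The generator must be consumed for the transcription to actually run.
    return [(s.start, s.end, s.text) for s in segments]

files = ["a.mp3", "b.mp3", "c.mp3"]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(transcribe_file, files))
```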

@junchen6072
Author

I tried using a thread pool in Python to submit jobs for the audio files;

That's a good approach. Did you also increase num_workers when doing that? Normally this should overlap kernel executions on the GPU and increase the usage.

Yes, I did. I think the bottleneck may be more in the Python code; we're blocking while waiting on self.model.generate.

@junchen6072
Author

Another observation: using 2 threads in the pool seems to work better than using more.

@guillaumekln
Contributor

Are you using word_timestamps=True?

@junchen6072
Author

Yes. Is this slow?

@guillaumekln
Contributor

Yes, it's slower than the default transcription mode (see #45).

And some operations are indeed running on the CPU in this mode, which explains the lower GPU usage. There could be further improvements in the future.
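(One way to measure this overhead on your own files is to time a run with and without word timestamps. This is an illustrative sketch only; the numbers depend on hardware and audio length.)

```python
import time

def timed_transcribe(model, path, **kwargs):
    start = time.perf_counter()
    segments, _ = model.transcribe(path, **kwargs)
    list(segments)  # exhaust the generator so the work actually happens
    return time.perf_counter() - start

# t_default = timed_transcribe(model, "audio.mp3")
# t_words = timed_transcribe(model, "audio.mp3", word_timestamps=True)
# print(f"default: {t_default:.1f}s, word_timestamps: {t_words:.1f}s")
```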

@junchen6072
Author

I see, thanks! Is the CPU part mostly in faster-whisper, or in CTranslate2?

@guillaumekln
Contributor

It's probably a combination of both, but I don't know exactly.

Taking the OpenAI implementation as a reference, the following lines are run on CPU in CTranslate2:

https://github.com/openai/whisper/blob/v20230314/whisper/timing.py#L208-L214

These steps could benefit from a GPU implementation, but I would need some time to come up with an efficient one. My first attempt performed worse than the CPU version!
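(For intuition only, here is a simplified NumPy sketch of the kind of token-to-frame dynamic time warping alignment those lines perform. It is not the OpenAI or CTranslate2 implementation, and the cost-matrix input is hypothetical.)

```python
import numpy as np

def dtw_path(cost: np.ndarray):
    """Monotonic alignment path minimizing cumulative cost over a
    (num_tokens, num_frames) matrix, computed entirely on the CPU."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the bottom-right corner to recover the path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# cost = -attention_weights  # hypothetical (num_tokens, num_frames) input
# alignment = dtw_path(cost)
```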

@guillaumekln
Contributor

Higher GPU usage would probably come from some form of batch execution. This is discussed in #59.
