Massive slowdown in 3.0.0 version from 2.1.1 #1523
Thank you for your issue. You might want to check the FAQ if you haven't done so already. Feel free to close this issue if you found an answer in the FAQ. If your issue is a feature request, please read this first and update your request accordingly, if needed. If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug:
Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users). Companies relying on
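Try loading the audio first, maybe?

```python
from pyannote.audio import Audio

# Decode, downmix to mono and resample to 16 kHz once, up front
io = Audio(mono="downmix", sample_rate=16000)
waveform, sample_rate = io(audio_file)

# Pass the in-memory waveform to the already-loaded `pipeline` instead of the file path
diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})
```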
I've got the same problem. I tried replacing the ONNX providers for the embedding model, and it gave me full GPU utilization, but the pipeline became slower. So, in my opinion, the bottleneck here isn't the model; maybe it's cropping the audio, resampling, or something else. I load the audio first and also tried resampling it before passing it to the model. You can also increase the batch sizes, which gave a speedup.
Also have the same problem; working on it as we speak. Resampling from 48000 to 16000 Hz gives a small speed bump, but nothing remarkable (~10% faster). Trying to move the waveform to the GPU with
Yeah, you don't need to transfer the waveform to CUDA, because the model runs inference with ONNX Runtime, which requires a NumPy array as input.
What, then, could be the cause of the slow inference? Diarizing 38 seconds of audio takes ~10 s, even when run repeatedly and measured after the models should already be loaded onto the GPU. That is a surprisingly high real-time factor.
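Real-time factor (RTF) here means processing time divided by audio duration, so ~10 s for 38 s of audio is an RTF of about 0.26 (lower is better). A minimal sketch for measuring it consistently, assuming `run` is any zero-argument callable (for example, a lambda wrapping the pipeline call); the warm-up run keeps one-off costs such as model loading out of the measurement, and the function name is hypothetical.

```python
import time
from typing import Callable


def real_time_factor(run: Callable[[], object], audio_seconds: float,
                     warmup: int = 1, repeats: int = 3) -> float:
    """Return the best-of-`repeats` processing time divided by the audio duration."""
    for _ in range(warmup):
        run()  # exclude one-off costs such as model loading or CUDA initialization
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        best = min(best, time.perf_counter() - start)
    return best / audio_seconds


# Worked numbers from this thread: ~10 s to diarize 38 s of audio
print(round(10 / 38, 2))  # 0.26
```

With the pipeline from this thread, usage would look like `real_time_factor(lambda: pipeline({"waveform": waveform, "sample_rate": sample_rate}), audio_seconds=38.0)`.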
I can't identify the exact reason right now, but I see problems with batching here
and here.
I also just tried adding
And I got a huge increase in speed: 204 seconds of audio now runs in 2 seconds instead of 28 seconds previously.
So IMO we lose speed in feature generation on the CPU.
@hbredin Are you planning to change this batch processing scheme? For the model, I think you need to adapt the original
For feature generation, I don't see any options for masking or batching in the docs. You could try a different feature extractor that supports batching, I guess.
Also found out that simple
This doesn't give as much of a speed boost as casting the audio to the GPU on input to the pipeline.
Thank you, that solved my problem. On my 5-minute test file, 3.0.1 initially took 95 seconds, while 2.1.1 took 9 seconds. After passing the downsampled mono waveform, it now takes 6.7 seconds.
This was my solution too. Thanks!
FYI: #1537
The latest version no longer relies on ONNX Runtime.
I've been using version 2.1.1, and it would process 1 hour of audio in around 15 minutes.
In version 3.0.1, it has been running for more than 1.5 hours and it's still not done with the same 1 hour of audio.
In both cases I've been using:

```python
import torch

# `pipeline` is the already-loaded diarization pipeline
pipeline.to(torch.device("cuda"))
```
So, I'm not using 2.1.1 from PyPI, but a version with the `to` function added, pip-installed from GitHub. I only did some basic investigation. It seems like GPU utilization is lower and CPU utilization is limited to a single core.
Before, CPU utilization was spread across more cores.
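Since CPU work appears pinned to a single core, one thing worth ruling out is PyTorch's intra-op thread count. A small diagnostic sketch using standard `torch` and `os` calls; whether this has anything to do with the 3.0.1 slowdown is an assumption to test, not something established in this thread.

```python
import os
import torch

# Threads PyTorch uses for intra-op parallelism on CPU
print("torch threads:", torch.get_num_threads())
print("logical CPUs:", os.cpu_count())
# An inherited OMP_NUM_THREADS=1 would also limit CPU parallelism
print("OMP_NUM_THREADS:", os.environ.get("OMP_NUM_THREADS"))
print("CUDA available:", torch.cuda.is_available())

# If the thread count is unexpectedly 1 on a multi-core machine,
# raise it explicitly before running the pipeline.
if torch.get_num_threads() == 1 and (os.cpu_count() or 1) > 1:
    torch.set_num_threads(os.cpu_count())
```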
Profiling shows most of the time is spent in torchaudio functions. There is also a new message from torchaudio about backends. Could that be the cause?
Is there anything else I can look at to narrow down where the performance problem lies?
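To narrow it down further, a cumulative-time profile shows which call chain dominates (for example, audio decoding or resampling versus model inference). A minimal sketch using Python's built-in `cProfile` and `pstats`; `profile_call` and `slow_step` are hypothetical names, with `slow_step` standing in for the real pipeline call.

```python
import cProfile
import io
import pstats
from typing import Callable


def profile_call(fn: Callable[[], object], top: int = 15) -> str:
    """Run `fn` under cProfile and return the `top` entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        fn()
    finally:
        profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()


def slow_step():
    # Stand-in workload; replace with e.g. lambda: pipeline(audio_file)
    return sum(i * i for i in range(200_000))


print(profile_call(slow_step))
```

Note that GPU work is asynchronous, so kernel time can show up under whichever Python call waits for the result; `torch.profiler` gives device-level detail if that matters.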
My code is essentially this. I tried both MP3 and WAV files as `audio_file`.