Massive slowdown in 3.0.0 version from 2.1.1 #1523

mochan-b · 2023-11-02T05:04:26Z

I've been using Version 2.1.1 and it would process 1 hour of audio around 15 minutes or so.

In version 3.0.1, it's been more than 1.5 hours and its still not done for 1 hour of audio.

In both cases I've been using pipeline.to(torch.device("cuda"))

So, I'm not using 2.1.1 on PyPi but with the to function added that was pip installed from github.

I did some basic investigation only. It seems like GPU utilization is lower and CPU utilization is one single core only.

Before, there was more CPU utilization in more cores.

Profiling shows most time spent on torchaudio functions. There is new message about backends on torchaudio. Could that be a cause?

Anything else I can look for to narrow down where the performance problem lies?

My code is essentially this. I tried both mp3 and wav audio_file for this.

pipeline = Pipeline.from_pretrained('pyannote/speaker-diarization',
                                        use_auth_token=HUGGING_FACE_API_KEY)
    pipeline.to(torch.device("cuda"))
    diarization = pipeline(audio_file)

The text was updated successfully, but these errors were encountered:

github-actions · 2023-11-02T05:04:45Z

Thank you for your issue.You might want to check the FAQ if you haven't done so already.

Feel free to close this issue if you found an answer in the FAQ.

If your issue is a feature request, please read this first and update your request accordingly, if needed.

If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everthing needed to reproduce the bug:

installation
data preparation
model download
etc.

Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).

Companies relying on pyannote.audio in production may contact me via email regarding:

paid scientific consulting around speaker diarization and speech processing in general;
custom models and tailored features (via the local tech transfer office).

This is an automated reply, generated by FAQtory

hbredin · 2023-11-02T09:46:57Z

Try loading the audio first maybe?

from pyannote.audio import Audio
io = Audio(mono='downmix', sample_rate=16000)
waveform, sample_rate = io(audio_file)

diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})

grazder · 2023-11-02T11:46:14Z

I've got the same problem. I've tried to replace onnx providers for embedder model, and it gave me full GPU utilization, but pipeline started to work slower. So in my opinion bottleneck here isn't model, may be it's croping audio or resampling or smth else.

I load audio first and also tried to resample it before providing audio into model.

You can also increase batch sizes, it gave speed up

martinkallstrom · 2023-11-02T11:55:05Z

Also have the same problem, working on it as we speak. Resampling from 48000 to 16000 hz gives a little speed bump but nothing remarkable (~10% faster). Trying to move the waveform to gpu with waveform = waveform.to(torch.device("cuda")) results in an exception in the speaker verification step:

  File "/workspace/reason-api/diarize.py", line 75, in __from_audio
    self.annotation, self.embeddings = pipeline(audio, return_embeddings=True, min_speakers=1, max_speakers=3)
  File "/workspace/reason-api/venv/lib/python3.10/site-packages/pyannote/audio/core/pipeline.py", line 325, in __call__
    return self.apply(file, **kwargs)
  File "/workspace/reason-api/venv/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 512, in apply
    embeddings = self.get_embeddings(
  File "/workspace/reason-api/venv/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 344, in get_embeddings
    embedding_batch: np.ndarray = self._embedding(
  File "/workspace/reason-api/venv/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_verification.py", line 609, in __call__
    input_feed={"feats": masked_feature.numpy()[None]},
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.```

grazder · 2023-11-02T11:57:14Z

Yeah, you don't need to transfer waveform to cuda, because model inferences with ONNX runtime, which requires numpy array as input

martinkallstrom · 2023-11-02T12:17:35Z

What then could be the cause of the slow inference? Diarization of 38 seconds of audio takes ~10s, even if done repetitively and measured when models should be loaded into GPU. Which is a surprisingly high real-time factor.

grazder · 2023-11-02T12:48:03Z

I can't identify exact reason right now but I see problems with batching with for

Here

pyannote-audio/pyannote/audio/pipelines/speaker_verification.py

Line 544 in 0b45103

features = torch.stack(

and here

pyannote-audio/pyannote/audio/pipelines/speaker_verification.py

Line 602 in 0b45103

for f, (feature, imask) in enumerate(zip(features, imasks)):

I also right now tried to add .cuda() to input waveform.
And added .cpu() here

pyannote-audio/pyannote/audio/pipelines/speaker_verification.py

Line 609 in 0b45103

input_feed={"feats": masked_feature.numpy()[None]},

And I got a huge increase in speed. Now 204 seconds of audio inferences in 2 seconds instead of 28 seconds previously.

grazder · 2023-11-02T12:50:30Z

So IMO we lose speed in feature generation on CPU

grazder · 2023-11-02T12:54:03Z

@hbredin Are you planning to change this batch processing scheme? For model i think you need to adapt original wespeaker-voxceleb-resnet34-LM, so it will take masks on input.

For feature generation I don't see any options for masking or batching in doc

You can try different feature generation with batching available I guess

grazder · 2023-11-02T13:41:13Z

Also found out that simple .cuda() cast won't work here

pyannote-audio/pyannote/audio/pipelines/speaker_verification.py

Line 581 in 0b45103

features = self.compute_fbank(waveforms)

This doesn't give as much of a speed boost as casting audio to the GPU on input to the pipeline.
So, there are some problems somewhere else

mochan-b · 2023-11-03T08:28:10Z

Try loading the audio first maybe?

from pyannote.audio import Audio
io = Audio(mono='downmix', sample_rate=16000)
waveform, sample_rate = io(audio_file)

diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})

Thank you that solved my problem.

In my test file of 5 minutes, it was first taking 95 seconds with 3.0.1. With 2.1.1, it was taking 9 seconds.

After passing the down-sampled mono waveform, it is now processing it in 6.7 seconds.

jocastrocUnal · 2023-11-06T19:38:01Z

Try loading the audio first maybe?

from pyannote.audio import Audio
io = Audio(mono='downmix', sample_rate=16000)
waveform, sample_rate = io(audio_file)

diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})

This was my solution also. Thanks

hbredin · 2023-11-09T12:04:10Z

FYI: #1537

hbredin · 2023-11-16T13:02:19Z

Latest version no longer relies on ONNX runtime.
Please update to pyannote.audio 3.1 and pyannote/speaker-diarization-3.1 (and open new issues if needed).

mochan-b closed this as completed Nov 3, 2023

This was referenced Nov 7, 2023

fix: compute fbank on selected device #1529

Merged

pyannote/speaker-diarization-3.0 runs slower than pyannote/speaker-diarization@2.1 m-bain/whisperX#499

Closed

hbredin mentioned this issue Nov 9, 2023

Get rid of ONNX WeSpeaker in favor of its pytorch implementation #1537

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Massive slowdown in 3.0.0 version from 2.1.1 #1523

Massive slowdown in 3.0.0 version from 2.1.1 #1523

mochan-b commented Nov 2, 2023 •

edited

Loading

github-actions bot commented Nov 2, 2023

hbredin commented Nov 2, 2023

grazder commented Nov 2, 2023 •

edited

Loading

martinkallstrom commented Nov 2, 2023 •

edited

Loading

grazder commented Nov 2, 2023

martinkallstrom commented Nov 2, 2023

grazder commented Nov 2, 2023 •

edited

Loading

grazder commented Nov 2, 2023

grazder commented Nov 2, 2023 •

edited

Loading

grazder commented Nov 2, 2023 •

edited

Loading

mochan-b commented Nov 3, 2023

jocastrocUnal commented Nov 6, 2023

hbredin commented Nov 9, 2023

hbredin commented Nov 16, 2023

Massive slowdown in 3.0.0 version from 2.1.1 #1523

Massive slowdown in 3.0.0 version from 2.1.1 #1523

Comments

mochan-b commented Nov 2, 2023 • edited Loading

github-actions bot commented Nov 2, 2023

hbredin commented Nov 2, 2023

grazder commented Nov 2, 2023 • edited Loading

martinkallstrom commented Nov 2, 2023 • edited Loading

grazder commented Nov 2, 2023

martinkallstrom commented Nov 2, 2023

grazder commented Nov 2, 2023 • edited Loading

grazder commented Nov 2, 2023

grazder commented Nov 2, 2023 • edited Loading

grazder commented Nov 2, 2023 • edited Loading

mochan-b commented Nov 3, 2023

jocastrocUnal commented Nov 6, 2023

hbredin commented Nov 9, 2023

hbredin commented Nov 16, 2023

mochan-b commented Nov 2, 2023 •

edited

Loading

grazder commented Nov 2, 2023 •

edited

Loading

martinkallstrom commented Nov 2, 2023 •

edited

Loading

grazder commented Nov 2, 2023 •

edited

Loading

grazder commented Nov 2, 2023 •

edited

Loading

grazder commented Nov 2, 2023 •

edited

Loading