Transcribe on GPU #2329

take0x · 2024-09-09T10:45:04Z

Ideally, log_mel_spectrogram() should use model.device when transcribing.

RahulVadisetty91

Consider simplifying the condition to make it more concise.

ExtReMLapin · 2024-09-20T04:56:13Z

Benchmarks?

kittsil · 2024-09-21T20:58:10Z

I am not sure that this is a valuable change.

While it is not a robust benchmark, I did run an experiment on my local machine.
10 runs of log_mel_spectrogram() on a 177-minute audio file:

CPU:
  mean: 0.948
  std_dev: 0.208

GPU:
  mean: 2.67
  std_dev: 1.20

Machine specs:

CPU: i5-13500HX
GPU: GeForce RTX 4050 Laptop GPU

Note: The audio takes ~30s to load and ~330s to transcribe, so the difference of one or two seconds seems largely moot regardless.
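A timing comparison like the one above could be reproduced with a small harness along these lines. This is only a sketch: `time_runs` and `placeholder_workload` are hypothetical names, the workload is a stand-in for the real log_mel_spectrogram() call, and it assumes the sample standard deviation, which may not be how the figures above were computed. The optional `sync` hook is there because GPU work is queued asynchronously; when timing on CUDA, something like torch.cuda.synchronize would need to be passed in so the queued work is included in the measurement.

```python
import statistics
import time
from typing import Callable, Optional


def time_runs(fn: Callable[[], object], repeats: int = 10,
              sync: Optional[Callable[[], None]] = None) -> dict:
    """Call `fn` `repeats` times; return mean and sample std dev in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            # Wait for asynchronous (e.g. GPU) work before stopping the clock.
            sync()
        durations.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(durations),
        # stdev needs at least two samples; report 0.0 for a single run.
        "std_dev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
        "runs": len(durations),
    }


def placeholder_workload() -> int:
    # Hypothetical stand-in for log_mel_spectrogram(audio); swap in the real call.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    stats = time_runs(placeholder_workload, repeats=10)
    print(f"mean: {stats['mean']:.6f}s  std_dev: {stats['std_dev']:.6f}s")
```

Running the same harness once per device, with the audio already loaded, keeps file loading out of the measurement.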

take0x · 2024-09-22T13:29:32Z

What is important is that the device specified in load_model() should be used when transcribing, rather than the actual benchmark result.

kittsil · 2024-09-22T17:33:42Z

@take0x, I was using my GPU to transcribe.

> What is important is that the device specified in load_model() should be used when transcribing, rather than the actual benchmark result.

The device specified is used to transcribe. The log_mel_spectrogram() computation, which is a preprocessing step and NOT part of the NN model, defaults to using the CPU.

I think most consumers of the code would say "the fastest device available should be used to create the mel spectrogram." Given the nature of the computation, a CPU is almost always going to be the faster device (and should therefore be the default), regardless of the device on which the NN (a very different computation) runs.

You're more likely to get a PR approved if it includes an optional mel_spectrogram_device parameter that allows that computation to run on a specific device, but even then... I'm not sure this has much value compared to the noise of adding another parameter.
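The opt-in behaviour suggested here might look roughly like the sketch below. It is not the code from this PR: `resolve_mel_device` and `plan_devices` are hypothetical helpers, and devices are plain strings so the example runs without torch. The point is only the defaulting rule: the spectrogram stays on the CPU unless the caller explicitly names another device, while the model keeps whatever device it was loaded on.

```python
from typing import Dict, Optional


def resolve_mel_device(mel_spectrogram_device: Optional[str] = None,
                       default: str = "cpu") -> str:
    """Return the device for the mel spectrogram step (hypothetical helper).

    None means "not specified", so the CPU default is kept.
    """
    return default if mel_spectrogram_device is None else mel_spectrogram_device


def plan_devices(model_device: str,
                 mel_spectrogram_device: Optional[str] = None) -> Dict[str, str]:
    """Show which device each stage would use (hypothetical helper)."""
    return {
        "mel_spectrogram": resolve_mel_device(mel_spectrogram_device),
        "model": model_device,
    }


if __name__ == "__main__":
    # Default: preprocessing on the CPU, network on the GPU.
    print(plan_devices("cuda"))
    # Opt in: run the preprocessing on the GPU as well.
    print(plan_devices("cuda", mel_spectrogram_device="cuda"))
```

Keeping None as the "unspecified" value means existing callers see no change in behaviour, which limits the cost of the extra parameter.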

take0x · 2024-09-22T22:43:07Z

@kittsil
Thank you for your advice.

In my case, when transcribing large amounts of audio data, there have been cases where the process crashed on the CPU but ran normally on the GPU. I think it would be useful to be able to transcribe using devices other than a CPU.

I'll try adding the mel_spectrogram_device parameter based on your advice.

take0x added 2 commits September 9, 2024 19:38

Transcribe on GPU

2448c6f

Merge branch 'main' into transcribe-on-gpu

834662c

RahulVadisetty91 reviewed Sep 16, 2024

Add mel_spectrogram_device parameter

c1031a5
