Description
Hi,
I found some odd behaviour in the compute_deltas function when the input has multiple channels. I'm not sure it is a bug; maybe I just expect the function to work differently from how it actually does.
As the minimal example below shows, the derivatives change when I go from 1 channel to 2 channels. I think the channels should be processed independently, but here they seem to interact.
from torchaudio.functional import compute_deltas
import torch
# single channel audio
spectrogram=torch.rand(1,20,75)
deltas=compute_deltas(spectrogram)
print(deltas)
print(deltas.shape)
# audio with 2 channels: the signals are the same
# and the derivatives should be the same on the different channels.
spectrogram=torch.cat([spectrogram,spectrogram])
deltas=compute_deltas(spectrogram)
print(deltas[0,:])
print(deltas[1,:])
print(deltas.shape)
# the derivatives are different now. This shouldn't happen because the channels must be processed independently
As far as I understand, the issue is that compute_deltas uses specgram.shape[0] in several places, where dim=0 is the channel dimension (i.e. the batch dimension), so the result depends on the number of channels. In my tests the problem can be fixed simply by replacing every "specgram.shape[0]" with 1. I noticed the issue because the performance of my speech recognizer didn't improve when I added the derivatives of the MFCC coefficients. With this fix everything works as expected.
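To make the expected behaviour concrete, here is a minimal pure-Python sketch of a per-channel reference. It assumes the delta formula given in the torchaudio documentation, d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2) with N = (win_length - 1) // 2, and replicate padding at the edges. The names reference_deltas and reference_deltas_multichannel are hypothetical helpers of mine, not part of torchaudio, and this is not how the library implements it.

```python
def reference_deltas(frames, win_length=5):
    """Delta coefficients for one feature row (a list of floats over time).

    Hypothetical reference helper; assumes the documented delta formula
    with replicate padding at both edges.
    """
    n = (win_length - 1) // 2
    denom = 2 * sum(k * k for k in range(1, n + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        acc = 0.0
        for k in range(1, n + 1):
            ahead = frames[min(t + k, last)]  # replicate the last frame past the right edge
            behind = frames[max(t - k, 0)]    # replicate the first frame past the left edge
            acc += k * (ahead - behind)
        out.append(acc / denom)
    return out


def reference_deltas_multichannel(spec, win_length=5):
    """Apply reference_deltas to nested lists shaped [channel][freq][time].

    Every row is handled on its own, so the channel count cannot
    influence the result.
    """
    return [[reference_deltas(row, win_length) for row in channel]
            for channel in spec]


# Two feature rows: a linear ramp and a constant signal.
channel = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
           [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]]

mono = reference_deltas_multichannel([channel])
stereo = reference_deltas_multichannel([channel, channel])

# Duplicating the channel must not change the deltas.
print(stereo[0] == mono[0] and stereo[1] == mono[0])
```

The output of compute_deltas could be compared against this reference by converting the tensor with .tolist(); if the channels are handled independently, both channels of the duplicated input should match the single-channel result.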