Problems with compute_deltas

Hi,
I found some odd behaviour in the compute_deltas function when passing multiple channels. I'm not sure it is a bug; maybe I just expect the function to work differently from how it actually does.
As the minimal example below shows, the derivatives change when I switch from 1 to 2 channels. I think the channels should be processed independently, but here they seem to interact.


```
import torch
from torchaudio.functional import compute_deltas

# Single-channel audio: (channel, freq, time)
spectrogram = torch.rand(1, 20, 75)
deltas = compute_deltas(spectrogram)

print(deltas)
print(deltas.shape)

# Audio with 2 channels: the signals are identical,
# so the derivatives should be identical on both channels.
spectrogram = torch.cat([spectrogram, spectrogram])
deltas = compute_deltas(spectrogram)

print(deltas[0, :])
print(deltas[1, :])
print(deltas.shape)

# The derivatives now differ. This shouldn't happen, because the channels must be processed independently.
```

As far as I can tell, the issue is that compute_deltas uses specgram.shape[0] in several places, where dim 0 is the channel dimension (i.e. the batch dimension). Replacing every "specgram.shape[0]" with 1 fixes the problem. I noticed this because the performance of my speech recognizer didn't improve when I added the derivatives of the MFCC coefficients. With this fix everything works as expected.
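
To make "processed independently" concrete, here is a minimal pure-Python sketch of what I would expect per channel. It is not the torchaudio implementation: it assumes the usual regression-based delta formula, d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2), with N = (win_length - 1) // 2 and the edges padded by repeating the first and last frame. The names reference_deltas and reference_deltas_multichannel are hypothetical helpers made up for this illustration.

```python
def reference_deltas(channel, win_length=5):
    """Deltas for ONE channel, given as a list of rows (freq x time).

    Assumption: regression-based delta formula with edge replication.
    """
    n_max = (win_length - 1) // 2
    denom = 2 * sum(n * n for n in range(1, n_max + 1))
    out = []
    for row in channel:
        last = len(row) - 1

        def at(t, row=row, last=last):
            # Clamp the time index so the edges repeat the first/last frame.
            return row[min(max(t, 0), last)]

        out.append([
            sum(n * (at(t + n) - at(t - n)) for n in range(1, n_max + 1)) / denom
            for t in range(len(row))
        ])
    return out


def reference_deltas_multichannel(spec, win_length=5):
    """Apply reference_deltas to each channel separately (channel x freq x time)."""
    return [reference_deltas(channel, win_length) for channel in spec]


# Two frequency bins, six time frames.
channel = [
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    [1.0, 4.0, 9.0, 16.0, 25.0, 36.0],
]
single = reference_deltas_multichannel([channel])
double = reference_deltas_multichannel([channel, channel])

# Duplicating the channel must not change the result for either copy.
print(double[0] == single[0] and double[1] == single[0])
```

Since each channel is handled on its own, stacking the same signal twice gives two identical delta maps, which is the behaviour I expected from compute_deltas in the example above.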


Problems with compute_deltas #336
