Problems with compute_deltas #336

@mravanelli

Hi,
I found some unexpected behaviour in the compute_deltas function when the input has multiple channels. I'm not sure it is a bug; maybe I just expect the function to work differently from how it actually does.
As the minimal example below shows, the derivatives change when I switch from 1 to 2 channels. I think the channels should be processed independently, but here there seems to be some interaction between them.

from torchaudio.functional import compute_deltas
import torch

# single channel audio
spectrogram=torch.rand(1,20,75)
deltas=compute_deltas(spectrogram)

print(deltas)
print(deltas.shape)


# audio with 2 channels: both channels hold the same signal,
# so the derivatives should be identical across channels.

spectrogram=torch.cat([spectrogram,spectrogram])
deltas=compute_deltas(spectrogram)

print(deltas[0,:])
print(deltas[1,:])
print(deltas.shape)

# explicit comparison of the two channels
print(torch.allclose(deltas[0], deltas[1]))

# the derivatives are different now. This shouldn't happen because the channels must be processed independently

As far as I can tell, the issue is that compute_deltas uses specgram.shape[0] in several places, where dim=0 is the channel dimension (i.e. the batch dimension). Replacing every "specgram.shape[0]" with 1 fixes the problem for me. I noticed the issue because the performance of my speech recognizer didn't improve when I added the derivatives of the MFCC coefficients. With this fix, everything works as expected.
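For reference, here is a minimal pure-Python sketch of the per-channel behaviour I would expect. It assumes the standard delta formula d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2) with N = (win_length - 1) // 2, and edge replication at the boundaries. That is my reading of the documented behaviour, not a copy of the library code. The names reference_deltas and reference_deltas_multichannel are hypothetical helpers, not part of torchaudio.

```python
def reference_deltas(channel, win_length=5):
    """Delta features for ONE channel, given as a list of rows (freq x time).

    Assumption: standard regression formula with replicate padding at the edges.
    """
    n = (win_length - 1) // 2
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = []
    for row in channel:
        last = len(row) - 1
        deltas = []
        for t in range(len(row)):
            acc = 0.0
            for k in range(1, n + 1):
                right = row[min(t + k, last)]  # clamp index: replicate last frame
                left = row[max(t - k, 0)]      # clamp index: replicate first frame
                acc += k * (right - left)
            deltas.append(acc / denom)
        out.append(deltas)
    return out


def reference_deltas_multichannel(specgram, win_length=5):
    """Apply reference_deltas to each channel separately (no cross-channel terms)."""
    return [reference_deltas(channel, win_length) for channel in specgram]


# Two channels holding the same signal must give the same deltas.
channel = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
           [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]]
result = reference_deltas_multichannel([channel, channel])
print(result[0] == result[1])
```

Because each channel goes through reference_deltas on its own, nothing in the computation depends on the number of channels, which is the property I expected from compute_deltas as well.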
