Blank zeroed rows in Mel spectrograms #851

duvtedudug · 2020-08-04T08:03:28Z

🐛 Bug

Mel spectrograms have some seemingly random rows that are all zeros. Regular spectrograms are fine.

To Reproduce

Steps to reproduce the behavior:

mel = torchaudio.transforms.MelSpectrogram()(waveform)

or

spec = torchaudio.transforms.Spectrogram()(waveform)
mel = torchaudio.transforms.MelScale()(spec)

Both give something like this (white lines are zero valued)

Expected behavior

All rows to have data!

Environment

Tried both pytorch 1.5 and 1.6

mthrok · 2020-08-06T20:19:27Z

Hi @duvtedudug

Thanks for reporting the issue. Can you tell us how you load the audio file? Also, do you know of a publicly accessible audio file that causes the same issue, or can you share your data?

duvtedudug · 2020-08-13T08:00:49Z

After more exploration, the cause turned out to be that the spectrogram did not have enough frequency bins (n_fft too small) to fill the default 128 Mel bins.
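To make the cause concrete, here is a minimal sketch in plain Python that counts Mel filters which receive no spectrogram bin at all. It is not torchaudio's code: it assumes triangular filters spaced on the HTK Mel scale, a 16 kHz sample rate, 128 Mel bins, and n_fft=400 as the spectrogram default. The helper name count_empty_mel_filters is hypothetical.

```python
import math


def hz_to_mel(f):
    # HTK-style Mel scale (assumed here; other Mel variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def count_empty_mel_filters(n_fft, n_mels=128, sample_rate=16000):
    """Count triangular Mel filters that cover no FFT bin frequency."""
    n_freqs = n_fft // 2 + 1
    f_max = sample_rate / 2.0
    # Centre frequencies of the linear spectrogram bins, 0 .. Nyquist.
    bin_freqs = [k * f_max / (n_freqs - 1) for k in range(n_freqs)]
    # n_mels + 2 filter edges, equally spaced on the Mel scale.
    m_max = hz_to_mel(f_max)
    edges = [mel_to_hz(i * m_max / (n_mels + 1)) for i in range(n_mels + 2)]
    empty = 0
    for i in range(n_mels):
        lo, hi = edges[i], edges[i + 2]
        # A triangle is non-zero only strictly between its outer edges.
        if not any(lo < f < hi for f in bin_freqs):
            empty += 1
    return empty


print("empty filters with n_fft=400: ", count_empty_mel_filters(400))
print("empty filters with n_fft=1024:", count_empty_mel_filters(1024))
```

Under these assumptions, the narrow low-frequency filters are thinner than the 40 Hz bin spacing at n_fft=400, so some of them fall between bins and produce all-zero rows. At n_fft=1024 the spacing drops to about 15.6 Hz and every filter overlaps at least one bin.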

I was following the code in the tutorial

Here's my code...

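import matplotlib.pyplot as plt
import torch
import torchaudio

filename = "my_audio.wav"
waveform, sample_rate = torchaudio.load(filename)

# Default spectrogram settings: this is the case that shows blank Mel rows
spec = torchaudio.transforms.Spectrogram()(waveform)

# But if I do this it works...
# spec = torchaudio.transforms.Spectrogram(n_fft=1024)(waveform)

# Plot the log spectrogram of the first channel
plt.figure()
plt.imshow(torch.log2(spec[0,:,:]).numpy(), cmap='gray', origin='lower')

mel = torchaudio.transforms.MelScale()(spec)

# Plot the log Mel spectrogram; all-zero rows become -inf and show as white lines
plt.figure()
plt.imshow(torch.log2(mel[0,:,:]).numpy(), cmap='gray', origin='lower')
plt.show()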

Just adding more resolution to the spectrogram seems to give the Mel conversion what it needs!

My only suggestion is to change the default n_fft values (or at least the values in the tutorial) so that it works out-of-the-box?

Thanks for your time!

vincentqb · 2020-10-14T15:56:46Z

Thanks for the suggestion :) There was also a discussion in #384 about using powers of 2 instead. (I'm also noting that the default was changed in #83.)

If you use master, do you now get a warning about zero Mel filters if you don't specify n_fft? The warning was introduced in #914.

mthrok added the bug label Aug 6, 2020

mthrok mentioned this issue Jun 16, 2021

[BC-Breaking] Remove lazy behavior from MelScale #1571

Closed

mthrok closed this as completed Jul 22, 2021
