Blank zeroed rows in Mel spectrograms #851

Closed
duvtedudug opened this issue Aug 4, 2020 · 3 comments

@duvtedudug

🐛 Bug

Mel spectrograms have some seemingly random rows that are all zeros. Regular spectrograms are fine.

To Reproduce

Steps to reproduce the behavior:

  1. mel = torchaudio.transforms.MelSpectrogram()(waveform)

or

  1. spec = torchaudio.transforms.Spectrogram()(waveform)
  2. mel = torchaudio.transforms.MelScale()(spec)

Both give something like this (white lines are zero valued)

[Image: Screenshot 2020-08-04 at 07 24 27, a Mel spectrogram with white horizontal lines at the zero-valued rows]

Expected behavior

All rows to have data!

Environment

  • Tried both PyTorch 1.5 and 1.6
@mthrok
Collaborator

mthrok commented Aug 6, 2020

Hi @duvtedudug

Thanks for reporting the issue. Can you show us how you load the audio file? Also, do you know of any publicly accessible audio data that causes the same issue, or can you share your data?

mthrok added the bug label Aug 6, 2020
@duvtedudug
Author

After more exploration, the cause turned out to be that the spectrogram did not have enough frequency bins (n_fft was too small) to fill the default 128 Mel bins.

I was following the code in the tutorial

Here's my code...

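import matplotlib.pyplot as plt
import torch
import torchaudio

filename = "my_audio.wav"
waveform, sample_rate = torchaudio.load(filename)

# Default n_fft (400) gives only 201 frequency bins
spec = torchaudio.transforms.Spectrogram()(waveform)

# But if I do this it works...
# spec = torchaudio.transforms.Spectrogram(n_fft=1024)(waveform)

# Plot the log-scaled linear spectrogram (first channel)
plt.figure()
plt.imshow(torch.log2(spec[0,:,:]).numpy(), cmap='gray', origin='lower')

# Map the linear-frequency bins onto the default 128 Mel bins
mel = torchaudio.transforms.MelScale()(spec)

# Plot the log-scaled Mel spectrogram; all-zero rows become -inf here
plt.figure()
p = plt.imshow(torch.log2(mel[0,:,:]).numpy(), cmap='gray', origin='lower')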

Just adding more resolution to the spectrogram seems to give the Mel conversion what it needs!
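
To illustrate why the extra resolution helps, here is a minimal pure-Python sketch that counts the Mel filters no FFT bin falls into. It is not torchaudio's code. It assumes triangular filters spaced evenly on the HTK Mel scale from 0 Hz to the Nyquist frequency, with the transform defaults of sample_rate=16000 and n_mels=128. The helper names (hz_to_mel, mel_to_hz, empty_mel_filters) are made up for this example.

```python
import math


def hz_to_mel(freq_hz):
    # HTK-style Mel formula (assumed here, not read from torchaudio).
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def empty_mel_filters(n_fft, n_mels=128, sample_rate=16000):
    """Return indices of triangular Mel filters whose weights are zero at every FFT bin."""
    n_freqs = n_fft // 2 + 1
    f_max = sample_rate / 2.0
    # Centre frequency of each FFT bin, from 0 Hz up to Nyquist.
    bin_freqs = [f_max * k / (n_freqs - 1) for k in range(n_freqs)]
    # n_mels + 2 filter edges, evenly spaced on the Mel scale.
    mel_step = hz_to_mel(f_max) / (n_mels + 1)
    edges = [mel_to_hz(i * mel_step) for i in range(n_mels + 2)]

    empty = []
    for i in range(n_mels):
        lo, center, hi = edges[i], edges[i + 1], edges[i + 2]
        # Triangle rising from lo to center, falling from center to hi.
        weights = (
            max(0.0, min((f - lo) / (center - lo), (hi - f) / (hi - center)))
            for f in bin_freqs
        )
        if not any(w > 0.0 for w in weights):
            empty.append(i)
    return empty


print(empty_mel_filters(n_fft=400))   # some low-frequency filters are empty
print(empty_mel_filters(n_fft=1024))  # → []
```

With n_fft=400 the bins sit 40 Hz apart, while the lowest filters are narrower than that, so several of them never see a bin and their rows come out as zeros. With n_fft=1024 the spacing drops to 15.625 Hz and every filter covers at least one bin.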

My only suggestion is to change the default n_fft values (or at least the values in the tutorial) so that it works out of the box.

Thanks for your time!

@vincentqb
Contributor

Thanks for the suggestion :) There was also a discussion in #384 about using powers of 2 instead. (I'm also noting that the default was changed in #83.)

If you use master, do you now get a warning about zero Mel filters if you don't specify n_fft? The warning was introduced in #914.
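
For anyone who would rather pick n_fft up front than react to the warning, the sketch below estimates the smallest power-of-two n_fft that leaves no Mel filter empty. It is a rough calculation rather than anything taken from torchaudio. It assumes HTK Mel spacing, filters starting at 0 Hz and ending at Nyquist, and an even n_fft so that the bin spacing is sample_rate / n_fft. The function name smallest_pow2_n_fft is hypothetical.

```python
import math


def smallest_pow2_n_fft(sample_rate=16000, n_mels=128):
    """Smallest power-of-two n_fft for which every Mel filter covers an FFT bin."""
    f_max = sample_rate / 2.0
    mel_max = 2595.0 * math.log10(1.0 + f_max / 700.0)
    mel_step = mel_max / (n_mels + 1)
    # The lowest filter is the narrowest one; it spans 0 Hz to this upper edge.
    narrowest_hi = 700.0 * (10.0 ** (2.0 * mel_step / 2595.0) - 1.0)

    n_fft = 2
    # Keep doubling until the bin spacing is strictly below the narrowest filter width.
    while sample_rate / n_fft >= narrowest_hi:
        n_fft *= 2
    return n_fft


print(smallest_pow2_n_fft(16000, 128))  # → 1024
```

For the default sample_rate=16000 and n_mels=128, the narrowest filter is about 28 Hz wide, so n_fft=512 (31.25 Hz spacing) is still too coarse and n_fft=1024 is the first power of two that works, which matches the workaround reported above.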
