Why n_fft=400 by default in Transforms? #384

keunwoochoi · 2019-12-29T18:11:58Z

In `MelSpectrogram`, `Spectrogram`, and `GriffinLim`, `n_fft` defaults to 400. Is there a reason for not setting it to a power of 2?


faroit · 2019-12-29T18:33:18Z

I guess this comes from the old days of (time-domain) speech processing, where the sample rate was 8 kHz and 50 ms was considered a good window size. Since 512 is also common, changing the default could indeed make sense...
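The arithmetic behind this guess can be sketched as follows. The helper names `window_samples` and `is_power_of_two` are hypothetical and not part of torchaudio; they only convert a window duration into a sample count and check whether that count is a power of two.

```python
def window_samples(sample_rate_hz: int, window_ms: float) -> int:
    """Number of samples spanned by a window of `window_ms` milliseconds."""
    return round(sample_rate_hz * window_ms / 1000)


def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


# 50 ms at 8 kHz gives the 400-sample default discussed in this issue.
print(window_samples(8000, 50))                     # 400
print(is_power_of_two(400), is_power_of_two(512))   # False True
```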

vincentqb · 2020-01-02T20:33:24Z

The current value has been around for a while, and hasn't changed since changing it would be BC-breaking. I'd be ok changing it if there is a strong reason to do so. Thoughts?

keunwoochoi · 2020-01-02T23:06:17Z

I see. I have no benchmark data, but I thought the backend FFT operation could be more efficient with `n_fft=2**N`. But I don't have a strong opinion now - after googling I realized non-power-of-two FFTs can be efficient too :)

vincentqb · 2020-01-03T16:49:58Z

Quick run, and I don't see significant differences :)

In [1]: f = "steam-train-whistle-daniel_simon.wav"

In [2]: import torchaudio

In [3]: w, s = torchaudio.load(f)

In [4]: %timeit torchaudio.transforms.Spectrogram(n_fft=400)(w)
10.5 ms ± 376 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [5]: %timeit torchaudio.transforms.Spectrogram(n_fft=512)(w)
11.1 ms ± 207 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

vincentqb · 2020-01-03T20:33:35Z

I will close this issue for now. Please feel free to re-open if there are more elements you would like to add to the discussion :)

PetrochukM · 2020-03-03T18:55:25Z

@faroit What would you consider a better window size today? In the recent Tacotron paper, they also used a 50-millisecond frame size; however, the Kaldi spectrogram recommends a 25-millisecond frame size. From my online readings, it sounds like 20 - 30 milliseconds is recommended for a text-to-speech application with a 50% hop length.

faroit · 2020-03-10T10:45:55Z

@PetrochukM Yes, maybe 32 ms (`n_fft=512` at a 16 kHz sample rate) would be a better fit with respect to performance, as pointed out by @keunwoochoi.
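For reference, here is a minimal sketch of how the frame sizes mentioned above map to durations. It assumes a 16 kHz sample rate, which the thread does not state explicitly but which is the rate at which 512 samples span 32 ms. The helpers `frame_ms` and `hop_for_overlap` are hypothetical names, not torchaudio functions.

```python
def frame_ms(n_fft: int, sample_rate_hz: int) -> float:
    """Duration in milliseconds of a frame of n_fft samples."""
    return 1000 * n_fft / sample_rate_hz


def hop_for_overlap(n_fft: int, overlap: float = 0.5) -> int:
    """Hop length in samples for a given fractional overlap between frames."""
    return int(round(n_fft * (1 - overlap)))


SAMPLE_RATE_HZ = 16000  # assumption, see the note above

for n_fft in (400, 512):
    print(n_fft, frame_ms(n_fft, SAMPLE_RATE_HZ), hop_for_overlap(n_fft))
# 400 25.0 200
# 512 32.0 256
```

Under that assumption, the current default of 400 corresponds to the 25 ms frame size mentioned for Kaldi, and 512 is the nearest power of two above it.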


vincentqb self-assigned this Jan 2, 2020

vincentqb closed this as completed Jan 3, 2020

vincentqb mentioned this issue Oct 14, 2020

Blank zeroed rows in Mel spectrograms #851

Closed

mthrok pushed a commit to mthrok/audio that referenced this issue Feb 26, 2021

some improvements to the make download(pytorch#384)

b037aef

mthrok pushed a commit to mthrok/audio that referenced this issue Feb 26, 2021

Merge pull request pytorch#390 from 9bow/master

b38343e
