The spectrogram computed by "torchaudio.compliance.kaldi.spectrogram" and "compute-spectrogram-feats" are different #332

yjw123456 · 2019-11-06T18:05:56Z

Dear Sir or Madam,

I compared the spectrograms generated by "torchaudio.compliance.kaldi.spectrogram" and "compute-spectrogram-feats" using the default settings; however, the outputs are different.

# The code for torchaudio is:
import torchaudio

wav, sample_rate = torchaudio.load('wav/1.wav')

spectrum = torchaudio.compliance.kaldi.spectrogram(wav, blackman_coeff=0.42, channel=-1, dither=1.0, energy_floor=0.0, frame_length=25.0, frame_shift=10.0, min_duration=0.0, preemphasis_coefficient=0.97, raw_energy=True, remove_dc_offset=True, round_to_power_of_two=True, sample_frequency=16000.0, snip_edges=True, subtract_mean=False, window_type='povey')

# The Kaldi command is:
compute-spectrogram-feats --blackman_coeff=0.42 --channel=-1 --dither=1.0 --energy_floor=0.0 --frame_length=25.0 --frame_shift=10.0 --min_duration=0.0 --preemphasis_coefficient=0.97 --raw_energy=True --remove_dc_offset=True --round_to_power_of_two=True --sample_frequency=16000.0 --snip_edges=True --subtract_mean=False --window_type='povey' scp:wav.scp ark,t,scp:1_stft.ark,1_stft.scp

Can you give me a clue as to why they differ?

Thanks!


wuqiangch · 2019-11-13T05:54:47Z

@yjw123456 I have the same problem.

wuqiangch · 2019-11-13T06:01:20Z

@yjw123456 I have the same problem.

vincentqb · 2019-11-18T22:12:56Z

The sample_frequency argument passed to torchaudio.compliance.kaldi.spectrogram must match the sample rate of the audio file, i.e. the sample_rate returned by torchaudio.load. Can you confirm that sample_rate is 16000?

Can you also provide the full Kaldi command so I can reproduce this?

Here's an audio file to try with.

yjw123456 · 2019-11-27T07:24:36Z

Hi, vincentqb,

Sorry for the late reply. @vincentqb @wuqiangch

I found the answer to this problem.
torchaudio converts the loaded waveform to floating point, whereas Kaldi computes the spectrogram directly from the integer sample values.

So if I do:
wav, sample_rate = torchaudio.load('tmp.wav')
wav = wav * 2**15  # rescale the float waveform back to the 16-bit integer range
then the outputs are exactly the same.
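For readers who want to reuse this workaround, here is a minimal sketch. It assumes the file is 16-bit PCM and that torchaudio.load returns floats scaled by 1 / 2**15, as described in this thread. The helper names kaldi_scale and kaldi_like_spectrogram are hypothetical and not part of torchaudio. Setting dither to 0.0 (on the Kaldi side as well) is an added suggestion, not something from the original post: dither is random noise, so a non-zero value makes an element-by-element comparison noisy.

```python
def kaldi_scale(bits_per_sample=16):
    """Factor that maps floats in [-1, 1] back to the signed integer range.

    Hypothetical helper: for 16-bit audio this is 2**15 = 32768.
    """
    return float(2 ** (bits_per_sample - 1))


def kaldi_like_spectrogram(path, dither=0.0):
    """Load a wav file and compute a spectrogram on Kaldi's sample scale."""
    # Imported lazily so kaldi_scale can be used without torchaudio installed.
    import torchaudio
    import torchaudio.compliance.kaldi as kaldi

    wav, sample_rate = torchaudio.load(path)  # float waveform, as in the thread
    wav = wav * kaldi_scale(16)               # undo the normalization

    return kaldi.spectrogram(
        wav,
        sample_frequency=float(sample_rate),  # use the file's real sample rate
        dither=dither,                        # 0.0 keeps the result deterministic
        energy_floor=0.0,
    )
```

Calling kaldi_like_spectrogram('tmp.wav') should then be comparable with the output of compute-spectrogram-feats run with --dither=0.0, under the assumptions above.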

The full Kaldi command is:
compute-spectrogram-feats --blackman_coeff=0.42 --channel=-1 --dither=1.0 --energy_floor=0.0 --frame_length=25.0 --frame_shift=10.0 --min_duration=0.0 --preemphasis_coefficient=0.97 --raw_energy=True --remove_dc_offset=True --round_to_power_of_two=True --sample_frequency=16000.0 --snip_edges=True --subtract_mean=False --window_type='povey' scp:wav.scp ark,t,scp:1_stft.ark,1_stft.scp

The contents of wav.scp are:
utt-id tmp.wav

vincentqb · 2019-11-27T14:56:31Z

Glad this answers your question. Yes, torchaudio normalizes waveforms to [-1, 1]. I will close this issue, but please feel free to reopen or create a new one.
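To illustrate why the scale matters, here is a small pure-Python sketch. It is not the torchaudio or Kaldi implementation. It assumes the spectrogram is a log power spectrum and that every step before the logarithm is linear (no dither, no energy floor being hit). Under those assumptions, multiplying the waveform by 2**15 shifts every log-power value by the constant 2 * ln(2**15), about 20.79. The toy frame and the log_power_spectrum helper are made up for illustration.

```python
import cmath
import math


def log_power_spectrum(frame):
    """Log power of a naive DFT of one frame, for bins 0 .. N/2."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(math.log(abs(acc) ** 2))
    return spectrum


# Toy frame in [-1, 1], standing in for a normalized float waveform.
frame = [0.5, -0.25, 0.125, 0.3, -0.7, 0.2, 0.05, -0.1]
# The same frame on the 16-bit integer scale.
scaled = [x * 2 ** 15 for x in frame]

expected_offset = 2 * math.log(2 ** 15)
diffs = [b - a for a, b in zip(log_power_spectrum(frame),
                               log_power_spectrum(scaled))]

for k, d in enumerate(diffs):
    print(f"bin {k}: difference = {d:.4f}")
```

Every bin differs by the same constant, so the two spectrograms have the same shape and differ only by an additive offset in the log domain.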

vincentqb self-assigned this Nov 18, 2019

vincentqb closed this as completed Nov 27, 2019

vincentqb mentioned this issue Jan 13, 2020

Fbank features are different from Kaldi Fbank #400

Open

mthrok mentioned this issue Feb 15, 2021

RFC: The future of Kaldi compliance module #1269

Open
