The spectrogram computed by "torchaudio.compliance.kaldi.spectrogram" and "compute-spectrogram-feats" are different #332

Closed
yjw123456 opened this issue Nov 6, 2019 · 5 comments

yjw123456 commented Nov 6, 2019

Dear Sir or Madam,

I compared the spectrograms generated by "torchaudio.compliance.kaldi.spectrogram" and "compute-spectrogram-feats" using default settings, but the outputs are different.

# The code for torchaudio is:
import torchaudio

wav, sample_rate = torchaudio.load('wav/1.wav')

spectrum=torchaudio.compliance.kaldi.spectrogram(wav, blackman_coeff=0.42, channel=-1, dither=1.0, energy_floor=0.0, frame_length=25.0, frame_shift=10.0, min_duration=0.0, preemphasis_coefficient=0.97, raw_energy=True, remove_dc_offset=True, round_to_power_of_two=True, sample_frequency=16000.0, snip_edges=True, subtract_mean=False, window_type='povey')
# The Kaldi command is:
compute-spectrogram-feats --blackman_coeff=0.42 --channel=-1 --dither=1.0 --energy_floor=0.0 --frame_length=25.0 --frame_shift=10.0 --min_duration=0.0 --preemphasis_coefficient=0.97 --raw_energy=True --remove_dc_offset=True --round_to_power_of_two=True --sample_frequency=16000.0 --snip_edges=True --subtract_mean=False --window_type='povey' scp:wav.scp ark,t,scp:1_stft.ark,1_stft.scp

Can you give me a clue as to why they differ?

Thanks!

@wuqiangch

@yjw123456 I have the same problem.

@vincentqb
Contributor

The value passed as sample_frequency to torchaudio.compliance.kaldi.spectrogram must be the sample rate of the audio file, i.e. the sample_rate returned by torchaudio.load. Can you confirm that sample_rate is 16000?
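
One way to double-check the rate independently of torchaudio is to read it from the WAV header with Python's standard wave module. This is only a minimal sketch, assuming an uncompressed PCM WAV file; the helper name wav_sample_rate is hypothetical and not part of torchaudio or Kaldi.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


# Example usage (the path is whatever file you are comparing):
# assert wav_sample_rate("wav/1.wav") == 16000
```

If the value is not 16000, passing sample_frequency=16000.0 would describe the file incorrectly.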

Can you provide the whole code for kaldi so I can reproduce?

Here's an audio file to try with.

vincentqb self-assigned this Nov 18, 2019
@yjw123456
Author

yjw123456 commented Nov 27, 2019

Hi, vincentqb,

Sorry for the late reply. @vincentqb @wuqiangch

I found the answer to this problem.
torchaudio converts the loaded waveform to floating point, whereas Kaldi computes the spectrogram directly from the integer sample values.

So if I do:
wav, sample_rate = torchaudio.load('tmp.wav')
wav = wav * 2**15  # rescale the float waveform back to the integer sample range
then the outputs are exactly the same.
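
To illustrate why a pure scale factor matters here: in a log-power spectrum, a constant gain on the waveform becomes a constant offset in every bin. The sketch below uses a naive DFT on a made-up frame rather than either library. It assumes a divisor of 2**15 (consistent with the fix above) and leaves out windowing, pre-emphasis, dither, and energy flooring, so it is a simplification and not the actual Kaldi or torchaudio pipeline.

```python
import cmath
import math


def log_power_spectrum(frame):
    """Log-power spectrum of one frame via a naive DFT.

    Simplification: no window, pre-emphasis, dither, or energy floor.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        # k-th DFT coefficient of the frame
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        spectrum.append(math.log(abs(coeff) ** 2))
    return spectrum


# Hypothetical 16-bit sample values, as a tool reading raw integers sees them.
int_frame = [1000, -2000, 1500, 300, -700, 50, 1200, -900]
# The same samples scaled to [-1, 1], assuming a divisor of 2**15.
float_frame = [x / 2 ** 15 for x in int_frame]

# Per-bin difference between the two log-power spectra.
offsets = [a - b for a, b in zip(log_power_spectrum(int_frame),
                                 log_power_spectrum(float_frame))]
# A gain of 2**15 on the waveform is a gain of 2**30 on the power.
expected_offset = 2 * math.log(2 ** 15)
```

Under these assumptions every entry of offsets equals expected_offset (about 20.79), which is why multiplying the float waveform by 2**15 removes the mismatch.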

The full Kaldi command is:
compute-spectrogram-feats --blackman_coeff=0.42 --channel=-1 --dither=1.0 --energy_floor=0.0 --frame_length=25.0 --frame_shift=10.0 --min_duration=0.0 --preemphasis_coefficient=0.97 --raw_energy=True --remove_dc_offset=True --round_to_power_of_two=True --sample_frequency=16000.0 --snip_edges=True --subtract_mean=False --window_type='povey' scp:wav.scp ark,t,scp:1_stft.ark,1_stft.scp

The contents of wav.scp are:
utt-id tmp.wav

@vincentqb
Contributor

Glad this answers your question. Yes, torchaudio normalizes waveforms to [-1, 1]. I will close this issue, but please feel free to reopen or create a new one.
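
For reference, the normalization and its inverse can be sketched in plain Python. This assumes 16-bit PCM input and a divisor of 2**15, which matches the workaround reported in this thread; the function names are hypothetical and are not torchaudio APIs.

```python
INT16_SCALE = 2 ** 15  # assumed divisor for 16-bit PCM samples


def normalize_int16(samples):
    """Map 16-bit integer sample values to floats in [-1, 1)."""
    return [s / INT16_SCALE for s in samples]


def denormalize_to_int16_range(samples):
    """Undo the normalization, restoring integer-range magnitudes."""
    return [s * INT16_SCALE for s in samples]
```

Applying denormalize_to_int16_range to a normalized waveform corresponds to the wav * 2**15 step used above before computing Kaldi-compatible features.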
