Fbank features are different from Kaldi Fbank #400
Comments
|
I looked into this, and it took a while to figure out why. When you use See my test or existing test. This is extremely subtle. |
@mthrok - should we add documentation about this or otherwise try to prevent this issue coming up again in the future? I'm surprised we have a need for a separate load_wav to begin with. |
I second @cpuhrsch: I'm also surprised that we |
I don't believe we should rely on |
Edit: After some testing, it seems that to get the closest match one has to skip normalisation and multiply by 2**15? @mthrok normalising audio does not help for me. Code:
Output:
Btw I don't think it's good to make the assumption of normalising audio as you can't do this in a realtime setting. |
Hi @RuABraun As you figured out, normalization here means dtype conversion, that is According to my recent talk with @cpuhrsch, this fbank feature is not intended to match Kaldi's implementation precisely. I found that our test suite for this function, which I thought was covering it, was not sufficient, and it does not match Kaldi's result. I personally think that it is more confusing to have a module named To lower the maintenance cost, I am in favor of building Kaldi and binding to it, which would guarantee that all the Kaldi-related features match Kaldi's results exactly, but that opinion is not getting support from anyone. A similar issue is raised in #328 |
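To make the effect of the value range concrete, here is a minimal pure-Python sketch (it is not torchaudio or Kaldi code). The helper `log_energy` is hypothetical and only stands in for a single log filterbank energy. The assumption is that every step before the logarithm is linear in the samples, so multiplying the waveform by 2**15 should shift each log energy by the constant 2 * ln(2**15), about 20.79, as long as dither is off and the energy floor is not hit.

```python
import math


def log_energy(samples):
    """Natural log of the summed squared samples.

    A toy stand-in for one log filterbank energy (hypothetical helper).
    """
    return math.log(sum(s * s for s in samples))


# Hypothetical float samples in [-1, 1], as a float32 load would return.
normalized = [0.5, -0.25, 0.125, -0.75]
# The same samples on the 16-bit integer scale.
scaled = [s * 2**15 for s in normalized]

offset = log_energy(scaled) - log_energy(normalized)
# log((c * x) ** 2) = log(x ** 2) + 2 * log(c)
expected = 2 * math.log(2**15)

print(f"offset = {offset:.4f}, expected = {expected:.4f}")
```

Under these assumptions, a constant offset between the two outputs points to a value-range mismatch rather than to a difference in the filterbank itself.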
Thank you for the explanation! :) |
We also had the same problem two days ago under the setting
|
Can you please share your code? It would be very useful! |
@BattashB Something like this:
waveform, sample_rate = torchaudio.load(<file>)  # waveform is float32, value range [-1, 1]
waveform = waveform * (1 << 15)  # scale to the 16-bit value range [-32768., 32767.] |
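The scaling step can also be sketched without torchaudio. This is a plain-Python illustration with a hypothetical helper `to_int16_scale`; with a tensor returned by `torchaudio.load`, the same multiplication applies elementwise. The factor matters: `1 << 15` is 32768, whereas `2 << 16` is 131072.

```python
def to_int16_scale(samples):
    """Scale float samples from [-1.0, 1.0] to the 16-bit range, keeping them as floats."""
    scale = 1 << 15  # 32768, the magnitude of the most negative int16 value
    return [s * scale for s in samples]


# Hypothetical normalized samples.
normalized = [-1.0, -0.5, 0.0, 0.5]
scaled = to_int16_scale(normalized)
print(scaled)
```

The values stay floating point; only their range changes, which matches the `2**15` multiplication reported earlier in the thread.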
🐛 Bug
The output of the fbank feature calculations differs from that of Kaldi.
To Reproduce
Steps to reproduce the behavior:
Using the following parameters, or even the defaults:
produces this output:
and with Kaldi's compute_fbank_feats: