Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase mfcc step size instead of throwing away feature frames #1744

Closed
reuben opened this issue Nov 22, 2018 · 5 comments
Closed

Increase mfcc step size instead of throwing away feature frames #1744

reuben opened this issue Nov 22, 2018 · 5 comments

Comments

@reuben
Copy link
Contributor

reuben commented Nov 22, 2018

DeepSpeech/util/audio.py

Lines 17 to 21 in a3a96cf

# Get mfcc coefficients
features = mfcc(audio, samplerate=fs, numcep=numcep)
# We only keep every second feature (BiRNN stride = 2)
features = features[::2]

We could instead pass winlen=0.03s and winstep=0.02s to mfcc to get the same rate of feature windows over time, but without discarding any data.

@khsinclair
Copy link

khsinclair commented Nov 26, 2018

I'd recommend increasing winlen to 0.032 to match the 512-sample FFT. This avoids creating the step discontinuity when using the traditional Hamming window. That is, a 480-sample length adds 32 zeroes to create a 512-sample FFT, and the discontinuity of the raised cosine window at sample 480 produces some spectral splatter.

HOWEVER, it appears to me that line 226 of deepspeech.cc invokes feature generation with NO window function at all, and that's a serious problem. Is that indeed the usual path for audio feature generation?

Also note that any of these changes will make existing models incompatible with new audio, to various degrees. I don't know what your compatibility policy is for this.

@khsinclair
Copy link

khsinclair commented Nov 26, 2018

Without a window function, the current implementation with 0.025s length has two discontinuities in the FFT input: one at sample 400 and one at sample 512/0. Not good.

@kdavis-mozilla
Copy link
Contributor

I don't know what your compatibility policy is for this.

Currently, as we are not to 1.0, we are free to break backwards compatibility when the engine benefits.

@reuben
Copy link
Contributor Author

reuben commented Dec 10, 2018

@khsinclair thanks for the tips! I've created PR #1773 fixing this.

@lock
Copy link

lock bot commented Jan 9, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Jan 9, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants