Add the support of mfcc feature for DS2 #168
Conversation
Almost LGTM
deep_speech_2/README.md
Outdated
@@ -38,7 +38,13 @@ python datasets/librispeech/librispeech.py --help
python compute_mean_std.py
```

-`python compute_mean_std.py` computes mean and stdandard deviation for audio features, and save them to a file with a default name `./mean_std.npz`. This file will be used in both training and inferencing.
+`python compute_mean_std.py` computes mean and stdandard deviation for audio features, and save them to a file with a default name `./mean_std.npz`. This file will be used in both training and inferencing. The default feature of audio data is power spectrum, currently the mfcc feature is also supported. To train and infer based on mfcc feature, you can regenerate this file by
currently the mfcc feature is also supported

Would changing "currently" to "and" be better? There's no need to tell the user that mfcc was added just now.
you can regenerate this file by

Why "regenerate", when this is the first run of DS2?
Done
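For reference, a minimal sketch of the two commands the updated README paragraph describes, both taken from the quoted diffs in this review (the default output path `./mean_std.npz` comes from the README text):

```bash
# Default: compute mean/std of the power-spectrum features, written to ./mean_std.npz
python compute_mean_std.py

# MFCC: regenerate the statistics for MFCC features before MFCC-based training/inference
python compute_mean_std.py --specgram_type mfcc
```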
@@ -2,3 +2,4 @@ wget==3.2
scipy==0.13.1
resampy==0.1.5
https://github.com/kpu/kenlm/archive/master.zip
+python_speech_features
Add a version number.
No version number
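For illustration only, a pinned form of the new dependency would look like the sketch below; the PR itself leaves `python_speech_features` unpinned, and the version number here is an assumption, not taken from the PR.

```bash
# Hypothetical pinned install of the new dependency; the version is an assumption
# for illustration only, since the PR adds the package without a version in requirements.txt.
pip install python_speech_features==0.6
```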
Great. Looking forward to better experimental results with MFCC.
@@ -38,7 +38,13 @@ python datasets/librispeech/librispeech.py --help
python compute_mean_std.py
```

`python compute_mean_std.py` computes mean and stdandard deviation for audio features, and save them to a file with a default name `./mean_std.npz`. This file will be used in both training and inferencing.
"`python compute_mean_std.py` computes" --> "It will compute"
Done
deep_speech_2/README.md
Outdated
@@ -38,7 +38,13 @@ python datasets/librispeech/librispeech.py --help
python compute_mean_std.py
```

-`python compute_mean_std.py` computes mean and stdandard deviation for audio features, and save them to a file with a default name `./mean_std.npz`. This file will be used in both training and inferencing.
+`python compute_mean_std.py` computes mean and stdandard deviation for audio features, and save them to a file with a default name `./mean_std.npz`. This file will be used in both training and inferencing. The default feature of audio data is power spectrum, currently the mfcc feature is also supported. To train and infer based on mfcc feature, you can regenerate this file by
1. "you can regenerate" --> "please regenerate"
2. spectrum or spectrogram?
Done
deep_speech_2/README.md
Outdated
python compute_mean_std.py --specgram_type mfcc
```

and specify the ```specgram_type``` to ```mfcc``` in each step, including training, inference etc.
1. "in each step, including training, inference etc." --> "when running train.py, infer.py, evaluator.py or tune.py"
2. `specgram_type` to `mfcc` --> `--specgram_type mfcc`
Done
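As a sketch of what the reviewer's suggestion amounts to in practice (the script names train.py and infer.py are taken from the comment above; other flags are left at their defaults), the same `--specgram_type` flag would be passed at each stage:

```bash
# Keep the feature type consistent with the regenerated mean_std.npz
python train.py --specgram_type mfcc
python infer.py --specgram_type mfcc
```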
Add the MFCC feature for audio data; training of the model is in progress.