Simplify train.py, evaluate.py, infer.py and tune.py by adding DeepSpeech2Model class for DS2. #183
xinghai-sun commented on Aug 1, 2017
- Move functions in model.py into layer.py
- Add a DeepSpeech2Model class in model.py, which has train() and infer_batch() methods.
- Simplify train.py, evaluate.py, infer.py and tune.py.
…infer.py for DS2.
…eech2Model class.
Great! Almost LGTM.
deep_speech_2/tune.py
Outdated
@@ -62,10 +66,10 @@
     type=str,
     help="Manifest path for normalizer. (default: %(default)s)")
 parser.add_argument(
-    "--decode_manifest_path",
+    "--tune_manifest_path",
     default='datasets/manifest.test',
manifest.dev would be better.
Done.
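For reference, a minimal sketch of the agreed change, assuming the argument is otherwise unchanged. The help text and the `build_parser` helper are illustrative assumptions, not code from this PR.

```python
import argparse


def build_parser():
    """Hypothetical helper; tune.py may build its parser differently."""
    parser = argparse.ArgumentParser(description="Tuning sketch")
    parser.add_argument(
        "--tune_manifest_path",
        # The dev split is the default so that tuning does not touch test data.
        default='datasets/manifest.dev',
        type=str,
        # Help text is an assumption made for this sketch.
        help="Manifest path for tuning. (default: %(default)s)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args([])
    print(args.tune_manifest_path)
```

Passing `--tune_manifest_path some/other/manifest` on the command line still overrides the default.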
    padding=(5, 10),
    act=paddle.activation.BRelu())
output_num_channels = 32
output_height = 160 // pow(2, num_stacks) + 1
Can we figure out a way to avoid hard-coding here?
Will be fixed in a later PR (this problem does not come from the current PR).
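One possible shape for that later fix, as a sketch only. It assumes the constant 160 is the 161-bin spectrogram height minus one and that every stacked convolution halves the height; `conv_output_height` and `input_height` are hypothetical names, not part of this PR.

```python
def conv_output_height(num_stacks, input_height=161):
    """Height of the feature map after `num_stacks` stride-2 convolutions.

    Assumption: each stacked convolution halves the height, so with
    input_height=161 this reproduces `160 // pow(2, num_stacks) + 1`.
    """
    return (input_height - 1) // pow(2, num_stacks) + 1


if __name__ == "__main__":
    for num_stacks in (1, 2, 3):
        # Matches the hard-coded expression for the default input height.
        assert conv_output_height(num_stacks) == 160 // pow(2, num_stacks) + 1
    print(conv_output_height(2))
```

Exposing the input height this way would let callers with a different feature size avoid editing layer.py.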
@@ -127,100 +129,47 @@
Is it necessary to add output_model_dir into parser as an argument?
Done.
deep_speech_2/layer.py
Outdated
def conv_group(input, num_stacks):
    """
    Convolution group with several stacking convolution layers.
Seems that "stacked" is often used instead of "stacking". The same below.
Done.
for target, result in zip(target_transcripts, result_transcripts):
    wer_sum += wer(target, result)
    num_ins += 1
print("WER (%d/?) = %f" % (num_ins, wer_sum / num_ins))
What does "?" mean?
It refers to an unknown size of the validation set. It will become known at the end of the evaluation.
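A self-contained sketch of that running average may make the "?" clearer. The batches arrive from a generator, so the total count is unknown until iteration ends. `word_error_rate` below is a simple stand-in for the project's `wer` function, and `running_wer` is a hypothetical helper; both assume non-empty reference transcripts.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length (stand-in)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / float(len(ref))


def running_wer(batches):
    """Yield (instances seen so far, average WER so far) after each batch."""
    wer_sum, num_ins = 0.0, 0
    for target_transcripts, result_transcripts in batches:
        for target, result in zip(target_transcripts, result_transcripts):
            wer_sum += word_error_rate(target, result)
            num_ins += 1
        yield num_ins, wer_sum / num_ins


if __name__ == "__main__":
    batches = iter([(["a b"], ["a b"]), (["a b c d"], ["a x c d"])])
    for num_ins, avg in running_wer(batches):
        # The denominator stays "?" until the reader is exhausted.
        print("WER (%d/?) = %f" % (num_ins, avg))
```

Once the loop finishes, the last `num_ins` is the validation set size, so a final line could print the real total instead of "?".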
    padding=(5, 10),
    act=paddle.activation.BRelu())
output_num_channels = 32
output_height = 160 // pow(2, num_stacks) + 1
Since the model is refactored, please expose 160 instead of hard-coding it.
Will be fixed in a later PR (this problem does not come from the current PR).
deep_speech_2/layer.py
Outdated
import paddle.v2 as paddle

DISABLE_CUDNN_BATCH_NORM = True
Why not make DISABLE_CUDNN_BATCH_NORM a parameter of conv_bn_layer? If it is hard-coded here, users have to modify this file when training in CPU mode.
Removed since the cudnn problem has been fixed.
    reader=dev_batch_reader, feeding=feeding_dict)
output_model_path = os.path.join(
    output_model_dir, "params.pass-%d.tar.gz" % event.pass_id)
with gzip.open(output_model_path, 'w') as f:
Maybe we should make sure output_model_dir exists; the save operation may fail if the directory does not exist.
Done.
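A minimal sketch of such a guard, not necessarily how the PR implements it. It assumes Python 3.2+ for `exist_ok` (the code under review targets Python 2, where an `os.path.exists` check before `os.makedirs` would be the equivalent). `save_params` and `payload` are hypothetical stand-ins for the trainer's parameter serialization.

```python
import gzip
import os


def save_params(output_model_dir, pass_id, payload):
    """Write `payload` (bytes) to a per-pass archive, creating the directory first."""
    # Create the directory (and any missing parents); no error if it exists.
    os.makedirs(output_model_dir, exist_ok=True)
    output_model_path = os.path.join(
        output_model_dir, "params.pass-%d.tar.gz" % pass_id)
    with gzip.open(output_model_path, 'wb') as f:
        # Stand-in for the real parameter dump done by the trainer.
        f.write(payload)
    return output_model_path
```

Calling it twice with the same directory is safe, which matters because the save runs at the end of every pass.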
# of input batch data will be induced during training.
audio_data = paddle.layer.data(
    name="audio_spectrogram",
    type=paddle.data_type.dense_array(161 * 161))
Just curious about 161 * 161: is this setting appropriate for other feature types such as mfcc?
Will fix this in another PR. Let's not introduce any behavioral differences in this PR.
deep_speech_2/model.py
Outdated
def _create_network(self, vocab_size, num_conv_layers, num_rnn_layers,
                    rnn_layer_size):
    # paddle.data_type.dense_array is used for variable batch input.
    # The size 161 * 161 is only an placeholder value and the real shape
Will fix this in another PR.
LGTM