
Simplify train.py, evaluate.py, infer.py and tune.py by adding DeepSpeech2Model class for DS2. #183

Merged (3 commits) on Aug 2, 2017

Conversation

xinghai-sun
Contributor

  1. Move the functions in model.py into layer.py.
  2. Add a DeepSpeech2Model class in model.py, which has train() and infer_batch() methods (a sketch of this interface follows the list).
  3. Simplify train.py, evaluate.py, infer.py, and tune.py.
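For context, a minimal sketch of the interface that item 2 describes. Only the method names and the _create_network signature appear in this PR's diff; the other argument names are illustrative assumptions, not the exact code merged here.

```python
# Hypothetical sketch of the DeepSpeech2Model interface described above.
# Only the method names and the _create_network signature come from this
# PR; the other argument names are illustrative assumptions.
class DeepSpeech2Model(object):
    def __init__(self, vocab_size, num_conv_layers, num_rnn_layers,
                 rnn_layer_size):
        # Assemble the DS2 network topology when the model is created.
        self._create_network(vocab_size, num_conv_layers, num_rnn_layers,
                             rnn_layer_size)

    def train(self, train_batch_reader, dev_batch_reader, feeding_dict,
              learning_rate, num_passes, output_model_dir):
        # Run the training passes and save parameters after each pass.
        pass

    def infer_batch(self, infer_data, decode_method, beam_size, vocab_list):
        # Decode a batch of utterances and return the transcripts.
        pass

    def _create_network(self, vocab_size, num_conv_layers, num_rnn_layers,
                        rnn_layer_size):
        # Build data layers and the stacked conv + RNN layers (omitted).
        pass
```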

@xinghai-sun xinghai-sun requested review from kuke and pkuyym August 1, 2017 08:31
@kuke kuke (Collaborator) left a comment

Great! Almost LGTM.

@@ -62,10 +66,10 @@
type=str,
help="Manifest path for normalizer. (default: %(default)s)")
parser.add_argument(
"--decode_manifest_path",
"--tune_manifest_path",
default='datasets/manifest.test',
Collaborator

manifest.dev would be better.

Contributor Author

Done.
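For reference, an illustrative sketch of the renamed argument after this suggestion; the help text and the manifest.dev default here are assumptions, not the exact values merged in this PR.

```python
import argparse

# Illustrative sketch of the renamed tuning-manifest argument after the
# review suggestion; the default and help text are assumptions.
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
    "--tune_manifest_path",
    default='datasets/manifest.dev',
    type=str,
    help="Manifest path for parameter tuning. (default: %(default)s)")
```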

padding=(5, 10),
act=paddle.activation.BRelu())
output_num_channels = 32
output_height = 160 // pow(2, num_stacks) + 1
Collaborator

Can we figure out a way to avoid hard-coding here?

Contributor Author

Will be fixed in a later PR (this problem does not come from the current PR).
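One possible way to remove the hard-coded height, sketched only for illustration; the input_height parameter is hypothetical and not part of this PR.

```python
# Illustrative sketch only: pass the spectrogram height in as an argument
# instead of hard-coding 160. "input_height" is a hypothetical parameter.
def conv_group(input, num_stacks, input_height=160):
    # ... stacked convolution layers would be built here (omitted) ...
    output = input
    output_num_channels = 32
    # Derive the output height from the input height rather than a constant.
    output_height = input_height // pow(2, num_stacks) + 1
    return output, output_num_channels, output_height
```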

@@ -127,100 +129,47 @@

Collaborator

Is it necessary to add output_model_dir to the parser as an argument?

Contributor Author

Done.


def conv_group(input, num_stacks):
"""
Convolution group with several stacking convolution layers.
Collaborator

It seems that "stacked" is more commonly used than "stacking". The same below.

Contributor Author

Done.

for target, result in zip(target_transcripts, result_transcripts):
wer_sum += wer(target, result)
num_ins += 1
print("WER (%d/?) = %f" % (num_ins, wer_sum / num_ins))
Contributor

What does "?" mean?

Contributor Author

It refers to the size of the validation set, which is unknown during evaluation and only becomes known at the end.
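A short sketch, for illustration, of why the denominator is unknown while evaluating: the data is streamed batch by batch, so the total count is only available after the last batch. Here batch_reader, run_inference, and wer stand in for the project's actual reader, decoder, and error-rate helper.

```python
# Sketch of why "?" is printed: evaluation streams utterances batch by
# batch, so the running average is reported before the total is known.
# "batch_reader", "run_inference", and "wer" are stand-in names.
wer_sum, num_ins = 0.0, 0
for infer_data in batch_reader():
    target_transcripts, result_transcripts = run_inference(infer_data)
    for target, result in zip(target_transcripts, result_transcripts):
        wer_sum += wer(target, result)
        num_ins += 1
        print("WER (%d/?) = %f" % (num_ins, wer_sum / num_ins))
# Only here, after the reader is exhausted, is the denominator known.
print("Final WER (%d/%d) = %f" % (num_ins, num_ins, wer_sum / num_ins))
```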

padding=(5, 10),
act=paddle.activation.BRelu())
output_num_channels = 32
output_height = 160 // pow(2, num_stacks) + 1
Contributor

Since the model is being refactored, please expose 160 as a parameter instead of hard-coding it.

Contributor Author

Will be fixed in a later PR (this problem does not come from the current PR).


import paddle.v2 as paddle

DISABLE_CUDNN_BATCH_NORM = True
Contributor

Why not make DISABLE_CUDNN_BATCH_NORM a parameter of conv_bn_layer? If it is hard-coded here, users have to modify this file when training in CPU mode.

Contributor Author

Removed since the cudnn problem has been fixed.
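For reference, a rough sketch of what the reviewer's suggestion might have looked like before the flag was removed; the batch_norm_type usage reflects the paddle.v2 API as generally documented and is not code from this PR.

```python
import paddle.v2 as paddle

# Sketch of the suggestion above (the flag was ultimately removed): expose
# the cuDNN batch-norm switch as an argument so CPU-only users do not have
# to edit a module-level constant. Not the code merged in this PR.
def conv_bn_layer(input, filter_size, num_channels_in, num_channels_out,
                  stride, padding, act, use_cudnn_batch_norm=True):
    conv = paddle.layer.img_conv(
        input=input,
        filter_size=filter_size,
        num_channels=num_channels_in,
        num_filters=num_channels_out,
        stride=stride,
        padding=padding,
        act=paddle.activation.Linear(),
        bias_attr=False)
    # batch_norm_type=None lets PaddlePaddle pick the implementation;
    # "batch_norm" forces the plain (non-cuDNN) kernel.
    return paddle.layer.batch_norm(
        input=conv,
        act=act,
        batch_norm_type=None if use_cudnn_batch_norm else "batch_norm")
```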

reader=dev_batch_reader, feeding=feeding_dict)
output_model_path = os.path.join(
output_model_dir, "params.pass-%d.tar.gz" % event.pass_id)
with gzip.open(output_model_path, 'w') as f:
Contributor

Maybe we should make sure output_model_dir exists; the saving operation may fail if the directory does not exist.

Contributor Author

Done.
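A minimal sketch of the guard discussed above, reusing the names from the snippet; the parameters.to_tar call follows common paddle.v2 usage and may differ from the exact saving call in this PR.

```python
import os
import gzip

# Sketch: create output_model_dir before saving so gzip.open does not fail
# on a missing directory. "parameters" is the paddle.v2 parameters object
# assumed to be in scope; "event" comes from the training event handler.
if not os.path.exists(output_model_dir):
    os.makedirs(output_model_dir)
output_model_path = os.path.join(
    output_model_dir, "params.pass-%d.tar.gz" % event.pass_id)
with gzip.open(output_model_path, 'w') as f:
    parameters.to_tar(f)
```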

# of input batch data will be induced during training.
audio_data = paddle.layer.data(
name="audio_spectrogram",
type=paddle.data_type.dense_array(161 * 161))
Contributor

Just curious about 161 * 161: is this setting appropriate for other types of features, like MFCC?

Contributor Author

Will fix this in another PR. Let's not bring any unrelated changes into this PR.

def _create_network(self, vocab_size, num_conv_layers, num_rnn_layers,
rnn_layer_size):
# paddle.data_type.dense_array is used for variable batch input.
# The size 161 * 161 is only an placeholder value and the real shape
Collaborator

The logged output dimensions of the conv layers are derived from here, which is misleading if this placeholder is inconsistent with the real shape. Maybe a better choice is to use the real dimension of the feature vector.

[screenshot: training log showing the conv layers' output dimensions]

Contributor Author

Will fix this in another PR.
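For illustration, one way to follow the suggestion: size the data layer from the actual feature shape instead of the 161 * 161 placeholder. The function name and its parameters are hypothetical, not from this PR.

```python
import paddle.v2 as paddle

# Hypothetical sketch: derive the data layer size from the real spectrogram
# shape so the logged conv output dimensions match reality. The names
# "num_freq_bins" and "num_time_steps" are illustrative, not from this PR.
def create_audio_data_layer(num_freq_bins, num_time_steps):
    return paddle.layer.data(
        name="audio_spectrogram",
        type=paddle.data_type.dense_array(num_freq_bins * num_time_steps))
```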

@pkuyym pkuyym (Contributor) left a comment

LGTM

@xinghai-sun xinghai-sun merged commit 9807068 into PaddlePaddle:develop Aug 2, 2017
@xinghai-sun xinghai-sun deleted the refine_decoder2 branch August 2, 2017 17:58