Add doc for mandarin lm. #298

pkuyym · 2017-09-19T14:04:18Z

kuke

Almost LGTM

kuke · 2017-09-20T02:45:36Z

deep_speech_2/README.md

+  * English punctuations and chinese punctuations are removed.
+  * Insert a whitespace character between two tokens.
+
+Please notice that the released language model only contains chinese simplified characters. When preprocessing done we can begin to train the language model. The key training parameters are '-o 5 --prune 0 1 2 4 4'. Please refer above section for the meaning of each parameter. We also convert the arpa file to binary file using default settings.


chinese-->Chinese
When --> After
parameters/parameter-->arguments/argument
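Below is a minimal Python sketch of the training step described in this hunk (`-o 5 --prune 0 1 2 4 4`, then conversion of the arpa file to a binary file with default settings). It assumes KenLM's `lmplz` and `build_binary` binaries are on `PATH`. The helper names `train_commands`/`train` and the file paths are hypothetical and not part of the PR.

```python
import subprocess
from typing import List, Sequence, Tuple


def train_commands(
    arpa_path: str,
    binary_path: str,
    order: int = 5,
    prune: Sequence[int] = (0, 1, 2, 4, 4),
) -> Tuple[List[str], List[str]]:
    """Build the lmplz and build_binary argument lists (hypothetical helper)."""
    # -o sets the n-gram order; --prune takes one count threshold per order.
    # The first value (unigrams) is 0 here, i.e. unigrams are not pruned.
    lmplz = ["lmplz", "-o", str(order), "--prune", *(str(t) for t in prune)]
    # build_binary with no extra options uses its default settings.
    build = ["build_binary", arpa_path, binary_path]
    return lmplz, build


def train(corpus_path: str, arpa_path: str, binary_path: str) -> None:
    """Train an ARPA model from a preprocessed corpus, then binarize it."""
    lmplz, build = train_commands(arpa_path, binary_path)
    # lmplz reads the corpus from stdin and writes the ARPA model to stdout.
    with open(corpus_path, "rb") as src, open(arpa_path, "wb") as dst:
        subprocess.run(lmplz, stdin=src, stdout=dst, check=True)
    subprocess.run(build, check=True)
```

Calling `train("corpus.txt", "lm.arpa", "lm.binary")` would run both steps. Keeping command construction separate from execution lets the arguments be checked without KenLM installed.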

kuke · 2017-09-20T02:51:19Z

deep_speech_2/README.md

+
+  * The beginning and trailing whitespace characters are removed.
+  * English punctuations and chinese punctuations are removed.
+  * Insert a whitespace character between two tokens.


Insert a whitespace character between two tokens. --> A whitespace character between two tokens is inserted. (For consistency with the other items.)
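Below is a minimal Python sketch of the three preprocessing steps in this hunk, applied to one line of a character-based (Mandarin) corpus. It assumes "punctuation" means ASCII punctuation plus any character in a Unicode `P*` category, which covers full-width Chinese marks such as `，`, `。`, `《` and `》`. The PR does not say how punctuation was actually detected, and `preprocess_line` is a hypothetical name.

```python
import string
import unicodedata

# ASCII punctuation; note string.punctuation also includes symbols such as '$' and '+'.
_ASCII_PUNCT = set(string.punctuation)


def is_punctuation(ch: str) -> bool:
    """ASCII punctuation, or any character whose Unicode category starts with 'P'."""
    return ch in _ASCII_PUNCT or unicodedata.category(ch).startswith("P")


def preprocess_line(line: str) -> str:
    # Step 1: the beginning and trailing whitespace characters are removed.
    text = line.strip()
    # Step 2: English and Chinese punctuation marks are removed.
    text = "".join(ch for ch in text if not is_punctuation(ch))
    # Step 3: each non-space character is a token, and a single whitespace
    # character is inserted between two tokens.
    tokens = [ch for ch in text if not ch.isspace()]
    return " ".join(tokens)
```

For example, `preprocess_line("  你好，世界！ ")` returns `"你 好 世 界"`. The sketch does not filter out non-Chinese characters; restricting the corpus to simplified Chinese characters would need an extra step.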

kuke · 2017-09-20T02:56:26Z

deep_speech_2/README.md

-TODO: any other requirements or tips to add?
+#### Mandarin LM
+
+Different from word-based language model, mandarin language model is character-based where each token is a chinese character. We use an internal corpus to train the released mandarin language model. This corpus contains billions of tokens. The preprocessing has small difference from english language model and all steps are:


Different from word-based language model-->Different from English language model
english-->English
chinese-->Chinese
small-->tiny
all steps are-->main steps include

kuke

LGTM

Add doc for mandarin lm.

92ca520

pkuyym requested a review from kuke September 19, 2017 14:04

kuke reviewed Sep 20, 2017


Refine doc.

cefd7c7

kuke approved these changes Sep 21, 2017


pkuyym merged commit 88edc4c into PaddlePaddle:develop Sep 21, 2017
