[FEATURE] Implementation of Language model estimator #1155
base: v0.x
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master    #1155     +/-  ##
=========================================
+ Coverage   70.58%   78.57%   +7.99%
=========================================
  Files          72       77       +5
  Lines        6970     7278     +308
=========================================
+ Hits         4920     5719     +799
+ Misses       2050     1559     -491
=========================================
scripts/language_model/index.rst
Outdated
@@ -47,35 +47,35 @@ The dataset used for training the models is wikitext-2.

  For all the above model settings, we set Tied = True and NTASGD = True.

- [1] awd_lstm_lm_1150_wikitext-2 (Val PPL 68.71 Test PPL 65.62)
+ [1] awd_lstm_lm_1150_wikitext-2 (Val PPL 68.52 Test PPL 65.68)
While you're at it, would you mind removing the hyper-parameter rows from the table, i.e. Mode, Num_layers, Embed size, Hidden size, Dropout, Dropout_h, Dropout_i, Dropout_e, and Weight_drop? After the removal the table will be simpler. Also, could you move the commands to https://github.com/dmlc/web-data/tree/master/gluonnlp/logs/language_model and reference the links in the table? Currently the commands take a lot of space. We should simplify the tables for cache_lm and the large word LM, too.
The doc is updated. I have submitted a new PR, dmlc/web-data#232, to move the commands.
scripts/language_model/index.rst
Outdated
- $ python large_word_language_model.py --gpus 0,1,2,3 --clip=10
- $ python large_word_language_model.py --gpus 4 --eval-only --batch-size=1
+ $ python large_word_language_model_estimator.py --gpus 0,1,2,3 --clip=10
+ $ python large_word_language_model_estimator.py --gpus 4 --eval-only --batch-size=1
No PPL change for large_word_language_model?
I am still training large_word_language_model. It takes approximately 6-7 days to finish the whole training. Currently I get a test PPL of 43.98 with the latest model checkpoint, which is comparable to the baseline model. I will update the numbers after the training is completed.
Description
Implementation of the word language model estimator and the large RNN language model estimator.
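For reviewers unfamiliar with the Gluon Estimator API, here is a minimal sketch of the general pattern the new scripts build on, assuming MXNet's `mxnet.gluon.contrib.estimator.Estimator`. It is illustrative only: the actual estimator subclasses and event handlers added by this PR (e.g., for NT-ASGD and hidden-state handling) are not shown, and the BPTT data-loader wiring is elided.

```python
# Sketch of Estimator-driven language model training with MXNet Gluon.
# Illustrative only; the estimators in this PR add custom event handlers.
import mxnet as mx
from mxnet import gluon
from mxnet.gluon.contrib.estimator import Estimator
import gluonnlp as nlp

ctx = [mx.gpu(0)] if mx.context.num_gpus() > 0 else [mx.cpu()]

# Build an AWD-LSTM language model and its wikitext-2 vocabulary.
model, vocab = nlp.model.get_model('awd_lstm_lm_1150',
                                   dataset_name='wikitext-2',
                                   pretrained=False)
model.initialize(mx.init.Xavier(), ctx=ctx)

loss = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(model.collect_params(), 'sgd',
                        {'learning_rate': 30, 'wd': 1.2e-6})

# The Estimator owns the fit loop; checkpointing, logging, gradient
# clipping, etc. are attached as event handlers at fit() time.
est = Estimator(net=model, loss=loss, trainer=trainer, context=ctx)
# est.fit(train_data=train_loader, val_data=val_loader, epochs=750)
```

The `train_loader`/`val_loader` pipeline is omitted above; see the scripts in this PR for the full setup.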
Checklist
Essentials
Changes
Comments
cc @dmlc/gluon-nlp-team