
[FEATURE] Implementation of Language model estimator #1155

Open · wants to merge 32 commits into base: v0.x

Conversation

liuzh47 (Contributor) commented Feb 13, 2020

Description

Implementation of a word language model estimator and a large RNN language model estimator.

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

cc @dmlc/gluon-nlp-team
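For readers unfamiliar with the estimator pattern, here is a rough sketch of how an estimator-driven training loop for the word language model could look. The data and model calls below use the public GluonNLP 0.x API, but the LanguageModelEstimator name and its constructor arguments are assumptions inferred from the files this PR adds under src/gluonnlp/estimator/, not a confirmed interface.

import mxnet as mx
import gluonnlp as nlp

# WikiText-2 corpus and an untrained AWD-LSTM language model
# (public GluonNLP 0.x API).
train_text = nlp.data.WikiText2(segment='train')
model, vocab = nlp.model.get_model('awd_lstm_lm_1150',
                                   dataset_name='wikitext-2',
                                   pretrained=False)

# Slice the corpus into fixed-length BPTT segments.
batchify = nlp.data.batchify.CorpusBPTTBatchify(
    vocab, seq_len=70, batch_size=80, last_batch='discard')
train_data = batchify(train_text)

model.initialize(mx.init.Xavier(), ctx=mx.cpu())
trainer = mx.gluon.Trainer(model.collect_params(), 'sgd',
                           {'learning_rate': 30})
loss = mx.gluon.loss.SoftmaxCrossEntropyLoss()

# ASSUMED name and signature, inferred from the new module
# src/gluonnlp/estimator/language_model_estimator.py: the estimator
# owns the epoch loop, the batch processor handles per-batch details
# such as detaching hidden states between BPTT segments, and the
# event handlers cover logging and NT-ASGD-style triggers.
from gluonnlp.estimator import LanguageModelEstimator
est = LanguageModelEstimator(net=model, loss=loss, trainer=trainer,
                             context=[mx.cpu()])
est.fit(train_data=train_data, epochs=1)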

liuzh47 requested a review from a team as a code owner on February 13, 2020 12:07
codecov bot commented Feb 13, 2020

Codecov Report

Merging #1155 into master will increase coverage by 7.99%.
The diff coverage is 25.64%.


@@            Coverage Diff             @@
##           master    #1155      +/-   ##
==========================================
+ Coverage   70.58%   78.57%   +7.99%     
==========================================
  Files          72       77       +5     
  Lines        6970     7278     +308     
==========================================
+ Hits         4920     5719     +799     
+ Misses       2050     1559     -491
Impacted Files Coverage Δ
src/gluonnlp/estimator/__init__.py 100% <100%> (ø)
...uonnlp/estimator/language_model_batch_processor.py 17.89% <17.89%> (ø)
...gluonnlp/estimator/language_model_event_handler.py 24.09% <24.09%> (ø)
src/gluonnlp/loss/joint_loss.py 34.78% <34.78%> (ø)
src/gluonnlp/estimator/language_model_estimator.py 41.17% <41.17%> (ø)
src/gluonnlp/model/train/cache.py 25.58% <0%> (-72.1%) ⬇️
src/gluonnlp/data/batchify/language_model.py 43.92% <0%> (-52.34%) ⬇️
src/gluonnlp/model/translation.py 20.31% <0%> (-51.57%) ⬇️
src/gluonnlp/embedding/evaluation.py 40.33% <0%> (-51.27%) ⬇️
src/gluonnlp/model/language_model.py 48.87% <0%> (-49.63%) ⬇️
... and 48 more
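As a quick sanity check on the headline numbers: coverage is simply hits divided by total lines, and the report appears to truncate (not round) percentages to two decimals. A small illustration:

import math

def pct(hits, lines):
    # codecov-style display: truncate to two decimal places
    return math.floor(hits / lines * 10000) / 100

base = pct(4920, 6970)   # 70.58 (master)
head = pct(5719, 7278)   # 78.57 (this PR)
print(base, head, round(head - base, 2))   # 70.58 78.57 7.99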

mli commented Feb 13, 2020

Job PR-1155/1 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1155/1/index.html

mli commented Feb 13, 2020

Job PR-1155/2 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1155/2/index.html

mli commented Feb 14, 2020

Job PR-1155/4 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1155/4/index.html

mli commented Feb 14, 2020

Job PR-1155/5 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1155/5/index.html

mli commented Feb 14, 2020

Job PR-1155/6 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1155/6/index.html

mli commented Feb 14, 2020

Job PR-1155/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1155/7/index.html

@@ -47,35 +47,35 @@ The dataset used for training the models is wikitext-2.

For all the above model settings, we set Tied = True and NTASGD = True.

- [1] awd_lstm_lm_1150_wikitext-2 (Val PPL 68.71 Test PPL 65.62)
+ [1] awd_lstm_lm_1150_wikitext-2 (Val PPL 68.52 Test PPL 65.68)
Member commented:
While you're at it, would you mind removing the rows related to hyper-parameters in the table? i.e. Mode, Num_layers, Embed size, Hidden size, Dropout, Dropout_h, Dropout_i, Dropout_e, Weight_drop. After the removal the table will be simpler. Also, could you move the commands to https://github.com/dmlc/web-data/tree/master/gluonnlp/logs/language_model and reference the links in the table? Currently, the commands take a lot of space. We should simplify the tables for cache_lm and the large word LM, too.

liuzh47 (Author) replied:
Doc is updated. I have submitted a new PR dmlc/web-data#232 to move the commands.

- $ python large_word_language_model.py --gpus 0,1,2,3 --clip=10
- $ python large_word_language_model.py --gpus 4 --eval-only --batch-size=1
+ $ python large_word_language_model_estimator.py --gpus 0,1,2,3 --clip=10
+ $ python large_word_language_model_estimator.py --gpus 4 --eval-only --batch-size=1
Member commented:

No PPL change for large_word_language_model?

liuzh47 (Author) replied:

I am still training large_word_language_model. It takes approximately 6-7 days to finish the whole training. Currently I get a test PPL of 43.98 with the latest model checkpoint, which is comparable to the baseline model. I will update the numbers after the training is completed.
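For reference, the reported perplexity is the exponential of the average per-token cross-entropy, so a test PPL of 43.98 corresponds to a mean loss of roughly 3.78 nats per token. A minimal illustration, with hypothetical variable names:

import math

def perplexity(total_nll, num_tokens):
    # PPL = exp(mean negative log-likelihood per predicted token)
    return math.exp(total_nll / num_tokens)

print(perplexity(3.7837, 1.0))   # ~43.98, matching the quoted test PPL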

mli commented Feb 17, 2020

Job PR-1155/8 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1155/8/index.html

mli commented Feb 17, 2020

Job PR-1155/9 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1155/9/index.html

mli commented Feb 17, 2020

Job PR-1155/10 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1155/10/index.html

szha changed the base branch from master to v0.x on August 13, 2020 02:17