
Increased Loss/Validation Perplexity on word language model #14722

Closed
nswamy opened this issue Apr 17, 2019 · 1 comment

nswamy commented Apr 17, 2019

Description

Validation perplexity on a word language model went up from 127 to 186 on 02/21. I tested with the cu90 PyPI package 1.5.0b20190221.
Going through the commits that went in on that day, I traced the regression to the change to the softmax accumulation type in #14098.
I verified this by reverting commit 862cbc6 on top of 1.5.0b20190221 and building a static library with the static library build script; as expected, the validation perplexity came back down to 127.
There was another change on 03/12 in #14219 that changed the default data type to fp32, which brought the validation perplexity down to 140. That was an unintended consequence, not the right fix.
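
To make the suspicion concrete, here is a purely illustrative NumPy sketch (not the MXNet kernel) of how the dtype used to accumulate the softmax normalizer can shift the resulting log-probabilities, which is the kind of numerical drift that shows up in the loss:

```python
# Illustrative only: the accumulation dtype of the softmax normalizer changes the
# computed log-probabilities. This mimics the general effect, not MXNet's kernel.
import numpy as np

np.random.seed(0)
logits = np.random.uniform(-1.0, 1.0, size=10000)

exp16 = np.exp(logits.astype(np.float16))
denom_lowp = exp16.sum(dtype=np.float16)       # normalizer accumulated in float16
denom_fp32 = exp16.astype(np.float32).sum()    # normalizer accumulated in float32

logp_lowp = np.log(exp16.astype(np.float64)) - np.log(float(denom_lowp))
logp_fp32 = np.log(exp16.astype(np.float64)) - np.log(float(denom_fp32))
print("max |delta log-prob|:", np.abs(logp_lowp - logp_fp32).max())
```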

Environment info (Required)

Deep Learning Ubuntu Base AMI id: ami-0ff00f007c727c376
Instance Type: P2.16X

What to do:

  1. Create a P2.16X EC2 instance (or similar) using the AMI ID above (the latest when I tested).

  2. `sudo pip install mxnet-cu90==1.5.0b20190220`

  3. `git clone https://github.com/awslabs/deeplearning-benchmark`

  4. Use the script https://github.com/awslabs/deeplearning-benchmark/blob/master/word_language_model/word_language_model_train.py

  5. Run the script: `python word_language_model/word_language_model.py --gpus 8 --nhid 650 --emsize 650 --dropout 0.5 --epochs 40 --data word_language_model/data/ptb. --mode imperative --kvstore device`

  6. Repeat the run after `sudo pip install mxnet-cu90==1.5.0b20190221` (a version-check sketch follows this list).
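
Between steps 2 and 6 it is easy to end up comparing the wrong wheels, so a minimal version check helps; this sketch assumes the pip wheel ships the COMMIT_HASH file used in the output section below:

```python
# Minimal sketch: confirm which MXNet nightly is installed before each run.
# Assumes the wheel ships a COMMIT_HASH file next to the package, as in the output below.
import os
import mxnet as mx

print("mxnet version:", mx.__version__)
commit_file = os.path.join(os.path.dirname(mx.__file__), "COMMIT_HASH")
if os.path.exists(commit_file):
    with open(commit_file) as fh:
        print("commit hash:", fh.read().strip())
```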

Output:

  • Version: 1.5.0b20190220
hostname ; pip show mxnet-cu90 ; tail -n 10 lstm_ptb_imperative.log 
ip-172-31-29-103 
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7. 
Name: mxnet-cu90 
Version: 1.5.0b20190220 
INFO:root:[Epoch 38] time cost 26.20s, valid loss 4.84, valid ppl 126.83 
INFO:root:test loss 4.79, test ppl 120.65 
INFO:root:[Epoch 39] time cost 26.03s, valid loss 4.84, valid ppl 126.66 
INFO:root:test loss 4.79, test ppl 120.26 
INFO:root:Best test loss 4.79, test ppl 120.26
  • Version: 1.5.0b20190221
hostname ; pip show mxnet-cu90 ; tail -n 10 lstm_ptb_imperative.log 
ip-172-31-63-76 
Name: mxnet-cu90 
Version: 1.5.0b20190221 
Summary: MXNet is an ultra-scalable deep learning framework. This version uses CUDA-9.0. 
Home-page: https://github.com/apache/incubator-mxnet
Author: UNKNOWN 
Author-email: UNKNOWN 
License: Apache 2.0 
Location: /usr/local/lib/python2.7/dist-packages 
Requires: numpy, requests, graphviz 
You are using pip version 9.0.1, however version 19.0.3 is available. 
You should consider upgrading via the 'pip install --upgrade pip' command. 
INFO:root:test loss 5.23, test ppl 186.97 
INFO:root:[Epoch 36] time cost 25.76s, valid loss 5.27, valid ppl 194.12 
INFO:root:test loss 5.23, test ppl 186.37 
INFO:root:[Epoch 37] time cost 25.64s, valid loss 5.24, valid ppl 189.45 
INFO:root:test loss 5.20, test ppl 181.52 
INFO:root:[Epoch 38] time cost 26.20s, valid loss 5.24, valid ppl 189.03 
INFO:root:test loss 5.20, test ppl 180.95 
INFO:root:[Epoch 39] time cost 25.59s, valid loss 5.23, valid ppl 185.90 
INFO:root:test loss 5.18, test ppl 177.53 
INFO:root:Best test loss 5.18, test ppl 177.53
  • Version: 1.5.0b20190313 -> This gives a perplexity of 140.
hostname ; pip show mxnet-cu90 ; tail -n 10 lstm_ptb_imperative.log ; echo "commit-hash" ; cat /usr/local/lib/python2.7/dist-packages/mxnet/COMMIT_HASH 
ip-172-31-29-103 
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7. 
Name: mxnet-cu90 
Version: 1.5.0b20190313 
Summary: MXNet is an ultra-scalable deep learning framework. This version uses CUDA-9.0. 
Home-page: https://github.com/apache/incubator-mxnet
Author: UNKNOWN 
Author-email: UNKNOWN 
License: Apache 2.0 
Location: /usr/local/lib/python2.7/dist-packages 
Requires: numpy, requests, graphviz 
Required-by: 
INFO:root:test loss 4.92, test ppl 136.69 
INFO:root:[Epoch 36] time cost 25.91s, valid loss 4.96, valid ppl 142.06 
INFO:root:test loss 4.91, test ppl 135.81 
INFO:root:[Epoch 37] time cost 25.51s, valid loss 4.95, valid ppl 141.45 
INFO:root:test loss 4.91, test ppl 135.16 
INFO:root:[Epoch 38] time cost 25.57s, valid loss 4.95, valid ppl 141.29 
INFO:root:test loss 4.91, test ppl 134.99 
INFO:root:[Epoch 39] time cost 26.08s, valid loss 4.95, valid ppl 140.87 
INFO:root:test loss 4.90, test ppl 134.48 
INFO:root:Best test loss 4.90, test ppl 134.48 
commit-hash 
4432af1f47a439517eff9a21bef23ef7ae5e4aa4
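
For reference, the logged perplexities are simply the exponential of the reported losses, so the three builds can be compared on either number; a quick check of the values above:

```python
# Perplexity is exp(cross-entropy loss); the rounded loss values from the logs
# reproduce the logged perplexities to within rounding.
import math

for version, loss in [("1.5.0b20190220", 4.84), ("1.5.0b20190221", 5.23), ("1.5.0b20190313", 4.95)]:
    print(version, "valid loss", loss, "-> ppl", round(math.exp(loss), 2))
```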

What have you tried:

  1. Reverted the softmax commit 862cbc6 on top of the 0221 build and reran the test; the validation perplexity dropped back to 127.

  2. I have a PR in progress; with it applied, the validation perplexity drops back to 127:
    https://github.com/apache/incubator-mxnet/compare/master...nswamy:softmax_fix?expand=1
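
For completeness, a hedged sketch of the operator-level knob that #14098 touched, assuming the installed nightly exposes the `dtype` argument on softmax; this only demonstrates requesting float32 accumulation explicitly, it is not the fix in the PR above:

```python
# Hedged sketch: request float32 accumulation/output on softmax explicitly.
# Assumes the nightly exposes the `dtype` argument added around #14098; illustrative only.
import mxnet as mx

x = mx.nd.random.uniform(shape=(2, 8), dtype='float16')
out_default = mx.nd.softmax(x, axis=-1)                   # build's default accumulation behavior
out_fp32 = mx.nd.softmax(x, axis=-1, dtype='float32')     # explicitly request float32
print(out_default.dtype, out_fp32.dtype)
```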

@nswamy self-assigned this Apr 17, 2019
@nswamy changed the title from "Increased Validation Perplexity on word language model" to "Increased Loss/Validation Perplexity on word language model" on Apr 22, 2019
@eric-haibin-lin
Member

It should be fixed by #15037, which makes the behavior the same as before.
