
After about 25,700 steps, the loss suddenly gets larger and larger, and it is still very large now. Is this normal? #7

Closed
tenghl opened this issue Jul 20, 2020 · 5 comments



tenghl commented Jul 20, 2020

[2020-07-12T13:57:41] Logging to logdir/bert_run/bs=8,lr=7.4e-04,bert_lr=3.0e-06,end_lr=0e0,att=1

[2020-07-18T04:40:38] Step 25500 stats, train: loss = 0.04672800004482269
[2020-07-18T04:40:52] Step 25500 stats, val: loss = 5.44137978553772

[2020-07-18T05:30:57] Step 25600 stats, train: loss = 0.010738465120084584
[2020-07-18T05:31:12] Step 25600 stats, val: loss = 5.063877463340759

[2020-07-18T06:20:57] Step 25700 stats, train: loss = 0.05373691872227937
[2020-07-18T06:21:09] Step 25700 stats, val: loss = 5.3940101861953735

[2020-07-18T06:49:23] Step 25800 stats, train: loss = 7.563784122467041
[2020-07-18T06:49:31] Step 25800 stats, val: loss = 10.999245643615723

[2020-07-18T07:21:47] Step 25900 stats, train: loss = 12.75868844985962
[2020-07-18T07:22:03] Step 25900 stats, val: loss = 16.12211561203003

[2020-07-18T08:15:55] Step 26000 stats, train: loss = 8.206974983215332
[2020-07-18T08:16:11] Step 26000 stats, val: loss = 12.26950979232788

[2020-07-18T09:09:56] Step 26100 stats, train: loss = 78.75397872924805
[2020-07-18T09:10:12] Step 26100 stats, val: loss = 94.27817153930664
......
[2020-07-20T04:33:23] Step 32000 stats, train: loss = 100.70531845092773
[2020-07-20T04:33:39] Step 32000 stats, val: loss = 119.73884582519531

[2020-07-20T05:24:20] Step 32100 stats, train: loss = 97.69664764404297
[2020-07-20T05:24:34] Step 32100 stats, val: loss = 117.05315780639648

[2020-07-20T06:20:09] Step 32200 stats, train: loss = 104.20828628540039
[2020-07-20T06:20:25] Step 32200 stats, val: loss = 123.8116683959961

@alexpolozov
Contributor

Unfortunately, BERT-based model training can be sensitive to hyperparameters and the random seed. I attached the log of a typical successful run.
log.txt

@karthikj11

@tenghanlin I'm facing the same issue as well, at step 20460. Were you able to solve it by any means?

@senthurRam33

@alexpolozov Since the random seed is initialized with manual_seed, the seed value stays constant across runs. How do you think it affects the sensitivity of the model?
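[Editorial note, not part of the original thread: a minimal stdlib-only sketch of the seeding point being discussed. `random.Random(seed)` here stands in for `torch.manual_seed`; the names and numbers are illustrative. A fixed seed pins the pseudo-random stream, so "sensitivity to the random seed" means different seeds produce different training trajectories, not that a single seeded run is itself unstable. Remaining run-to-run variance would have to come from unseeded sources such as data-loader shuffling or nondeterministic CUDA kernels.]

```python
import random

# Hypothetical illustration (stdlib only, standing in for torch.manual_seed):
# fixing the seed makes every draw reproducible across runs.
def draws(seed, n=5):
    rng = random.Random(seed)          # analogous to torch.manual_seed(seed)
    return [rng.random() for _ in range(n)]

assert draws(0) == draws(0)   # same seed: identical sequence of draws
assert draws(0) != draws(1)   # different seed: different trajectory
```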

@tenghl
Author

tenghl commented Aug 22, 2020

@karthikj11 I didn't solve this problem, although I reduced the learning rate before the loss increased. I can't explain why the parameters are so sensitive. I got 67.9% accuracy on the dev set at around step 27,400.

@DevanshChoubey

@tenghanlin

Hi, could you share the 67.9% accuracy model, or at least the training log for that model and the learning-rate change you made?
