After about 25,700 steps, the loss value suddenly gets larger and larger. The loss is still very large now. Is this normal? #7
Comments
Unfortunately, BERT-based model training can be sensitive to hyperparameters and the random seed. I attached a log of a typical successful run.
@tenghanlin I'm facing the same issue as well, at step 20460. Were you able to solve it by any means?
@alexpolozov Since the random seed gets initialized by manual_seed, the seed value tends to be constant across runs. How do you think it affects the sensitivity of the model?
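For reference, a minimal sketch of the seeding idiom being discussed, assuming the usual PyTorch pattern rather than this repository's exact code. Even with every seed fixed, cuDNN kernel selection and data-loader worker scheduling can still introduce run-to-run variation, which is one way the "random seed" can matter despite manual_seed being called:

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    # Fix the Python, NumPy, and PyTorch (CPU + all GPUs) generators.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Optional: trade speed for deterministic cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```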
@karthikj11 I didn't solve this problem, although I reduced the learning rate before the point where the loss increased. I can't explain why the parameters are so sensitive. I got 67.9% accuracy on the dev set at around step 27,400.
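For context, a hypothetical sketch of the kind of workaround described above: clipping the gradient norm and manually shrinking the learning rate before the step where the loss previously started to grow. The helper names and the compute_loss callback are illustrative only, not taken from this repository:

```python
import torch

def training_step(model, batch, optimizer, compute_loss, max_grad_norm=1.0):
    optimizer.zero_grad()
    loss = compute_loss(model, batch)  # placeholder for the model's loss computation
    loss.backward()
    # Clip the global gradient norm so one bad batch cannot push the weights
    # into a divergent region.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()

def decay_learning_rate(optimizer, factor=0.5):
    # Manually lower the learning rate, e.g. shortly before the step at which
    # the loss blew up in a previous run.
    for group in optimizer.param_groups:
        group["lr"] *= factor
```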
@tenghanlin Hi, could you share the 67.9% accuracy model, or at least the training log for that run and the learning-rate change you made?
[2020-07-12T13:57:41] Logging to logdir/bert_run/bs=8,lr=7.4e-04,bert_lr=3.0e-06,end_lr=0e0,att=1
[2020-07-18T04:40:38] Step 25500 stats, train: loss = 0.04672800004482269
[2020-07-18T04:40:52] Step 25500 stats, val: loss = 5.44137978553772
[2020-07-18T05:30:57] Step 25600 stats, train: loss = 0.010738465120084584
[2020-07-18T05:31:12] Step 25600 stats, val: loss = 5.063877463340759
[2020-07-18T06:20:57] Step 25700 stats, train: loss = 0.05373691872227937
[2020-07-18T06:21:09] Step 25700 stats, val: loss = 5.3940101861953735
[2020-07-18T06:49:23] Step 25800 stats, train: loss = 7.563784122467041
[2020-07-18T06:49:31] Step 25800 stats, val: loss = 10.999245643615723
[2020-07-18T07:21:47] Step 25900 stats, train: loss = 12.75868844985962
[2020-07-18T07:22:03] Step 25900 stats, val: loss = 16.12211561203003
[2020-07-18T08:15:55] Step 26000 stats, train: loss = 8.206974983215332
[2020-07-18T08:16:11] Step 26000 stats, val: loss = 12.26950979232788
[2020-07-18T09:09:56] Step 26100 stats, train: loss = 78.75397872924805
[2020-07-18T09:10:12] Step 26100 stats, val: loss = 94.27817153930664
......
[2020-07-20T04:33:23] Step 32000 stats, train: loss = 100.70531845092773
[2020-07-20T04:33:39] Step 32000 stats, val: loss = 119.73884582519531
[2020-07-20T05:24:20] Step 32100 stats, train: loss = 97.69664764404297
[2020-07-20T05:24:34] Step 32100 stats, val: loss = 117.05315780639648
[2020-07-20T06:20:09] Step 32200 stats, train: loss = 104.20828628540039
[2020-07-20T06:20:25] Step 32200 stats, val: loss = 123.8116683959961