After about 25,700 steps, the loss value suddenly gets larger and larger. The loss is still very large now. Is this normal? #7
Comments
Unfortunately, BERT-based model training can be sensitive to hyperparameters and the random seed. I attached a log of a typical successful run.
@tenghanlin I'm facing the same issue as well, at step 20460. Were you able to solve it by any means?
@alexpolozov Since the random seed gets initialized by manual_seed, the seed value tends to be constant across runs. How do you think it affects the sensitivity of the model?
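For reference, a minimal sketch of the seeding idiom being discussed, assuming the usual PyTorch pattern rather than this repository's exact code. Even with every seed fixed, cuDNN kernel selection and data-loader worker scheduling can still introduce run-to-run variation, which is one way the "random seed" can matter despite manual_seed being called:

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    # Fix the Python, NumPy, and PyTorch (CPU + all GPUs) generators.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Optional: trade speed for deterministic cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```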
@karthikj11 I didn't solve this problem, although I reduced the learning rate before the point where the loss increased. I can't explain why the parameters are so sensitive. I got 67.9% accuracy on the dev set at around step 27,400.
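For context, a hypothetical sketch of the kind of workaround described above: clipping the gradient norm and manually shrinking the learning rate before the step where the loss previously started to grow. The helper names and the compute_loss callback are illustrative only, not taken from this repository:

```python
import torch

def training_step(model, batch, optimizer, compute_loss, max_grad_norm=1.0):
    optimizer.zero_grad()
    loss = compute_loss(model, batch)  # placeholder for the model's loss computation
    loss.backward()
    # Clip the global gradient norm so one bad batch cannot push the weights
    # into a divergent region.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()

def decay_learning_rate(optimizer, factor=0.5):
    # Manually lower the learning rate, e.g. shortly before the step at which
    # the loss blew up in a previous run.
    for group in optimizer.param_groups:
        group["lr"] *= factor
```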
@tenghanlin Hi, could you share the 67.9% accuracy model, or at least the training log for that run and the learning-rate change you made?
[2020-07-12T13:57:41] Logging to logdir/bert_run/bs=8,lr=7.4e-04,bert_lr=3.0e-06,end_lr=0e0,att=1
[2020-07-18T04:40:38] Step 25500 stats, train: loss = 0.04672800004482269
[2020-07-18T04:40:52] Step 25500 stats, val: loss = 5.44137978553772
[2020-07-18T05:30:57] Step 25600 stats, train: loss = 0.010738465120084584
[2020-07-18T05:31:12] Step 25600 stats, val: loss = 5.063877463340759
[2020-07-18T06:20:57] Step 25700 stats, train: loss = 0.05373691872227937
[2020-07-18T06:21:09] Step 25700 stats, val: loss = 5.3940101861953735
[2020-07-18T06:49:23] Step 25800 stats, train: loss = 7.563784122467041
[2020-07-18T06:49:31] Step 25800 stats, val: loss = 10.999245643615723
[2020-07-18T07:21:47] Step 25900 stats, train: loss = 12.75868844985962
[2020-07-18T07:22:03] Step 25900 stats, val: loss = 16.12211561203003
[2020-07-18T08:15:55] Step 26000 stats, train: loss = 8.206974983215332
[2020-07-18T08:16:11] Step 26000 stats, val: loss = 12.26950979232788
[2020-07-18T09:09:56] Step 26100 stats, train: loss = 78.75397872924805
[2020-07-18T09:10:12] Step 26100 stats, val: loss = 94.27817153930664
......
[2020-07-20T04:33:23] Step 32000 stats, train: loss = 100.70531845092773
[2020-07-20T04:33:39] Step 32000 stats, val: loss = 119.73884582519531
[2020-07-20T05:24:20] Step 32100 stats, train: loss = 97.69664764404297
[2020-07-20T05:24:34] Step 32100 stats, val: loss = 117.05315780639648
[2020-07-20T06:20:09] Step 32200 stats, train: loss = 104.20828628540039
[2020-07-20T06:20:25] Step 32200 stats, val: loss = 123.8116683959961