Can you please explain the intuition for using `warm_step=200` when training for only 1 epoch? That doesn't seem like enough steps for meaningful training without distillation. What happens if I use the distillation loss from scratch?
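For context, my reading of the schedule is that the distillation term is simply switched off for the first `warm_step` optimizer steps and added to the task loss afterwards. Below is a minimal sketch of that gate, assuming a PyTorch-style loop; `student`, `teacher`, and the loss names are illustrative placeholders, not this repo's actual API:

```python
import torch
import torch.nn as nn

warm_step = 200  # hypothetical: distillation disabled for the first 200 steps

student = nn.Linear(16, 4)
teacher = nn.Linear(16, 4)  # stand-in for a pretrained teacher
teacher.eval()

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
task_criterion = nn.CrossEntropyLoss()
distill_criterion = nn.MSELoss()

global_step = 0
for x, y in [(torch.randn(8, 16), torch.randint(0, 4, (8,))) for _ in range(400)]:
    student_logits = student(x)
    loss = task_criterion(student_logits, y)

    # Add the distillation term only once the warm-up window has passed.
    if global_step >= warm_step:
        with torch.no_grad():
            teacher_logits = teacher(x)
        loss = loss + distill_criterion(student_logits, teacher_logits)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    global_step += 1
```

If that reading is correct, 200 warm-up steps within a single epoch leaves very little time for the student to train on the task loss alone, which is what prompted my question about enabling distillation from step 0 instead.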