Can you please explain the intuition for using `warm_step=200` when training for only 1 epoch? That doesn't seem like enough steps for meaningful training without distillation. What happens if I use the distillation loss from scratch?
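For context, my reading of the schedule is that the distillation term is simply switched off for the first `warm_step` optimizer steps and added to the task loss afterwards. Below is a minimal sketch of that gate, assuming a PyTorch-style loop; `student`, `teacher`, and the loss names are illustrative placeholders, not this repo's actual API:

```python
import torch
import torch.nn as nn

warm_step = 200  # hypothetical: distillation disabled for the first 200 steps

student = nn.Linear(16, 4)
teacher = nn.Linear(16, 4)  # stand-in for a pretrained teacher
teacher.eval()

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
task_criterion = nn.CrossEntropyLoss()
distill_criterion = nn.MSELoss()

global_step = 0
for x, y in [(torch.randn(8, 16), torch.randint(0, 4, (8,))) for _ in range(400)]:
    student_logits = student(x)
    loss = task_criterion(student_logits, y)

    # Add the distillation term only once the warm-up window has passed.
    if global_step >= warm_step:
        with torch.no_grad():
            teacher_logits = teacher(x)
        loss = loss + distill_criterion(student_logits, teacher_logits)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    global_step += 1
```

If that reading is correct, 200 warm-up steps within a single epoch leaves very little time for the student to train on the task loss alone, which is what prompted my question about enabling distillation from step 0 instead.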