Trainers accept an initial learning rate and a decay parameter. To control the learning rate dynamically without using the built-in decay, one can pass a scale parameter to the trainer.update(scale) function. Unfortunately, this directly affects gradient clipping (see https://github.com/clab/dynet/blob/master/dynet/training.cc#L68):

When trainer.update(scale) is used to manage the learning rate, clip_threshold is effectively divided by scale. There may be a reason for this, but it surprised me. To get the same results as I would with TensorFlow, I have to compensate for it by calling trainer.set_clip_threshold(clip_threshold * scale) before calling update(scale).
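For reference, here is a minimal sketch of the workaround described above, using the older API where update() takes a scale argument. The model setup, loss expression, and learning-rate schedule (compute_loss, my_schedule) are placeholders, not real DyNet functions:

```python
import dynet as dy

m = dy.Model()
trainer = dy.SimpleSGDTrainer(m)

clip_threshold = 5.0  # the gradient-norm threshold we actually want enforced

for scale in my_schedule():      # hypothetical per-step learning-rate scale
    dy.renew_cg()
    loss = compute_loss()        # hypothetical loss expression
    loss.backward()
    # Compensate: update(scale) effectively divides clip_threshold by scale,
    # so pre-multiply it to keep the effective threshold constant.
    trainer.set_clip_threshold(clip_threshold * scale)
    trainer.update(scale)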
neubig added the moderate bug label (Issues that should be fixed but only affect less common environments or functionality) on Jun 30, 2017
This was indeed unintuitive, so we revised the training interface, simplifying it and removing the scale parameter from update, which was never intended as a way to scale the learning rate. The learning rate should now be managed the way it was intended: by setting the learning_rate member of the trainer class. See #695
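A minimal sketch of what the revised interface looks like, assuming the current Python bindings; the loss expression and decay schedule (compute_loss, decay_factor, num_steps) are placeholders:

```python
import dynet as dy

m = dy.ParameterCollection()
trainer = dy.SimpleSGDTrainer(m, 0.1)
trainer.set_clip_threshold(5.0)  # no compensation needed anymore

for step in range(num_steps):    # hypothetical loop bound
    dy.renew_cg()
    loss = compute_loss()        # hypothetical loss expression
    loss.backward()
    # Manage the learning rate directly on the trainer instead of
    # passing a scale to update().
    trainer.learning_rate = 0.1 * decay_factor(step)  # hypothetical decay
    trainer.update()
```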