Trainers accept an initial learning rate and a decay parameter. To control the learning rate dynamically without using the built-in decay, one can pass a scale parameter to the trainer.update(scale) function. Unfortunately, this directly affects gradient clipping (see https://github.com/clab/dynet/blob/master/dynet/training.cc#L68):

When trainer.update(scale) is used to manage the learning rate, clip_threshold is effectively divided by scale. There may be a reason for this, but it surprised me. To get the same results as I would with TensorFlow, I have to compensate for it by calling trainer.set_clip_threshold(clip_threshold * scale) before calling update(scale).
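For reference, here is a minimal sketch of the workaround described above, using the older API where update() takes a scale argument. The model setup, loss expression, and learning-rate schedule (compute_loss, my_schedule) are placeholders, not real DyNet functions:

```python
import dynet as dy

m = dy.Model()
trainer = dy.SimpleSGDTrainer(m)

clip_threshold = 5.0  # the gradient-norm threshold we actually want enforced

for scale in my_schedule():      # hypothetical per-step learning-rate scale
    dy.renew_cg()
    loss = compute_loss()        # hypothetical loss expression
    loss.backward()
    # Compensate: update(scale) effectively divides clip_threshold by scale,
    # so pre-multiply it to keep the effective threshold constant.
    trainer.set_clip_threshold(clip_threshold * scale)
    trainer.update(scale)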
neubig added the moderate bug label (Issues that should be fixed but only affect less common environments or functionality) on Jun 30, 2017
This was indeed unintuitive, so we revised the training interface, simplifying it and removing the scale parameter from update, which was never intended as a way to scale the learning rate. The learning rate should now be managed the way it was intended: by setting the learning_rate member of the trainer class. See #695
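A minimal sketch of what the revised interface looks like, assuming the current Python bindings; the loss expression and decay schedule (compute_loss, decay_factor, num_steps) are placeholders:

```python
import dynet as dy

m = dy.ParameterCollection()
trainer = dy.SimpleSGDTrainer(m, 0.1)
trainer.set_clip_threshold(5.0)  # no compensation needed anymore

for step in range(num_steps):    # hypothetical loop bound
    dy.renew_cg()
    loss = compute_loss()        # hypothetical loss expression
    loss.backward()
    # Manage the learning rate directly on the trainer instead of
    # passing a scale to update().
    trainer.learning_rate = 0.1 * decay_factor(step)  # hypothetical decay
    trainer.update()
```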