
Simplify training interface by removing weight decay and scaling #695

Merged: 5 commits merged on Jul 14, 2017

Conversation

@neubig (Contributor) commented on Jul 14, 2017

There is some confusion regarding the interface of the "Trainer" class:
#641
#684

I agree that it's difficult to understand. This commit removes the rate decay and gradient scaling functionality, both of which implicitly changed the learning rate in non-transparent ways. Here are examples of the before/after behavior:

Rate Decay Before

// At the beginning of training
Trainer trainer(initial_learning_rate, rate_decay);
// After every epoch
trainer.update_epoch();

Rate Decay After

// At the beginning of training
Trainer trainer(initial_learning_rate);
// After every epoch
trainer.learning_rate /= (1 - rate_decay);
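
For concreteness, here is a minimal sketch (not part of the PR) of how the new pattern might look in a full training loop. It assumes a recent DyNet C++ API where the model class is ParameterCollection; Batch, compute_loss, and the 0.05 decay constant are hypothetical placeholders, not part of DyNet.

#include <dynet/dynet.h>
#include <dynet/expr.h>
#include <dynet/training.h>
using namespace dynet;

void train(ParameterCollection& model, std::vector<Batch>& training_data, unsigned num_epochs) {
  SimpleSGDTrainer trainer(model, /*learning_rate=*/0.1);
  float rate_decay = 0.05f;  // assumed decay constant, for illustration only
  for (unsigned epoch = 0; epoch < num_epochs; ++epoch) {
    for (auto& batch : training_data) {
      ComputationGraph cg;
      Expression loss = compute_loss(cg, model, batch);  // hypothetical helper
      cg.forward(loss);
      cg.backward(loss);
      trainer.update();
    }
    // The decay is now applied explicitly by the user, as written in this PR
    // (see the later discussion about whether this should shrink rather than grow).
    trainer.learning_rate /= (1 - rate_decay);
  }
}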

Gradient Scaling Before

cg.backward(loss);
trainer.update(scaling_factor);

Gradient Scaling After

cg.backward(loss * scaling_factor);
trainer.update();
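
And a similar sketch for the scaling case, where the scale factor (for example 1/batch_size to average over a minibatch) is folded into the loss expression instead of being passed to update(). Again, compute_loss and batch_size are hypothetical placeholders.

ComputationGraph cg;
Expression loss = compute_loss(cg, model, batch);  // hypothetical helper
float scaling_factor = 1.0f / batch_size;          // e.g. average over the minibatch
Expression scaled_loss = loss * scaling_factor;
cg.forward(scaled_loss);
cg.backward(scaled_loss);  // gradients carry the scale because the loss does
trainer.update();          // no scaling argument any more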

@pmichel31415 (Collaborator) left a comment

LGTM!

@liesun1994

I am wondering whether lamtram needs to be modified in light of this pull request.

@liesun1994

When I run lamtram, I get the warning: "Trainer::update_epoch has been deprecated and doesn't do anything. Please remove it from your code, and control the learning rate of the trainer directly, for example by: 'trainer.learning_rate /= (1 - rate_decay)', see #695 for details."

@neubig neubig deleted the simplify-training branch September 1, 2017 17:55
tetsuok added a commit to tetsuok/dynet that referenced this pull request Sep 2, 2017
tetsuok added a commit to tetsuok/dynet that referenced this pull request Sep 3, 2017
neubig pushed a commit that referenced this pull request Sep 4, 2017
* Remove deprecated Trainer::update_epoch

It was deprecated in #695.

* Remove first variable from examples
@yuvalpinter (Contributor)

I can't seem to run the trainer.learning_rate /= (1 - rate_decay) code. It gives me the following error: TypeError: 'property' object is not callable.
This happens with both MomentumSGDTrainer and AdamTrainer.

@shuheik (Contributor) commented on Jan 7, 2018

Shouldn't it be
trainer.learning_rate *= (1 - rate_decay) or
trainer.learning_rate /= (1 + rate_decay)
instead of
trainer.learning_rate /= (1 - rate_decay)?

I assume rate_decay is some small positive value, and dividing by (1 - rate_decay) would make learning_rate grow larger.
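
A quick numeric check of this point (editorial arithmetic, not from the thread), assuming rate_decay = 0.1 and an initial learning_rate of 0.1, with each line applied independently to the original value:

learning_rate /= (1 - 0.1);  // 0.1 / 0.9 ≈ 0.111, so the rate grows
learning_rate *= (1 - 0.1);  // 0.1 * 0.9 = 0.090, so the rate shrinks
learning_rate /= (1 + 0.1);  // 0.1 / 1.1 ≈ 0.091, so the rate shrinks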

yuvalpinter added a commit to yuvalpinter/dynet that referenced this pull request Jan 10, 2018
neubig pushed a commit that referenced this pull request Jan 17, 2018
kashif pushed a commit to kashif/dynet that referenced this pull request Jan 19, 2018