Simplify training interface by removing weight decay and scaling #695
Conversation
LGTM!
I am wondering if lamtram needs to be modified now that these changes have been pulled in.
When I run lamtram, it emits the warning: "Trainer::update_epoch has been deprecated and doesn't do anything. Please remove it from your code, and control the learning rate of the trainer directly, for example by: 'trainer.learning_rate /= (1 - rate_decay)', see #695 for details."
It was deprecated in clab#695.
* Remove deprecated Trainer::update_epoch (it was deprecated in #695).
* Remove first variable from examples.
I can't seem to run the ...
Shouldn't it be multiplication? I assume rate_decay is some small positive value, and dividing by (1 - rate_decay) would make learning_rate grow larger.
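As a quick numeric check of that point (the values 0.1 and 0.05 are illustrative, not taken from this PR):

```python
learning_rate = 0.1
rate_decay = 0.05

# Dividing by (1 - rate_decay) increases the rate each epoch.
print(learning_rate / (1 - rate_decay))  # ~0.105, larger than 0.1

# Multiplying by (1 - rate_decay) decreases it, which is the usual intent of decay.
print(learning_rate * (1 - rate_decay))  # ~0.095, smaller than 0.1
```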
There is some confusion regarding the interface of the `Trainer` class; see:
#641
#684
I agree that it's difficult to understand. This commit removes the rate decay and gradient scaling functionality that implicitly changes the learning rate in non-transparent ways. Here are examples of the before/after behavior:
Rate Decay Before:
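For reference, a minimal sketch of the old pattern, assuming the DyNet Python API; `compute_loss`, `data`, and `num_epochs` are placeholders, and constructor argument names varied across versions:

```python
import dynet as dy

model = dy.ParameterCollection()
trainer = dy.SimpleSGDTrainer(model, learning_rate=0.1)

for epoch in range(num_epochs):            # num_epochs: placeholder
    for batch in data:                     # data: placeholder iterable
        loss = compute_loss(batch)         # compute_loss: placeholder
        loss.backward()
        trainer.update()
    # Old behavior: update_epoch() applied the trainer's built-in decay
    # schedule, so the effective learning rate changed without ever
    # appearing in user code.
    trainer.update_epoch()
```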
Rate Decay After:
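The same loop with the decay made explicit; `rate_decay` is an illustrative value, and the comment notes the question above about the form quoted in the deprecation message:

```python
import dynet as dy

model = dy.ParameterCollection()
trainer = dy.SimpleSGDTrainer(model, learning_rate=0.1)
rate_decay = 0.05                          # illustrative value

for epoch in range(num_epochs):
    for batch in data:
        loss = compute_loss(batch)
        loss.backward()
        trainer.update()
    # The decay is now visible in user code. The deprecation message suggests
    # `trainer.learning_rate /= (1 - rate_decay)`, but as noted above that
    # makes the rate grow; multiplying shrinks it as expected.
    trainer.learning_rate *= (1 - rate_decay)
```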
Gradient Scaling Before:
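A sketch of the old scaling pattern, assuming update() accepted a scale factor (the functionality this commit removes); the value 0.5 and the placeholders are illustrative:

```python
# Before: the scale was handed to the trainer, which multiplied the gradients
# internally, so the effective step size was not visible at the call site.
loss = compute_loss(batch)                 # compute_loss, batch: placeholders
loss.backward()
trainer.update(0.5)                        # scale applied inside the trainer
```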
Gradient Scaling After:
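After the change, the scaling is applied to the loss expression itself before the backward pass, so it is explicit in user code:

```python
# After: scale the objective directly; the gradients reaching the trainer are
# already scaled, and update() no longer takes a scale argument.
loss = compute_loss(batch) * 0.5           # explicit scaling of the loss
loss.backward()
trainer.update()
```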