Adam solver #2918
@shelhamer @jeffdonahue @philkr @PatWie Please take a look if you have time. I think this should be ready to merge. This is the last piece of the solver trilogy in #2860. After merging this one, we can address #2890.
@@ -218,6 +218,21 @@ class AdaDeltaSolver : public SGDSolver<Dtype> {
};
We need to cite the Adam paper somewhere*. I suggest putting a reference here, e.g. in a Doxygen-formatted comment. Eventually it would also be good to add sections on these new solvers to the solver tutorial, where the reference should then also be added.
*We probably also need to go back and add references for some of the other recently merged solvers.
Thanks for the rebase @ronghanghu, and thanks @PatWie for the original implementation! See the comment above; otherwise this looks good.
Citation added for Adam.
Let's address that in #2890.
// we create aliases for convenience
size_t update_history_offset = net_params.size();
shared_ptr<Blob<Dtype> > val_m = this->history_[param_id];
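A minimal sketch of the history layout that update_history_offset suggests. It assumes (the diff above does not show this) that history_ holds one first-moment blob per learnable parameter in entries [0, N), followed by one second-moment blob per parameter in entries [N, 2N), with N = net_params.size(). BlobSketch, MakeAdamHistory and GetMoments are hypothetical names used only for illustration, and std::shared_ptr stands in for the shared_ptr type Caffe uses.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for caffe::Blob: just a flat buffer of values.
template <typename Dtype>
struct BlobSketch {
  explicit BlobSketch(std::size_t n) : data(n, Dtype(0)) {}
  std::vector<Dtype> data;
};

// Assumed layout: entries [0, N) hold the first-moment estimates m and
// entries [N, 2N) hold the second-moment estimates v, one blob per parameter.
template <typename Dtype>
std::vector<std::shared_ptr<BlobSketch<Dtype> > > MakeAdamHistory(
    const std::vector<std::size_t>& param_sizes) {
  std::vector<std::shared_ptr<BlobSketch<Dtype> > > history;
  for (int copy = 0; copy < 2; ++copy) {
    for (std::size_t n : param_sizes) {
      history.push_back(std::make_shared<BlobSketch<Dtype> >(n));
    }
  }
  return history;
}

// Looks up the m and v blobs for one parameter, using N as the offset.
// The outputs are non-owning raw pointers; `history` keeps ownership.
template <typename Dtype>
void GetMoments(
    const std::vector<std::shared_ptr<BlobSketch<Dtype> > >& history,
    std::size_t num_params, std::size_t param_id,
    BlobSketch<Dtype>** val_m, BlobSketch<Dtype>** val_v) {
  assert(history.size() == 2 * num_params);
  const std::size_t update_history_offset = num_params;
  *val_m = history[param_id].get();
  *val_v = history[param_id + update_history_offset].get();
}
```

With two parameters of sizes 3 and 5, the history has four entries, and the moments for parameter 1 live at indices 1 and 3.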
No need to create shared_ptrs for these val_* variables, is there? (I suggest using raw pointers, e.g. Blob<Dtype>* val_m = this->history_[param_id].get();)
Yes, you are right. I should use raw pointers.
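To illustrate the point of the review comment, the sketch below contrasts copying a shared_ptr with aliasing through get(). It uses std::shared_ptr and a hypothetical BlobStub type rather than Caffe's own classes; boost::shared_ptr behaves the same way in this respect. The copy raises the reference count for as long as it lives, while the raw pointer is a non-owning alias that stays valid as long as the owning container outlives it, which should hold for locals scoped to a single update call.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for caffe::Blob.
struct BlobStub {
  float value = 0.0f;
};

// Copying a shared_ptr increments the reference count while the copy exists.
long UseCountWithSharedCopy(
    const std::vector<std::shared_ptr<BlobStub> >& history) {
  assert(!history.empty());
  std::shared_ptr<BlobStub> val_m = history[0];  // count goes up by one
  return val_m.use_count();
}

// A raw pointer from get() aliases the same object without touching the
// reference count; `history` remains the sole owner.
long UseCountWithRawAlias(
    const std::vector<std::shared_ptr<BlobStub> >& history) {
  assert(!history.empty());
  BlobStub* val_m = history[0].get();
  val_m->value += 1.0f;  // writes through to the owned blob
  return history[0].use_count();
}
```

Both functions modify or read the same blob; only the first one pays for reference-count bookkeeping, which is why a plain alias is sufficient here.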
Thanks for adding the citation. On a final pass I noticed one other thing, which I commented on above; sorry for not noticing it before. Feel free to merge after addressing that.
Looks good.
This commit implements the Adam solver by Kingma et al. for CPU and GPU. All solver parameters are defined in caffe.proto. This also adds an example for the MNIST dataset.
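For reference, one step of the Adam update from Kingma and Ba can be sketched on the CPU as follows. This is a minimal, non-authoritative sketch, not the code in this PR: it works on plain std::vector<double> buffers instead of Blobs, the names AdamConfig and AdamStep are hypothetical, and it uses the formulation from the paper in which both bias corrections are folded into a per-step learning rate, lr_t = lr * sqrt(1 - beta2^t) / (1 - beta1^t). The default hyperparameters are the values suggested in the paper.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical hyperparameter bundle; defaults are the paper's suggestions.
struct AdamConfig {
  double lr = 0.001;     // alpha, the base step size
  double beta1 = 0.9;    // decay rate of the first-moment estimate
  double beta2 = 0.999;  // decay rate of the second-moment estimate
  double eps = 1e-8;     // keeps the denominator away from zero
};

// One Adam step, in place on `param`, given its gradient `grad`.
// `m` and `v` are the running first/second-moment estimates (same length as
// `param`, zero-initialised before step 1); `t` is the 1-based step count.
void AdamStep(std::vector<double>& param, const std::vector<double>& grad,
              std::vector<double>& m, std::vector<double>& v, int t,
              const AdamConfig& cfg) {
  assert(t >= 1);
  assert(param.size() == grad.size() && m.size() == grad.size() &&
         v.size() == grad.size());
  // Bias corrections for m and v folded into a single step size.
  const double local_rate = cfg.lr * std::sqrt(1.0 - std::pow(cfg.beta2, t)) /
                            (1.0 - std::pow(cfg.beta1, t));
  for (std::size_t i = 0; i < param.size(); ++i) {
    m[i] = cfg.beta1 * m[i] + (1.0 - cfg.beta1) * grad[i];
    v[i] = cfg.beta2 * v[i] + (1.0 - cfg.beta2) * grad[i] * grad[i];
    param[i] -= local_rate * m[i] / (std::sqrt(v[i]) + cfg.eps);
  }
}
```

To use it, zero-initialise m and v with the same length as the parameters and call AdamStep once per iteration with t = 1, 2, 3, and so on. On the first step the update has a magnitude very close to lr regardless of the gradient's scale, which is the effect the bias correction is meant to produce.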
Changed from shared ptrs to raw ptrs in
Carried over the Adam solver (originally #2856) for merging.
I completed the tests and rebased it onto the latest master.
Authorship belongs to @PatWie and is preserved in the git commit.
Original message in #2856:
As you can see, both solver.cpp and test_gradient_based_solver.cpp have now grown to over 1000 lines. This problem will be addressed in #2890.