Optimizer library #2190
Conversation
3515b2c to 0a03b9d
paddle/lib/CMakeLists.txt
Outdated
@@ -0,0 +1,15 @@
# gangliao's built in cmake function

I think this line does not provide much information; shall we remove it?
Changed it to a TODO, because it is not a CMake built-in function; it is based on gangliao's PR.
paddle/lib/CMakeLists.txt
Outdated
@@ -0,0 +1,15 @@
# gangliao's built in cmake function

This library is under paddle/lib. I think lib needs to be more expressive; if it's an optimizer, we can use optimizer as the folder name.
Currently we are using this library only for the parameter server, so maybe we can put it under a place like paddle/pserver/optimizer.
Good point. 👍 Will we unify the optimizer of the ParameterServer side and the trainer side? If so, paddle/optimizer sounds better.
paddle/lib/Tensor.h
Outdated
@@ -0,0 +1,13 @@
#ifndef PADDLE_FAKE_TENSOR_H_

We are following the Google C++ code style, so please follow https://google.github.io/styleguide/cppguide.html#The__define_Guard for the header guard.
Fixed.
paddle/lib/optimizer.cc
Outdated
#include "optimizer.h"
#include "optimizer_private.h"

int32_t paddle_create_SGDOptimizer(paddle_optimizer* optimizer, double learning_rate) {

For snake case, should this function be named paddle_create_SGD_optimizer?
I thought SGDOptimizer is the full name of the optimizer class. Are you suggesting that we should name the class SGD_optimizer?
paddle/lib/optimizer.h
Outdated
/*! \brief execute status code */
const int32_t LIB_SUCCESS = 0;
const int32_t LIB_WARNING = 1;

What should the client do when the return code is LIB_WARNING? To the best of my knowledge, libraries typically return ok or error (https://github.com/yahoo/ygloo-ymagine/blob/f52dfd67ac2944f7853427527571da50ee1463dc/jni/include/ymagine/ymagine.h#L40); I have not seen a library return a warning.
I think maybe PADDLE_SUCCESS / PADDLE_OK is better than LIB_SUCCESS? "lib" does not bring much information.
Agreed. These constants should be put in one place, such as a GlobalConstant header; otherwise they will occur here and there.
paddle/lib/optimizer_test.cc
Outdated
};

void applyGradientDescent_TEST() {

Maybe only check in finished tests :)
Same as above.
paddle/lib/sgd_optimizer.h
Outdated
}
};

class MomentumOptimizer : public ParameterOptimizer {

I think MomentumOptimizer should not be in the file sgd_optimizer.h; it adds confusion.
Moved it to a new file.
paddle/lib/sgd_optimizer.h
Outdated
public:
/*! \brief call the applyXX for example */
void update(Tensor<T> &parameter,

This function has a different signature than the parent pure virtual function update; maybe we need to change the parent update function signature?
I don't think we should change the update function signature; just use it like this: fill the momentum vector from the OptimizerConfig in create_optimizer, then call update(parameter, gradient, learning_rate) as the real update function.
We cannot change the signature because the optimizers really take different arguments. For example, the RMSProp algorithm takes more arguments, which would make the signature more confusing. For RMSProp, see http://caffe.berkeleyvision.org/tutorial/solver.html
paddle/lib/parameter_optimizer.h
Outdated
#include <string>
#include <functional>
// #include <math/Tensor.h>

No commented code please :)
Fixed.
paddle/lib/parameter_optimizer.h
Outdated
typedef std::function<void(Tensor&, const Tensor&)> UpdateHook;

static ParameterOptimizer *create();
virtual update(Tensor &parameter, const Tensor &gradient, double learning_rate) = 0;

The C API uses a void* buffer to pass in the parameter and gradient; can a Tensor be created from a raw buffer without a copy (not so familiar with C++ myself)?
Checked in more code for the FakeTensor implementation.
paddle/lib/optimizer.cc
Outdated
}

int32_t paddle_release_optimizer(paddle_optimizer* optimizer) {
if(optimizer == nullptr)

Maybe it is clearer as the following:

if (optimizer != nullptr)
    optimizer->impl->destroy();
return LIB_SUCCESS;
👍 Got it!
paddle/go/optimizer/CMakeLists.txt
Outdated
@@ -0,0 +1,29 @@
include_directories(${CMAKE_CURRENT_BINARY_DIR})

repo/go is mostly for Go projects and their bindings. Maybe the optimizer is better located under repo/paddle/pserver?
That's a good point. In its final form, will the optimizer be used only on the Go pserver side, or on both the trainer library side and the Go pserver? If the latter, should we put it into /repo/paddle/optimizer?
That's a good point. In the final form we should share the optimizer when running locally and on the pserver. Let's move it to /repo/paddle/optimizer.
paddle/go/optimizer/CMakeLists.txt
Outdated
# TODO:remove link options
include_directories("/home/work/dongzhihong/github/Paddle/Paddle/third_party/install/glog/include")
link_directories("/home/work/dongzhihong/github/Paddle/Paddle/third_party/install/glog/lib")
# add_executable(optimizer_test optimizer_test.cpp)

Please no commented-out code.
Fixed.
paddle/go/optimizer/CMakeLists.txt
Outdated
add_dependencies(optimizer gen_proto_cpp)

# TODO:remove link options
include_directories("/home/work/dongzhihong/github/Paddle/Paddle/third_party/install/glog/include")

Can you ask @gangliao how to include glog properly?
Done.
template<class T>
void SGDOptimizer<T>::set_weight(const Tensor<T> *p) {
// ParameterOptimizer::set_weight(p);

Please no commented-out code.
Fixed. Thanks for your comment! I will double-check before sending a PR.
paddle/go/optimizer/Tensor.h
Outdated
#include <string.h>

namespace paddle {
template <class T>

Since this tensor is only used by the optimizer for now, maybe add a nested namespace here, e.g. paddle::optimizer::Tensor.
👍 Fixed! I thought we would port it to the majel tensor sooner or later; in fact there may be a conflict between modules. Thanks for mentioning it.
};

template <class T>
class AdagradOptimizer : public ParameterOptimizer<T> {

I can't find implementations for AdagradOptimizer, Adam, or Adadelta. Maybe just remove these placeholders? It's confusing when something is not implemented but partially in the code base.
Well, as an optimizer library we implement the four most commonly used optimization methods, which should cover 90% of applications...
template<class T>
double ParameterOptimzier<T>::get_learning_rate() {
if (config_.lr_type() == paddle::OptimizerConfig_LearningRateType_Linear) {
learning_rate = ::std::max(learning_rate - lr_decay_a * num_sample_passed, lr_decay_b);

I think the plain simple SGD optimizer only has a constant learning rate and does not contain a learning rate schedule. This learning rate schedule function should not be in the base class.
I think in v1 we don't need a learning rate schedule. And in the future, we may need to use composition instead of inheritance for the learning rate schedule. Reference: https://stackoverflow.com/a/49016/852385
Think of containment as a "has a" relationship. A car "has an" engine, a person "has a" name, etc.
Think of inheritance as an "is a" relationship. A car "is a" vehicle, a person "is a" mammal, etc.
This has been discussed; replaced with a constant learning rate value.
}

template <class T>
void ParameterOptimizer<T>::set_weight(const Tensor<T> *p) {

Do we need to call set_weight other than during optimizer initialization? If not, can we put it into the constructor?
Reached an agreement after the meeting.
}

template<class T>
void SGDOptimizer<T>::destroy() {

I am not very familiar with C++; just curious why we need this destroy function when we already have a destructor? Is this a design pattern? Can you give me a link?
@dzhwinter Can you take a look at this?
Thanks for pointing that out! You are right, we do not need an explicit destroy; just invoking the destructor with delete xxx is more elegant. I made a mistake here: I thought that in the C interface we needed an explicit destroy function call to pass the GCC compile, but in fact I wrote the CMake script the wrong way. Fixed. Thanks for the reminder!
num_sample_passed += 1;
learning_rate = get_learning_rate();
for(size_t i=0; i<parameter_.size(); ++i) {
momentums_[i] = momentum * momentums_[i] - learning_rate*gradient[i] - decay*parameter_[i];

I thought SGD does not use momentum?
The momentum and nesterov options can be used as switch values. Other frameworks implement these three optimizers together, and those frameworks are popular with users, so I just followed their approach. E.g. Caffe: https://github.com/BVLC/caffe/blob/master/src/caffe/solvers/sgd_solver.cpp#L213, and Keras (which has become the official frontend for TensorFlow and Theano).
s.opt = newOptimizer(sgd, 0.01)
// TODO(helin): parse learning rate from config
s.opt = newOptimizer(config OptimizerConfig)

I have not seen the definition of OptimizerConfig on the Go side; does this code compile? I think we need to check in code that compiles.
I can do the hook up in Go code for you if you want.
Fixed; changed it to a string.
@@ -26,7 +26,8 @@ const (
type Parameter struct {
Name string
ElementType ElementType
Content []byte
Size uint32
// Content []byte

No commented-out code please.
Fixed.
opt *optimizer
paramMap map[string]Parameter
mu sync.Mutex
paramMap map[string]Parameter

Since there is already optimizerMap, and the optimizer owns the parameter, maybe we no longer need paramMap?
Agreed; fixed. The Go part is done.
@@ -135,7 +138,10 @@ func (s *Service) SendGrads(grads []Gradient, dummy *int) error {
errCh := make(chan error, count)
for _, g := range grads {
go func(p Parameter, g Gradient) {

This change introduces a concurrent read of optimizerMap, and map in Go is not thread safe. We can either use a mutex to protect it inside the function:
go func(p Parameter, g Gradient) {
    s.mu.Lock()
    defer s.mu.Unlock()
    opt, ok := s.optimizerMap[p.Name]
    if ok {
        errCh <- opt.UpdateParameter(p, g)
    }
}(p, g)
The above function holds the mutex until the optimization is finished, which is safe, since a concurrent update to an optimizer is a race condition. But performance will suffer, since there is only a single mutex per Service.
There are two ways to fix it:
- Introduce one mutex per parameter.
- Do not protect against concurrent updates to the same optimizer.
Due to the stochastic nature of SGD, we can tolerate this race condition. I think 2 is better, because the code is simpler and clearer (fewer bugs):
go func(o *Optimizer, g Gradient) {
err := o.UpdateParameter(g)
errCh <- err
}(s.optimizerMap[g.Name], g) // we are still protected by mutex when invoking s.optimizerMap[g.Name]
Thanks for your kind reminder! I agree; I thought that we don't need to protect against concurrent updates.
I remember there are some theoretical bounds on that topic, such as Eric Xing's latency-bounded SGD and some async SGD work published by Google; I'm not quite sure of the effect on learning performance (maybe you are more familiar with that than me).
@@ -135,7 +138,10 @@ func (s *Service) SendGrads(grads []Gradient, dummy *int) error {
errCh := make(chan error, count)
for _, g := range grads {
go func(p Parameter, g Gradient) {
err := s.opt.UpdateParameter(p, g)
opt, err := s.optimizerMap[p.Name]

I noticed the code here does not compile, because err is of bool type, but you cannot check a bool against nil (next line).
We should not check in code that does not compile. Maybe I can do the Go code part for now, and you can get more familiar with Go by reviewing the PR?
Sorry, I hadn't written the CMake script the right way. I will check in code more carefully. Thanks for your Go lint editor plugin and your commit to the Go CMake script; now I can run this part. Thanks!
int paddle_release_optimizer(paddle_optimizer* o) {
if (o != nullptr)
delete o->impl;

We are following the Google C++ coding style:

Short conditional statements may be written on one line if this enhances readability. You may use this only when the line is brief and the statement does not use the else clause.

if (x == kFoo) return new Foo();
if (x == kBar) return new Bar();

See: https://google.github.io/styleguide/cppguide.html#Conditionals
Fixed the style. Thanks a lot for such a careful review! I will check in my code more carefully and fix style issues, such as the Go code compile problem, by double-checking. Thanks!
virtual T *get_weight() const;
virtual void set_weight(const Tensor<T> *parameter);
// package optimizer config proto in runtime for saving checkpoint
virtual char* get_config_proto();

I thought the optimizer should not know about the config proto. Instead, the factory method should do all the parsing and call the constructor of the required optimizer.
Same as the last comment; I agree with you that the optimizer had better leave the parsing job to the create interface. But we need something to pack the training state back.
CHECK(config_valid(config) == 0) << "error : invalid optimizer config ";
ParameterOptimizer<T> *opt = nullptr;
switch (config.optimizer_name()) {
case "SGD" : opt = new SGDOptimizer<T>(config); break;

I think this factory method should just parse the whole config proto and call the constructor of the required optimizer. The optimizer should not know anything about protobuf.
Good suggestion; I agree that parsing protobuf inside the optimizer is bad coding style. However, I found that the optimizer needs to pack its training state and return it to the caller when we need a checkpoint. I thought the configuration proto needed to be sent back to the Go server side; that's why I put the config proto into the optimizer.
But creating with the required parameters seems clearer. Fixed.
}

template<class T>
double ParameterOptimzier<T>::get_learning_rate() {

As we discussed during yesterday's meeting, we plan to use a constant learning rate for now, and in the future we may use composition rather than inheritance. So we probably don't need this get_learning_rate function in the base class.
Fixed.
template<class T>
class L2Regularizer {
public:
void update(Tensor<T> &parameter);

Maybe we should not check in unimplemented classes, to avoid the confusion where people see the code, notice there is an L2Regularizer, and think it is implemented, only to find out after debugging that it is not.
I feel really sorry for so many mistakes. I will check in code more carefully!
err := s.opt.UpdateParameter(p, g)
opt, err := s.optimizerMap[p.Name]
if err != nil {
err := opt.UpdateParameter(p, g)

Do not use err := here, otherwise we create a new err variable, shadowing the err outside.
See: http://blog.charmes.net/2015/06/scope-and-shadowing-in-go.html
You can use tools to check for shadowing: https://github.com/alecthomas/gometalinter
The tool has an Emacs package; install example: https://github.com/helinwang/go-emacs/blob/master/init.el#L20
Sadly I keep doing my work on Baidu's dev machine, and over the last two days "jumbo" (a package management system) has been broken across all of Baidu, so it did not pass the Go compile step.
Thanks for your lint package! I use Spacemacs, so that is great news for me!
@dzhwinter submitted a new PR, #2386; it's the new version of this PR, so closing this one.
An optimizer library for the parameter server side; it will be extended to the trainer side as well.