Optimizer library #2190

Closed · wants to merge 12 commits

Conversation

@dzhwinter (Contributor) commented May 18, 2017

An optimizer library for the parameter server side; it will be extended to cover both the trainer side and the parameter server side.

@dzhwinter force-pushed the optimizer_lib branch 2 times, most recently from 3515b2c to 0a03b9d on May 18, 2017 at 21:30
@@ -0,0 +1,15 @@
# gangliao's built in cmake function
Contributor:

I think this line does not provide much information; shall we remove it?

Contributor Author:

Changed it to a TODO, because it is not a CMake built-in function; it is based on gangliao's PR.

@@ -0,0 +1,15 @@
# gangliao's built in cmake function
Contributor:

This library is under paddle/lib. I think lib needs to be more expressive; if it's an optimizer, we can use optimizer as the folder name.

Currently, we are using this library only for the parameter server, so maybe we can put it under a place like paddle/pserver/optimizer.

Contributor Author:

Good point. 👍 Will we unify the optimizers on the parameter server side and the trainer side? If so, paddle/optimizer sounds better.

@@ -0,0 +1,13 @@
#ifndef PADDLE_FAKE_TENSOR_H_
Contributor:

We are following the Google C++ style guide, so please follow https://google.github.io/styleguide/cppguide.html#The__define_Guard for the header guard.
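For reference, a guard following that rule looks like the sketch below; the exact macro name depends on the file's path, so the one used here is illustrative only:

// Google style: <PROJECT>_<PATH>_<FILE>_H_
#ifndef PADDLE_OPTIMIZER_TENSOR_H_
#define PADDLE_OPTIMIZER_TENSOR_H_

// ... declarations ...

#endif  // PADDLE_OPTIMIZER_TENSOR_H_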

Contributor Author:

Fixed.

#include "optimizer.h"
#include "optimizer_private.h"

int32_t paddle_create_SGDOptimizer(paddle_optimizer* optimizer, double learning_rate) {
Contributor:

For snake case, should this function be named paddle_create_SGD_optimizer?

Contributor Author:

I treated SGDOptimizer as the full name of the optimizer class. Do you mean we should name it like this?

class SGD_optimizer


/*! \brief execute status code */
const int32_t LIB_SUCCESS = 0;
const int32_t LIB_WARNING = 1;
Contributor:

What should the client do when the return code is LIB_WARNING? To the best of my knowledge, libraries typically return ok or error (https://github.com/yahoo/ygloo-ymagine/blob/f52dfd67ac2944f7853427527571da50ee1463dc/jni/include/ymagine/ymagine.h#L40); I have not seen a library return a warning.

I think maybe PADDLE_SUCCESS / PADDLE_OK is better than LIB_SUCCESS? "lib" does not carry much information.

Contributor Author:

Agreed. These constants should be put in one central place, such as a GlobalConstant header; otherwise they will pop up here and there.
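A minimal sketch of such a central header, assuming the PADDLE_ prefix suggested above; the file name and values here are illustrative, not the PR's actual code:

// paddle_status.h (hypothetical): one home for the C-API status codes.
#ifndef PADDLE_OPTIMIZER_PADDLE_STATUS_H_
#define PADDLE_OPTIMIZER_PADDLE_STATUS_H_

#include <stdint.h>

const int32_t PADDLE_SUCCESS = 0;  /* operation completed */
const int32_t PADDLE_ERROR = -1;   /* operation failed */

#endif  // PADDLE_OPTIMIZER_PADDLE_STATUS_H_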

};

void applyGradientDescent_TEST() {

Contributor:

Maybe only check in finished tests :)

Contributor Author:

same as above

}
};

class MomentumOptimizer : public ParameterOptimizer {
Contributor:

I think MomentumOptimizer should not be in the file sgd_optimizer.h; it adds confusion.

Contributor Author:

Moved it to a new file.


public:
/*! \brief call the applyXX for example */
void update(Tensor<T> &parameter,
@helinwang (Contributor), May 18, 2017:

This function has a different signature than the parent's pure virtual update; maybe we need to change the parent update function's signature?

Contributor Author:

I don't think we should change the update function signature. Instead, fill the momentum vector from the OptimizerConfig in create_optimizer, and just call update(parameter, gradient, learning_rate) as the real update function.

We cannot put all the arguments into one signature, because the optimizers really take different arguments. For example, the RMSProp algorithm takes more arguments, which would make the signature more confusing. For RMSProp, see http://caffe.berkeleyvision.org/tutorial/solver.html
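A self-contained sketch of that idea, with simplified stand-in types rather than the PR's actual classes: the optimizer-specific hyperparameters are read from the config at construction time, so every subclass can keep the same uniform update signature:

#include <vector>

// Stand-ins for the real Tensor and OptimizerConfig, for illustration only.
using Tensor = std::vector<double>;
struct OptimizerConfig { double momentum = 0.9; };

class ParameterOptimizer {
public:
  virtual ~ParameterOptimizer() {}
  // One uniform signature shared by every optimizer.
  virtual void update(Tensor &parameter, const Tensor &gradient,
                      double learning_rate) = 0;
};

class MomentumOptimizer : public ParameterOptimizer {
public:
  MomentumOptimizer(const OptimizerConfig &config, size_t size)
      : mu_(config.momentum), velocity_(size, 0.0) {}
  void update(Tensor &parameter, const Tensor &gradient,
              double learning_rate) override {
    // Extra state (the velocity) lives inside the optimizer,
    // not in the update signature.
    for (size_t i = 0; i < parameter.size(); ++i) {
      velocity_[i] = mu_ * velocity_[i] - learning_rate * gradient[i];
      parameter[i] += velocity_[i];
    }
  }
private:
  double mu_;
  Tensor velocity_;
};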


#include <string>
#include <functional>
// #include <math/Tensor.h>
Contributor:

No commented code please :)

Contributor Author:

Fixed.

typedef std::function<void(Tensor&, const Tensor&)> UpdateHook;

static ParameterOptimizer *create();
virtual update(Tensor &parameter, const Tensor &gradient, double learning_rate) = 0;
@helinwang (Contributor), May 18, 2017:

The C API uses a void* buffer to pass in the parameter and gradient; can a Tensor be created from a raw buffer without a copy (not so familiar with C++ myself)?

Contributor Author:

Checked in more code for the FakeTensor implementation.
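To the raw-buffer question above: yes, a Tensor can be a non-owning view over the C API's buffer, with no copy. A minimal sketch, not the actual FakeTensor code:

#include <cstddef>

// A non-owning view over a raw buffer: no allocation, no copy.
// The caller keeps ownership and must keep the buffer alive while in use.
template <class T>
class TensorView {
public:
  TensorView(T *data, size_t size) : data_(data), size_(size) {}
  T &operator[](size_t i) { return data_[i]; }
  const T &operator[](size_t i) const { return data_[i]; }
  size_t size() const { return size_; }
private:
  T *data_;
  size_t size_;
};

// At the C boundary, wrap the incoming void* without copying:
// TensorView<float> param(static_cast<float*>(buffer), num_elements);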

}

int32_t paddle_release_optimizer(paddle_optimizer* optimizer) {
if(optimizer == nullptr)
Contributor:

Maybe clearer, as follows:

if (optimizer != nullptr)
    optimizer->impl->destroy();
return LIB_SUCCESS;

Contributor Author:

👍 got it!

@@ -0,0 +1,29 @@
include_directories(${CMAKE_CURRENT_BINARY_DIR})
Contributor:

repo/go is mostly for Go projects and their bindings. Maybe the optimizer is better located under repo/paddle/pserver?

Contributor Author:

That's a good point. In its final form, will the optimizer be used only on the Go pserver side, or on both the trainer library side and the Go pserver? If the latter, should we put it into /repo/paddle/optimizer?

Contributor:

That's a good point. In the final form we should share the optimizer when running locally and on the pserver. Let's move it to /repo/paddle/optimizer.

# TODO:remove link options
include_directories("/home/work/dongzhihong/github/Paddle/Paddle/third_party/install/glog/include")
link_directories("/home/work/dongzhihong/github/Paddle/Paddle/third_party/install/glog/lib")
# add_executable(optimizer_test optimizer_test.cpp)
Contributor:

Please no commented out code.

Contributor Author:

Fixed.

add_dependencies(optimizer gen_proto_cpp)

# TODO:remove link options
include_directories("/home/work/dongzhihong/github/Paddle/Paddle/third_party/install/glog/include")
Contributor:

Can you ask @gangliao on how to include glog properly?

Contributor Author:

done.


template<class T>
void SGDOptimizer<T>::set_weight(const Tensor<T> *p) {
// ParameterOptimizer::set_weight(p);
Contributor:

Please no commented out code.

Contributor Author:

Fixed. Thanks for your comment! I will double-check before sending a PR.

#include <string.h>

namespace paddle {
template <class T>
Contributor:

Since this tensor is only used by the optimizer for now, maybe add a namespace here, e.g., paddle::optimizer::Tensor.

Contributor Author:

👍 Fixed! I assumed we would port it to the majel tensor sooner or later; in fact, there may be a conflict between modules. Thanks for mentioning it.

};

template <class T>
class AdagradOptimizer : public ParameterOptimizer<T> {
Contributor:

I can't find implementations for AdagradOptimizer, Adam, or Adadelta. Maybe just remove these placeholders? It's confusing when something is only partially in the code base but not implemented.

Contributor Author:

Well, as an optimizer library it implements the four most commonly used optimization methods, which can cover 90% of applications...

template<class T>
double ParameterOptimzier<T>::get_learning_rate() {
if (config_.lr_type() == paddle::OptimizerConfig_LearningRateType_Linear) {
learning_rate = ::std::max(learning_rate - lr_decay_a * num_sample_passed, lr_decay_b);
Contributor:

I think the plain simple SGD optimizer has only a constant learning rate and does not contain a learning rate schedule. This learning rate schedule function should not be in the base class.

I think in v1 we don't need a learning rate schedule. And in the future, we may need to use composition instead of inheritance for the learning rate schedule. Reference: https://stackoverflow.com/a/49016/852385

Think of containment as a has a relationship. A car "has an" engine, a person "has a" name, etc.
Think of inheritance as an is a relationship. A car "is a" vehicle, a person "is a" mammal, etc.

Contributor Author:

This has been discussed; replaced with a constant learning rate value.
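For the record, a sketch of what the composition ("has-a") version could look like later, reusing the linear-decay formula from the diff above; the class names here are illustrative:

#include <algorithm>

// The optimizer would *have* a LearningRatePolicy instead of
// inheriting schedule logic from the base class.
class LearningRatePolicy {
public:
  virtual ~LearningRatePolicy() {}
  virtual double learning_rate(size_t num_sample_passed) const = 0;
};

class ConstLR : public LearningRatePolicy {
public:
  explicit ConstLR(double lr) : lr_(lr) {}
  double learning_rate(size_t) const override { return lr_; }
private:
  double lr_;
};

class LinearDecayLR : public LearningRatePolicy {
public:
  LinearDecayLR(double lr, double decay_a, double decay_b)
      : lr_(lr), decay_a_(decay_a), decay_b_(decay_b) {}
  double learning_rate(size_t num_sample_passed) const override {
    // Same rule as the diff: max(lr - decay_a * num_sample_passed, decay_b)
    return std::max(lr_ - decay_a_ * num_sample_passed, decay_b_);
  }
private:
  double lr_, decay_a_, decay_b_;
};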

}

template <class T>
void ParameterOptimizer<T>::set_weight(const Tensor<T> *p) {
Contributor:

Do we need set_weight other than during optimizer initialization? If not, can we put it into the constructor?

Contributor Author:

We reached an agreement after the meeting.

}

template<class T>
void SGDOptimizer<T>::destroy() {
Contributor:

I am not very familiar with C++, just curious: why do we need this destroy function when we already have a destructor? Is this a design pattern? Can you give me a link?

Contributor:

@dzhwinter Can you take a look at this?

Contributor Author:

Thanks for pointing that out! You are right, we do not need an explicit destroy; just calling the destructor via delete xxx is more elegant. I made a mistake here: I thought the C interface needed an explicit destroy call to pass GCC compilation, but in fact I had written the CMake script the wrong way. Fixed. Thanks for the reminder!

num_sample_passed += 1;
learning_rate = get_learning_rate();
for(size_t i=0; i<parameter_.size(); ++i) {
momentums_[i] = momentum * momentums_[i] - learning_rate*gradient[i] - decay*parameter_[i];
Contributor:

I thought SGD does not use momentum?

Contributor Author:

The momentum and nesterov options can be used as switch values. Other frameworks implement these three optimizers together, and those frameworks are popular with users, so I just followed their approach. E.g.:
Caffe: https://github.com/BVLC/caffe/blob/master/src/caffe/solvers/sgd_solver.cpp#L213
Keras (now the official frontend for TensorFlow and Theano)
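For reference, the combined rule under discussion is sketched below: with momentum set to 0 it reduces to plain SGD, and decay acts as the weight-decay term. This mirrors the loop in the diff above plus the natural parameter step, which the diff excerpt does not show; a Nesterov variant would additionally use the look-ahead velocity:

// Sketch of the combined update: momentum == 0.0 gives plain SGD.
for (size_t i = 0; i < parameter_.size(); ++i) {
  momentums_[i] = momentum * momentums_[i]
                  - learning_rate * gradient[i]
                  - decay * parameter_[i];
  parameter_[i] += momentums_[i];
}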

Contributor Author:

s.opt = newOptimizer(sgd, 0.01)
// TODO(helin): parse learning rate from config
s.opt = newOptimizer(config OptimizerConfig)
Contributor:

I have not seen the definition of OptimizerConfig on the Go side; does this code compile? I think we need to check in code that compiles.
I can do the hookup in the Go code for you if you want.

Contributor Author:

Fixed; changed it to a string.

@@ -26,7 +26,8 @@ const (
type Parameter struct {
Name string
ElementType ElementType
Content []byte
Size uint32
// Content []byte
Contributor:

No commented out code please.

Contributor Author:

Fixed.

opt *optimizer
paramMap map[string]Parameter
mu sync.Mutex
paramMap map[string]Parameter
Contributor:

Since there is already optimizerMap, and the optimizer owns the parameter, maybe we no longer need paramMap?

Contributor Author:

Agreed. Fixed the Go part.

@@ -135,7 +138,10 @@ func (s *Service) SendGrads(grads []Gradient, dummy *int) error {
errCh := make(chan error, count)
for _, g := range grads {
go func(p Parameter, g Gradient) {
@helinwang (Contributor), Jun 1, 2017:

This change introduces concurrent read to optimizerMap, map in Go is not thread safe. We can either use a mutex to protect it inside the function:

go func(p Parameter, g Gradient) {
  s.mu.Lock()
  defer s.mu.Unlock()
  opt, ok := s.optimizerMap[p.Name]
  if ok {
    errCh <- opt.UpdateParameter(p, g)
  }
}(p, g)

The above function holds the mutex until the optimization finishes, which is safe, since a concurrent update to an optimizer is a race condition. But performance will suffer, since there is only a single mutex per Service.
There are two ways to fix it:

  1. Introduce one mutex per parameter.
  2. Do not protect against concurrent updates to the same optimizer.

Due to the stochastic nature of SGD, we can tolerate this race condition. I think 2 is better, because the code is simpler and clearer (fewer bugs):
go func(o *Optimizer, g Gradient) {
  err := o.UpdateParameter(g)
  errCh <- err
}(s.optimizerMap[g.Name], g) // we are still protected by mutex when invoking s.optimizerMap[g.Name]

Contributor Author:

Thanks for the kind reminder! Agreed; I also think we don't need to protect against concurrent updates.
I remember there are some theoretical bounds on that topic, such as Eric Xing's latency-bounded SGD and some async SGD work published by Google. I'm not quite sure how it affects learning performance (maybe you are more familiar with that than I am).

@@ -135,7 +138,10 @@ func (s *Service) SendGrads(grads []Gradient, dummy *int) error {
errCh := make(chan error, count)
for _, g := range grads {
go func(p Parameter, g Gradient) {
err := s.opt.UpdateParameter(p, g)
opt, err := s.optimizerMap[p.Name]
Contributor:

I noticed the code here does not compile: err is a bool here, and you cannot compare a bool against nil (next line).

We should not check in code that does not compile. Maybe I can do the Go part for now, and you can get more familiar with Go by reviewing the PR?

Contributor Author:

Sorry, I hadn't written the CMake script the right way. I will check in code more carefully. Thanks for your Go lint editor plugin and your commit to the Go CMake script; now I can run this part. Thanks!


int paddle_release_optimizer(paddle_optimizer* o) {
if (o != nullptr)
delete o->impl;
Contributor:

We are following Google C++ coding style.

Short conditional statements may be written on one line if this enhances readability. You may use this only when the line is brief and the statement does not use the else clause.
if (x == kFoo) return new Foo();
if (x == kBar) return new Bar();

See: https://google.github.io/styleguide/cppguide.html#Conditionals

Contributor Author:

Fixed the style. Thanks a lot for such a careful review! I will check in my code more carefully and fix issues like the style and the Go compile problems by double-checking. Thanks!

virtual T *get_weight() const;
virtual void set_weight(const Tensor<T> *parameter);
// package optimizer config proto in runtime for saving checkpoint
virtual char* get_config_proto();
Contributor:

I thought the optimizer should not know about the config proto. Instead, the factory method should do all the parsing and call the constructor of the required optimizer.

Contributor Author:

Same as the last comment: I agree that the optimizer had better leave the parsing job to the create interface. But we need something to pack the training state back.

CHECK(config_valid(config) == 0) << "error : invalid optimizer config ";
ParameterOptimizer<T> *opt = nullptr;
switch (config.optimizer_name()) {
case "SGD" : opt = new SGDOptimizer<T>(config); break;
Contributor:

I think this factory method should just parse the whole config proto and call the constructor of the required optimizer. The optimizer should not know anything about protobuf.

Contributor Author:

Good suggestion; I agree that parsing protobuf inside the optimizer is bad coding style. However, I found that the optimizer needs a way to pack the training state and return it to the caller when we need a checkpoint. I thought the configuration proto needed to be sent back to the Go server side; that's why I put the config proto into the optimizer.

Contributor Author:

But creating with the required parameters seems clearer. Fixed.
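A sketch of that create interface, where only the factory touches the proto and the constructors receive plain values. The config accessors used here are hypothetical; note also that a C++ switch cannot dispatch on strings, so an if/else chain replaces the switch from the diff above:

template <class T>
ParameterOptimizer<T> *ParameterOptimizer<T>::create(
    const OptimizerConfig &config) {
  // Only the factory reads the proto; optimizers get plain arguments.
  const std::string &name = config.optimizer_name();
  if (name == "SGD")
    return new SGDOptimizer<T>(config.learning_rate(), config.momentum());
  if (name == "Adagrad")
    return new AdagradOptimizer<T>(config.learning_rate(), config.epsilon());
  return nullptr;  // unknown optimizer name
}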

}

template<class T>
double ParameterOptimzier<T>::get_learning_rate() {
Contributor:

As we discussed during yesterday's meeting, we plan to use a constant learning rate for now, and in the future we may use composition rather than inheritance. So we probably don't need this get_learning_rate function in the base class.

Contributor Author:

Fixed.

template<class T>
class L2Regularizer {
public:
void update(Tensor<T> &parameter);
Contributor:

Maybe we should not check in unimplemented classes, to avoid the confusion of people seeing L2Regularizer in the code and assuming it is implemented, only to find out after debugging that it is not.

Contributor Author:

I feel really sorry for so many mistakes. I will check in code more carefully!

err := s.opt.UpdateParameter(p, g)
opt, err := s.optimizerMap[p.Name]
if err != nil {
err := opt.UpdateParameter(p, g)
@helinwang (Contributor), Jun 1, 2017:

Do not use err := here; otherwise we create a new err variable, shadowing the err outside.
See: http://blog.charmes.net/2015/06/scope-and-shadowing-in-go.html
You can use tools to check for shadowing: https://github.com/alecthomas/gometalinter
The tool has an Emacs package; install example: https://github.com/helinwang/go-emacs/blob/master/init.el#L20

Contributor Author:

Sadly, I do my work on Baidu's dev machines, and for the last two days "jumbo" (a package management system) has been broken across all of Baidu, so the Go side did not compile.
Thanks for your lint package! I use Spacemacs, so that is good news for me!

@helinwang (Contributor) commented Jun 5, 2017:

@dzhwinter submitted a new PR, #2386, which is the new version of this PR, so I am closing this one.
Note to @dzhwinter: it's better to just force-push to this PR's branch after resolving merge conflicts, so that you don't need to open a new PR.
Keeping the old PR has the benefit that everyone can track the review progress and the old comments.

@helinwang helinwang closed this Jun 5, 2017
heavengate pushed a commit to heavengate/Paddle that referenced this pull request Aug 16, 2021