Optimizer library #2190

Closed
wants to merge 12 commits into from
1 change: 1 addition & 0 deletions paddle/CMakeLists.txt
@@ -8,6 +8,7 @@ add_subdirectory(gserver)
add_subdirectory(pserver)
add_subdirectory(trainer)
add_subdirectory(scripts)
add_subdirectory(optimizer)

if(CMAKE_Go_COMPILER)
add_subdirectory(go)
1 change: 1 addition & 0 deletions paddle/go/CMakeLists.txt
@@ -1,4 +1,5 @@
include_directories(${CMAKE_CURRENT_BINARY_DIR})
add_subdirectory(optimizer)

go_library(adder SRCS adder.go)

6 changes: 4 additions & 2 deletions paddle/go/pserver/optimizer.go
@@ -4,6 +4,7 @@ package pserver
#include "optimizer.h"
*/
import "C"

import (
"fmt"
"unsafe"
@@ -21,9 +22,10 @@ type optimizer struct {
opt *C.struct_paddle_optimizer
}

func newOptimizer(t optimizerType, learning_rate float64) *optimizer {
func newOptimizer() *optimizer {
o := &optimizer{}
o.opt = C.paddle_create_SGD_optimizer(C.double(learning_rate))
OptimizerConfig config
o.opt = C.paddle_create_optimizer((*C.char)config, C.uint(config.size()))
return o
}

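A minimal sketch of how the Go side might pass a serialized OptimizerConfig through cgo once the config-based constructor lands; the (const char*, int) signature assumed for paddle_create_optimizer and the helper name newOptimizerFromConfig are illustrative, not the final API:

/*
#include <stdlib.h>
#include "optimizer.h"
*/
import "C"

// newOptimizerFromConfig copies the serialized config into C memory,
// hands it to the C-side optimizer factory, and frees the copy afterwards.
func newOptimizerFromConfig(config []byte) *optimizer {
  o := &optimizer{}
  p := C.CBytes(config) // C-owned copy of the config bytes
  defer C.free(p)
  o.opt = C.paddle_create_optimizer((*C.char)(p), C.int(len(config)))
  return o
}

Copying with C.CBytes keeps the C side from ever holding a pointer into Go-managed memory.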
20 changes: 13 additions & 7 deletions paddle/go/pserver/service.go
@@ -26,7 +26,8 @@ const (
type Parameter struct {
Name string
ElementType ElementType
Content []byte
Size uint32
// Content []byte
Contributor: No commented-out code, please.

Contributor Author: Fixed.

}

// ParameterWithConfig contains the parameter and the configuration.
@@ -42,15 +43,16 @@ type Gradient Parameter
type Service struct {
initialized chan struct{}

mu sync.Mutex
opt *optimizer
paramMap map[string]Parameter
mu sync.Mutex
paramMap map[string]Parameter
Contributor: Since there is already an optimizerMap, and the optimizer owns the parameter, maybe we no longer need paramMap?

Contributor Author: Agreed. The Go part is fixed.

optimizerMap map[string]*optimizer // per parameter to optmizer
}

// NewService creates a new service.
func NewService() *Service {
s := &Service{}
s.paramMap = make(map[string]Parameter)
s.optimizerMap = make(map[string]*optimizer)
s.initialized = make(chan struct{})
return s
}
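If the suggestion above is adopted and each optimizer owns its parameter, a sketch of what the Service fields and constructor might look like with paramMap dropped entirely (an illustration of the proposal, not the code in this PR):

type Service struct {
  initialized chan struct{}

  mu           sync.Mutex
  optimizerMap map[string]*optimizer // one optimizer per parameter, each owning its Parameter
}

func NewService() *Service {
  s := &Service{}
  s.optimizerMap = make(map[string]*optimizer)
  s.initialized = make(chan struct{})
  return s
}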
@@ -71,8 +73,9 @@ func (s *Service) BeginInitParams(config []byte, dummy *int) error {
s.opt.Cleanup()
}

// TODO(helin): parse learning rate from config
s.opt = newOptimizer(sgd, 0.01)
// TODO(helin): parse learning rate from config
s.opt = newOptimizer(config OptimizerConfig)
Contributor: I have not seen the definition of OptimizerConfig on the Go side; does this code compile? We should only check in code that compiles. I can do the hook-up in the Go code for you if you want.

Contributor Author: Fixed; changed it to a string.

return nil
}

@@ -135,7 +138,10 @@ func (s *Service) SendGrads(grads []Gradient, dummy *int) error {
errCh := make(chan error, count)
for _, g := range grads {
go func(p Parameter, g Gradient) {
Contributor (@helinwang, Jun 1, 2017): This change introduces a concurrent read of optimizerMap, and maps in Go are not thread safe. We can either use a mutex to protect it inside the function:

go func(p Parameter, g Gradient) {
  s.mu.Lock()
  defer s.mu.Unlock()
  opt, ok := s.optimizerMap[p.Name]
  var err error
  if ok {
    err = opt.UpdateParameter(p, g)
  }
  errCh <- err
}(s.paramMap[g.Name], g)

The function above holds the mutex until optimization finishes, which is safe, since concurrent updates to the same optimizer are a race condition. But performance will suffer, because there is only a single mutex per Service.
There are two ways to fix it:

  1. Introduce one mutex per parameter.
  2. Do not protect against concurrent updates to the same optimizer.

Due to the stochastic nature of SGD we can tolerate this race condition, so I think 2 is better: the code is simpler and clearer (fewer bugs):

go func(o *optimizer, g Gradient) {
  err := o.UpdateParameter(g)
  errCh <- err
}(s.optimizerMap[g.Name], g) // we are still protected by the mutex when invoking s.optimizerMap[g.Name]

Contributor Author: Thanks for the kind reminder! I agree; I had thought we did not need to protect against concurrent updates.
I remember there are some theoretical bounds on this topic, such as Eric Xing's latency-bounded SGD and some of the asynchronous SGD work published by Google; I'm not sure how much it affects learning performance in practice (you are probably more familiar with that than I am).

err := s.opt.UpdateParameter(p, g)
opt, err := s.optimizerMap[p.Name]
Contributor: I noticed the code here does not compile: err is of bool type, and you cannot compare a bool against nil (next line).

We should not check in code that does not compile. Maybe I can do the Go part for now, and you can get more familiar with Go by reviewing the PR?

Contributor Author: Sorry, I had not written the CMake script the right way. I will check in code more carefully. Thanks for your Go lint editor plugin and your commit to the Go CMake script; now I can run this part. Thanks!

if err != nil {
err := opt.UpdateParameter(p, g)
Contributor (@helinwang, Jun 1, 2017): Do not use err := here; otherwise we create a new err variable that shadows the err outside.
See: http://blog.charmes.net/2015/06/scope-and-shadowing-in-go.html
You can use tools to check for shadowing: https://github.com/alecthomas/gometalinter
The tool has an Emacs package; install example: https://github.com/helinwang/go-emacs/blob/master/init.el#L20

Contributor Author: Sadly I do my work on Baidu's dev machine, and for the last two days "jumbo" (a package management system) has been broken across Baidu, so this did not go through the Go compiler. Thanks for the lint package! I use Spacemacs, so that is great news for me!

}
errCh <- err
}(s.paramMap[g.Name], g)
}
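Following option 2 from the review thread above, a sketch of the gradient loop that holds the mutex only for the map lookup and tolerates concurrent updates to the same optimizer; UpdateParameter(g) taking only the gradient assumes the optimizer already owns its parameter, as proposed in the review:

errCh := make(chan error, len(grads))
for _, g := range grads {
  s.mu.Lock()
  opt, ok := s.optimizerMap[g.Name]
  s.mu.Unlock()
  if !ok {
    errCh <- fmt.Errorf("parameter %s not found", g.Name)
    continue
  }
  go func(o *optimizer, g Gradient) {
    // send the result directly; no shared err variable to shadow
    errCh <- o.UpdateParameter(g)
  }(opt, g)
}

Releasing the mutex before launching the goroutine keeps the critical section to a single map read, so a slow optimizer never serializes the whole Service.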
29 changes: 29 additions & 0 deletions paddle/optimizer/CMakeLists.txt
@@ -0,0 +1,29 @@
include_directories(${CMAKE_CURRENT_BINARY_DIR})

set(OPTIMIZER_SRCS
adadelta_optimizer.cc
adagrad_optimizer.cc
adam_optimizer.cc
optimizer.cc
parameter_optimizer.cc
sgd_optmizer.cc
regularizer.cc
)

set(OPTIMIZER_HEADERS
adadelta_optimizer.h
adagrad_optimizer.h
adam_optimizer.h
lr_policy.h
optimizer.h
parameter_optimizer.h
regularizer.h
sgd_optimizer.h
Tensor.h
)

add_library(optimizer STATIC ${OPTIMIZER_SRCS})
add_dependencies(optimizer gen_proto_cpp)

add_simple_unittest(optimizer_test)
add_simple_unittest(optimizer_factory_test)
30 changes: 30 additions & 0 deletions paddle/optimizer/Tensor.h
@@ -0,0 +1,30 @@
#ifndef PADDLE_OPTIMIZER_TENSOR_H_
#define PADDLE_OPTIMIZER_TENSOR_H_
/**
* @brief tensor used by optimizer
*/

#include "paddle/math/BaseMatrix.h"
#include <string.h>

namespace paddle {
namespace optimizer {

template <class T>
using TensorBase = BaseMatrixT<T>;

template <class T>
class Tensor : public TensorBase<T> {
public:
Tensor(T* data, int size) : TensorBase<T>(size, 1, 0, data, false, false) {}
T* get_buffer() { return this->data_; }
// TODO: replace with tensorshape
size_t width() {
return this->width_;
}
};

} // optimizer
} // paddle

#endif
44 changes: 44 additions & 0 deletions paddle/optimizer/adadelta_optimizer.cc
@@ -0,0 +1,44 @@
#include "adadelta_optimizer.h"
#include <algorithm>
#include <cmath>

namespace paddle {
namespace optimizer {
template<class T>
AdadeltaOptimizer<T>::AdadeltaOptimizer(const ::paddle::OptimizerConfig &config) : ParameterOptimizer<T>(config) {
rho = config.adadelta().rho();
epsilon = config.adadelta().epsilon();
decay = config.adadelta().decay();
}

template<class T>
void AdadeltaOptimizer<T>::set_weight(const Tensor<T> *p) {
size_t size = p->width();
T* gptr = new T[size];
accum_gradient = Tensor<T>(gptr, size);
T* dptr = new T[size];
accum_delta = Tensor<T>(dptr, size);
T* dptr_current = new T[size];
update_delta = Tensor<T>(dptr_current, size);
}

template<class T>
void AdadeltaOptimizer<T>::update(const Tensor<T> &gradient) {
num_sample_passed += 1;
double learning_rate = lr_policy->get_learning_rate(num_sample_passed);
for(size_t i=0; i<parameter_.size(); ++i) {
accum_gradient[i] = rho * accum_gradient[i] + (1.0 - rho) * gradient[i] * gradient[i];

update_delta[i] = std::sqrt(accum_delta[i] + epsilon) / std::sqrt(accum_gradient[i] + epsilon) * gradient[i];

accum_delta[i] = rho * accum_delta[i] + (1.0-rho) * update_delta[i] * update_delta[i];

parameter_[i] -= update_delta[i] + decay*parameter_[i];
}
}


template class AdadeltaOptimizer<float>;
template class AdadeltaOptimizer<double>;

}
}
35 changes: 35 additions & 0 deletions paddle/optimizer/adadelta_optimizer.h
@@ -0,0 +1,35 @@
#ifndef PADDLE_ADADELTA_OPTIMIZER_H_
#define PADDLE_ADADELTA_OPTIMIZER_H_

#include "parameter_optimizer.h"

namespace paddle {
namespace optimizer {

template <class T>
class AdadeltaOptimizer : public ParameterOptimizer<T> {
public:
AdadeltaOptimizer(const OptimizerConfig &config);
~AdadeltaOptimizer(){
if(accum_gradient) delete accum_gradient;
if(accum_delta) delete accum_delta;
if(update_delta) delete update_delta;
}
void update(const Tensor<T> &gradient);
void set_weight(const Tensor<T> *p);
T* get_weight() const;

private:
Tensor<T> *accum_gradient;
Tensor<T> *accum_delta;
Tensor<T> *update_delta;

double rho;
double epsilon;
double decay;
};

}
}

#endif
36 changes: 36 additions & 0 deletions paddle/optimizer/adagrad_optimizer.cc
@@ -0,0 +1,36 @@
#include "adagrad_optimizer.h"

namespace paddle {
namespace optimizer {
template<class T>
AdagradOptimizer<T>::AdagradOptimizer(const ::paddle::OptimizerConfig &config) : ParameterOptimizer<T>(config) {
epsilon = config.adagrad().epsilon();
decay = config.adagrad().decay();
}

template<class T>
void AdagradOptimizer<T>::set_weight(const Tensor<T> *p) {
size_t size = p->width();
T* gptr = new T[size];
// AdagradOptimizer only keeps the accumulated squared gradient (see header)
accum_gradient = Tensor<T>(gptr, size);
}

template<class T>
void AdagradOptimizer<T>::update(const Tensor<T> &gradient) {
num_sample_passed += 1;
double learning_rate = lr_policy->get_learning_rate(num_sample_passed);
for(size_t i=0; i<parameter_.size(); ++i) {
accum_gradient[i] += gradient[i] * gradient[i];
parameter_[i] -= learning_rate * (gradient[i] / std::sqrt(accum_gradient[i] + epsilon) + decay * parameter_[i]);
}
}


template class AdagradOptimizer<float>;
template class AdagradOptimizer<double>;
}
}
30 changes: 30 additions & 0 deletions paddle/optimizer/adagrad_optimizer.h
@@ -0,0 +1,30 @@
#ifndef PADDLE_ADAGRAD_OPTIMIZER_H_
#define PADDLE_ADAGRAD_OPTIMIZER_H_

#include "parameter_optimizer.h"

namespace paddle {
namespace optimizer {


template <class T>
class AdagradOptimizer : public ParameterOptimizer<T> {
public:
AdagradOptimizer(const OptimizerConfig &config);
~AdagradOptimizer(){
if(accum_gradient) delete accum_gradient;
}
void update(const Tensor<T> &gradient);
void set_weight(const Tensor<T> *p);
T* get_weight() const;

private:
Tensor<T> *accum_gradient;
double epsilon;
double decay;
};

}
}

#endif
37 changes: 37 additions & 0 deletions paddle/optimizer/adam_optimizer.cc
@@ -0,0 +1,37 @@
#include "adam_optimizer.h"


namespace paddle {
namespace optimizer {
template<class T>
AdamOptimizer<T>::AdamOptimizer(const ::paddle::OptimizerConfig &config) : ParameterOptimizer<T>(config) {
beta_1 = config.adam().beta_1();
beta_2 = config.adam().beta_2();
epsilon = config.adam().epsilon();
decay = config.adam().decay();
}

template<class T>
void AdamOptimizer<T>::set_weight(const Tensor<T> *p) {
size_t size = p->width();
T* mptr = new T[size];
momentums_ = Tensor<T>(mptr, size);
T* vptr = new T[size];
velocitys_ = Tensor<T>(vptr, size);
}

template<class T>
void AdamOptimizer<T>::update(const Tensor<T> &gradient) {
num_sample_passed += 1;
double learning_rate = lr_policy->get_learning_rate(num_sample_passed);
for(size_t i=0; i<parameter_.size(); ++i) {
// Adam: update first and second moment estimates, then the parameter
momentums_[i] = beta_1 * momentums_[i] + (1.0 - beta_1) * gradient[i];
velocitys_[i] = beta_2 * velocitys_[i] + (1.0 - beta_2) * gradient[i] * gradient[i];
parameter_[i] -= learning_rate * (momentums_[i] / (std::sqrt(velocitys_[i]) + epsilon) + decay * parameter_[i]);
}
}


template class AdamOptimizer<float>;
template class AdamOptimizer<double>;
}
}
30 changes: 30 additions & 0 deletions paddle/optimizer/adam_optimizer.h
@@ -0,0 +1,30 @@
#ifndef PADDLE_ADAM_OPTIMIZER_H_
#define PADDLE_ADAM_OPTIMIZER_H_

#include "parameter_optimizer.h"

namespace paddle {
namespace optimizer {


template <class T>
class AdamOptimizer : public ParameterOptimizer<T> {
public:
AdamOptimizer(const OptimizerConfig &config);
~AdamOptimizer(){}
void update(const Tensor<T> &gradient);
void set_weight(const Tensor<T> *p);
T* get_weight() const;
private:
Tensor<T> *momentums_;
Tensor<T> *velocitys_;
double beta_1;
double beta_2;
double epsilon;
double decay;
};


} // namespace optimizer
} // namespace paddle
#endif
31 changes: 31 additions & 0 deletions paddle/optimizer/lr_policy.h
@@ -0,0 +1,31 @@
#ifndef PADDLE_OPTIMIZER_LR_POLICY_H_
#define PADDLE_OPTIMIZER_LR_POLICY_H_

#include "OptimizerConfig.ph.h"

namespace paddle {
namespace optimizer {

class BaseLr {
public:
BaseLr(const OptimizerConfig &config) {
learning_rate = config.lr_config().learning_rate();
}
virtual ~BaseLr() {}
virtual double get_learning_rate(const uint64_t num_sample_passed) = 0;

protected:
double learning_rate;
};

// constant learning rate policy
class ConstLr final : public BaseLr {
public:
ConstLr(const OptimizerConfig &config) : BaseLr(config) {}
double get_learning_rate(const uint64_t num_sample_passed) {
return learning_rate;
}
};


}
}

#endif