Optimizer library #2190

Closed
wants to merge 12 commits
1 change: 1 addition & 0 deletions paddle/go/CMakeLists.txt
@@ -1,4 +1,5 @@
include_directories(${CMAKE_CURRENT_BINARY_DIR})
add_subdirectory(optimizer)

go_library(adder SRCS adder.go)

6 changes: 4 additions & 2 deletions paddle/go/pserver/optimizer.go
@@ -4,6 +4,7 @@ package pserver
#include "optimizer.h"
*/
import "C"

import (
"fmt"
"unsafe"
@@ -21,9 +22,10 @@ type optimizer struct {
opt *C.struct_paddle_optimizer
}

func newOptimizer(t optimizerType, learning_rate float64) *optimizer {
func newOptimizer() *optimizer {
o := &optimizer{}
o.opt = C.paddle_create_SGD_optimizer(C.double(learning_rate))
OptimizerConfig config
o.opt = C.paddle_create_optimizer((*C.char)config, C.uint(config.size()))
return o
}

20 changes: 13 additions & 7 deletions paddle/go/pserver/service.go
@@ -26,7 +26,8 @@ const (
type Parameter struct {
Name string
ElementType ElementType
Content []byte
Size uint32
// Content []byte
Contributor:

No commented-out code, please.

Contributor (author):

Fixed.

}

// ParameterWithConfig contains the parameter and the configuration.
@@ -42,15 +43,16 @@ type Gradient Parameter
type Service struct {
initialized chan struct{}

mu sync.Mutex
opt *optimizer
paramMap map[string]Parameter
mu sync.Mutex
paramMap map[string]Parameter
Contributor:

Since there is already optimizerMap, and the optimizer owns its parameter, maybe we no longer need paramMap?

Contributor (author):

Agreed. Fixed the Go part.

optimizerMap map[string]*optimizer // maps each parameter to its optimizer
}

// NewService creates a new service.
func NewService() *Service {
s := &Service{}
s.paramMap = make(map[string]Parameter)
s.optimizerMap = make(map[string]*optimizer)
s.initialized = make(chan struct{})
return s
}
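
Following the suggestion above, a minimal sketch of what Service could look like once paramMap is dropped and each optimizer owns its parameter. This is only a sketch of the intended shape (it assumes the surrounding pserver package with sync imported), not the final code:

type Service struct {
  initialized chan struct{}

  mu           sync.Mutex
  optimizerMap map[string]*optimizer // parameter name -> optimizer
}

// NewService creates a new service.
func NewService() *Service {
  s := &Service{}
  s.optimizerMap = make(map[string]*optimizer)
  s.initialized = make(chan struct{})
  return s
}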
@@ -71,8 +73,9 @@ func (s *Service) BeginInitParams(config []byte, dummy *int) error {
s.opt.Cleanup()
}

// TODO(helin): parse learning rate from config
s.opt = newOptimizer(sgd, 0.01)
// TODO(helin): parse learning rate from config
s.opt = newOptimizer(config OptimizerConfig)
Contributor:

I have not seen the definition of OptimizerConfig on the Go side; does this code compile? I think we need to check in code that compiles.
I can do the hook-up in the Go code for you if you want.

Contributor (author):

Fixed it; changed it to a string.

return nil
}
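
For reference, a hedged sketch of how newOptimizer could take the serialized OptimizerConfig protobuf as an opaque byte slice and hand it to the C API declared in optimizer.h. The []byte signature is an assumption about the eventual fix; the C-side signature matches paddle_create_optimizer in this PR, and unsafe is already imported in optimizer.go:

// Sketch only: assumes it sits in pserver/optimizer.go next to the existing
// cgo preamble that includes optimizer.h.
func newOptimizer(config []byte) *optimizer {
  o := &optimizer{}
  // The config is an opaque serialized protobuf; the C++ side parses it.
  p := (*C.uchar)(unsafe.Pointer(&config[0])) // assumes a non-empty config
  o.opt = C.paddle_create_optimizer(p, C.int(len(config)))
  return o
}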

@@ -135,7 +138,10 @@ func (s *Service) SendGrads(grads []Gradient, dummy *int) error {
errCh := make(chan error, count)
for _, g := range grads {
go func(p Parameter, g Gradient) {
helinwang (Contributor), Jun 1, 2017:

This change introduces concurrent reads of optimizerMap, and maps in Go are not thread safe. We can either use a mutex to protect it inside the function:

go func(p Parameter, g Gradient) {
  s.mu.Lock()
  defer s.mu.Unlock()
  opt, ok := s.optimizerMap[p.Name]
  if ok {
    errCh <- opt.UpdateParameter(p, g)
  }
}(s.paramMap[g.Name], g)

The above function holds the mutex until the optimization step finishes, which is safe, since a concurrent update to the same optimizer would be a race condition. But performance will suffer, since there is only a single mutex per Service.
There are two ways to fix it:

  1. Introduce one mutex per parameter.
  2. Do not protect against concurrent updates to the same optimizer.
    Due to the stochastic nature of SGD, we can tolerate this race condition. I think 2 is better, because the code is simpler and clearer (fewer bugs):
go func(o *optimizer, g Gradient) {
  err := o.UpdateParameter(g)
  errCh <- err
}(s.optimizerMap[g.Name], g) // we are still protected by the mutex when evaluating s.optimizerMap[g.Name]

Contributor (author):

Thanks for the kind reminder! I agree. I had thought we didn't need to protect against concurrent updates.
I remember there are some theoretical bounds on that topic,
such as Eric Xing's bounded-staleness SGD and the async SGD work published by Google; I'm not quite sure how much it affects learning performance (maybe you are more familiar with that than I am).
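
Expanding on option 2 from the thread above, a minimal sketch of how the SendGrads loop could look: the map lookup stays under the Service mutex, only the update runs in the goroutine, and the one-argument UpdateParameter assumes the optimizer owns its parameter (a sketch, not the final code):

// inside SendGrads, with s.mu held around the loop
for _, g := range grads {
  opt, ok := s.optimizerMap[g.Name] // map read protected by s.mu
  if !ok {
    errCh <- fmt.Errorf("parameter %s does not exist", g.Name)
    continue
  }
  go func(o *optimizer, g Gradient) {
    // a racy update to the same optimizer is tolerated, per the discussion above
    errCh <- o.UpdateParameter(g)
  }(opt, g)
}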

err := s.opt.UpdateParameter(p, g)
opt, err := s.optimizerMap[p.Name]
Contributor:

I noticed the code here does not compile: the second value returned by the map lookup is a bool, and you cannot compare a bool against nil (next line).

We should not check in code that does not compile. Maybe I can do the Go part for now, and you can get more familiar with Go by reviewing the PR?

Contributor (author):

Sorry, I hadn't written the CMake script the right way. I will check in code more carefully. Thanks for your Go lint editor plugin and your commit to the Go CMake script; now I can run this part. Thanks!
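
A tiny, self-contained illustration of the point above: the second value returned by a map lookup is a bool ("comma ok"), not an error, so it is checked directly rather than compared against nil (the names here are made up for the example):

package main

import "fmt"

func main() {
  optimizers := map[string]string{"w0": "sgd"}

  opt, ok := optimizers["w1"] // ok is a bool, not an error
  if !ok {
    fmt.Println("no optimizer registered for w1")
    return
  }
  fmt.Println("found:", opt)
}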

if err != nil {
err := opt.UpdateParameter(p, g)
helinwang (Contributor), Jun 1, 2017:

Do not use err := here; otherwise we create a new err variable that shadows the err outside.
See: http://blog.charmes.net/2015/06/scope-and-shadowing-in-go.html
You can use tools to check for shadowing: https://github.com/alecthomas/gometalinter
The tool has an Emacs package; install example: https://github.com/helinwang/go-emacs/blob/master/init.el#L20

Contributor (author):

Sadly, I do my work on Baidu's dev machines, and for the last two days "jumbo" (a package management system) has been broken across Baidu, so the code did not go through the Go compile step.
Thanks for the lint package! I use Spacemacs, so that's great news for me!
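
A small runnable example of the shadowing pitfall described above: := inside a block declares a new err that vanishes at the closing brace, while plain = assigns to the outer variable (doWork is a made-up helper):

package main

import (
  "errors"
  "fmt"
)

func doWork() error { return errors.New("boom") }

func main() {
  var err error
  if true {
    err := doWork() // := declares a NEW err, shadowing the outer one
    _ = err         // the outer err is still nil after this block
  }
  fmt.Println("after := :", err) // <nil>

  if true {
    err = doWork() // = assigns to the outer err
  }
  fmt.Println("after  = :", err) // boom
}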

}
errCh <- err
}(s.paramMap[g.Name], g)
}
22 changes: 22 additions & 0 deletions paddle/optimizer/CMakeLists.txt
@@ -0,0 +1,22 @@
include_directories(${CMAKE_CURRENT_BINARY_DIR})

set(OPTIMIZER_SRCS
optimizer_factory.cc
parameter_optimizer.cc
regularizer.cc
)

set(OPTIMIZER_HEADERS
optimizer.h
Tensor.h
optimizer_factory.h
parameter_optimizer.h
regularizer.h
)

add_library(optimizer STATIC ${OPTIMIZER_SRCS})
add_dependencies(optimizer gen_proto_cpp)

add_simple_unittest(optimizer_test)
add_simple_unittest(optimizer_factory_test)

25 changes: 25 additions & 0 deletions paddle/optimizer/Tensor.h
@@ -0,0 +1,25 @@
#ifndef PADDLE_FAKE_TENSOR_H_
#define PADDLE_FAKE_TENSOR_H_
/**
* @brief fake tensor for testing
Contributor:

Maybe "Tensor used by optimizer" is more appropriate. Since this tensor is actually working, not something fake and does not work. If you agree, please change header guard (PADDLE_FAKE_TENSOR_H_) as well.

Contributor (author):

Fixed.

*/

#include "paddle/math/BaseMatrix.h"
#include <string.h>

namespace paddle {
template <class T>
using TensorBase = BaseMatrixT<T>;

template <class T>
class Tensor : public TensorBase<T> {
public:
Tensor(T* data, int size) : TensorBase<T>(size, 1, 0, data, false, false) {}
T* get_buffer() { return this->data_; }
// TODO: replace with tensorshape
size_t height() {
Contributor:

I thought we agreed to change height into width?

Contributor (author):

Sorry, I left some commits out of this PR.
Fixed.

return this->height_;
}
};

#endif
26 changes: 26 additions & 0 deletions paddle/optimizer/adadelta_optimizer.h
@@ -0,0 +1,26 @@
#ifndef PADDLE_ADADELTA_OPTIMIZER_H_
#define PADDLE_ADADELTA_OPTIMIZER_H_

#include "parameter_optimizer.h"

namespace paddle {
namespace optimizer {

template <class T>
class AdadeltaOptimizer : public ParameterOptimizer<T> {
public:
/*! \brief call the applySGD for example */
void update(const Tensor<T> &gradient) {
Contributor:

Since this is not a working optimizer, maybe let's remove it for now to avoid confusion? (see: #2190 (comment))

}
private:
double learning_rate;
double rho;
double epsilon;
double decay;
};


}
}

#endif
4 changes: 4 additions & 0 deletions paddle/optimizer/adagrad_optimizer.cc
@@ -0,0 +1,4 @@
#include "sgd_optimizer.h"

namespace paddle {
Contributor:

This is not a working source file, probably checked in by mistake. Maybe it's a good idea to review the PR first so that these mistakes can be avoided.

namespace optimizer {
24 changes: 24 additions & 0 deletions paddle/optimizer/adagrad_optimizer.h
@@ -0,0 +1,24 @@
#ifndef PADDLE_ADAGRAD_OPTIMIZER_H_
#define PADDLE_ADAGRAD_OPTIMIZER_H_

#include "parameter_optimizer.h"

namespace paddle {
namespace optimizer {


template <class T>
class AdagradOptimizer : public ParameterOptimizer<T> {
public:
void update(const Tensor<T> &gradient) {
}
private:
double learning_rate;
double epsilon;
double decay;
};

}
}

#endif
26 changes: 26 additions & 0 deletions paddle/optimizer/adam_optimizer.h
@@ -0,0 +1,26 @@
#ifndef PADDLE_ADAM_OPTIMIZER_H_
#define PADDLE_ADAM_OPTIMIZER_H_

#include "parameter_optimizer.h"

namespace paddle {
namespace optimizer {


template <class T>
class AdamOptimizer : public ParameterOptimizer<T> {
public:
/*! \brief call the applySGD for example */
void update(const Tensor<T> &gradient) {
}
private:
double learning_rate ;
double beta_1;
double beta_2;
double epsilon;
};


} // namespace optimizer
} // namespace paddle
#endif
75 changes: 75 additions & 0 deletions paddle/optimizer/optimizer.cc
@@ -0,0 +1,75 @@
#include "optimizer.h"
#include <string>

#include "parameter_optimizer.h"

template<class T>
struct EnumToType {};

template<class T>
struct TypeToEnum {};

#define MATCH_ENUM_TYPE(TYPE, ENUM) \
template<> \
struct TypeToEnum<ENUM> { \
static paddle_element_type v() {return ENUM;}; \
static constexpr TYPE value = ENUM;
};
template<> \
struct EnumToType<ENUM> { \
typedef TYPE Type; \
} \

MATCH_ENUM_TYPE(int32_t, PADDLE_ELEMENT_TYPE_INT32);
MATCH_ENUM_TYPE(uint32_t, PADDLE_ELEMENT_TYPE_UINT32);
MATCH_ENUM_TYPE(int64_t, PADDLE_ELEMENT_TYPE_INT64);
MATCH_ENUM_TYPE(uint64_t, PADDLE_ELEMENT_TYPE_UINT64);
MATCH_ENUM_TYPE(float, PADDLE_ELEMENT_TYPE_FLOAT32);
MATCH_ENUM_TYPE(double, PADDLE_ELEMENT_TYPE_FLOAT64);

struct paddle_optimizer {
/*! \brief optimizer on the C++ side */

paddle::optimizer::ParameterOptimizer* impl;
};

paddle_optimizer* paddle_create_optimizer(const unsigned char* config_proto,
int config_proto_len) {
paddle_optimizer* optimizer;
std::string config(config_proto, config_proto + config_proto_len);
optimizer->impl->create(config_proto);
return optimizer;
}

int paddle_release_optimizer(paddle_optimizer* o) {
if (o != nullptr)
delete o->impl;
Contributor:

We are following Google C++ coding style.

Short conditional statements may be written on one line if this enhances readability. You may use this only when the line is brief and the statement does not use the else clause.
if (x == kFoo) return new Foo();
if (x == kBar) return new Bar();

See: https://google.github.io/styleguide/cppguide.html#Conditionals

Contributor (author):

Fixed the style. Thanks a lot for such a careful review! I will check in my code more carefully and fix style issues, and double-check things like the Go compile problem. Thanks!

return PADDLE_SUCCESS;
}

int paddle_update_parameter(paddle_optimizer* o,
paddle_element_type data_type,
const void* grad_buffer,
int num_bytes) {
auto type = EnumToType<data_type>::Type;
paddle::Tensor<type> gradient(reinterpret_cast<type*>(grad_buffer),
num_bytes);
o->impl->update(gradient);
return PADDLE_SUCCESS;
}

int paddle_optimizer_set_weights(paddle_optimizer* o,
paddle_element_type data_type,
void*param_buffer,
int num_bytes) {
auto type = EnumToType<data_type>::Type;
paddle::Tensor<type>* param =
new paddle::Tensor<type>(reinterpret_cast<type*>(param_buffer), num_bytes);
o->impl->set_weight(param);
return PADDLE_SUCCESS;
}

void* paddle_optimizer_get_weights(paddle_optimizer* o) {
void* buffer = (void *)o->impl->get_weight();
return buffer;
}
93 changes: 93 additions & 0 deletions paddle/optimizer/optimizer.h
@@ -0,0 +1,93 @@
#ifndef PADDLE_LIB_OPTIMIZER_H_
#define PADDLE_LIB_OPTIMIZER_H_
#include <stdbool.h>
#include <stdint.h>

/*! \brief optimizer exported C API, which will be used in:
Case A, on the Trainer (ParameterServer client) side, to optimize gradients;

Case B, on the ParameterServer side, to optimize gradients.

To simplify configuration parsing, the optimizer does *not* parse any config;
e.g. the learning rate should be calculated by the caller.
*/

#ifdef __cplusplus
extern "C" {
#endif
/*! \brief datatypes */
typedef enum {
PADDLE_ELEMENT_TYPE_INT32 = 0,
PADDLE_ELEMENT_TYPE_UINT32 = 1,
PADDLE_ELEMENT_TYPE_INT64 = 2,
PADDLE_ELEMENT_TYPE_UINT64 = 3,
PADDLE_ELEMENT_TYPE_FLOAT32 = 4,
PADDLE_ELEMENT_TYPE_FLOAT64 = 5,
} paddle_element_type;

/*! \brief execute status code */
const int32_t PADDLE_SUCCESS = 0;
const int32_t PADDLE_ERROR = -1;

typedef struct paddle_optimizer paddle_optimizer;
/**
* this group of interfaces is called in this order:
* 1. create optimizer with config
* 2. set weights
* 3. update_parameter
* 4. get_weights
* 5. release optimizer
*/


/**
* @brief create an optimizer from a serialized proto config
* @param config_proto, optimizer protobuf; see OptimizerConfig.proto for details
* @return the optimizer instance
*/
paddle_optimizer* paddle_create_optimizer(const unsigned char* config_proto,
int config_proto_len);

/**
* @brief release the optimizer
* @param o the optimizer instance
* @return exec status
*/
int paddle_release_optimizer(paddle_optimizer* o);

/**
* @brief apply a gradient to update the parameter held by the optimizer
* @param data_type datatype of the gradient and parameter
* @param gradient, calculated by the caller of the optimizer.
*        TODO(zhihong): just pass the loss to reduce communication overhead;
*        see the Project Adam (Microsoft '14) paper for details
* @param num_bytes, gradient size in bytes
* @return exec status
*/
int paddle_update_parameter(paddle_optimizer* o,
paddle_element_type data_type,
const void* gradient,
int num_bytes);

/**
* @brief set the initial parameter (weights) for the optimizer
* @param data_type datatype of the parameter
* @param param_buffer, initialized parameter buffer
* @param num_bytes, parameter size in bytes
* @return exec status
*/
int paddle_optimizer_set_weights(paddle_optimizer* o,
paddle_element_type data_type,
void* param_buffer,
int num_bytes);

/**
* @brief get the parameter buffer from the optimizer
* @return the content of the parameter buffer held by the optimizer
*/
void* paddle_optimizer_get_weights(paddle_optimizer* o);

#ifdef __cplusplus
}
#endif
#endif
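
To make the call order documented above concrete, here is a hedged cgo sketch that exercises the five steps in sequence. The #cgo linker flag, the placeholder config bytes, and the float32 sizing are assumptions; the function and enum names come from this header:

package main

/*
#cgo LDFLAGS: -loptimizer
#include "optimizer.h"
*/
import "C"

import "unsafe"

func main() {
  // 1. create optimizer with config (a real serialized OptimizerConfig proto is required)
  config := []byte("placeholder: serialized OptimizerConfig proto")
  opt := C.paddle_create_optimizer((*C.uchar)(unsafe.Pointer(&config[0])), C.int(len(config)))

  // 2. set weights
  params := []float32{0.1, 0.2, 0.3}
  nbytes := C.int(len(params) * 4)
  C.paddle_optimizer_set_weights(opt, C.PADDLE_ELEMENT_TYPE_FLOAT32, unsafe.Pointer(&params[0]), nbytes)

  // 3. update_parameter with a gradient of the same shape
  grads := []float32{0.01, 0.02, 0.03}
  C.paddle_update_parameter(opt, C.PADDLE_ELEMENT_TYPE_FLOAT32, unsafe.Pointer(&grads[0]), nbytes)

  // 4. get_weights returns the raw parameter buffer
  _ = C.paddle_optimizer_get_weights(opt)

  // 5. release optimizer
  C.paddle_release_optimizer(opt)
}

A production version would copy the buffers into C-allocated memory (for example with C.CBytes) rather than passing Go slices directly, since the C++ side keeps references to them after the call returns.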