Optimizer library #2190

Closed · wants to merge 12 commits

Conversation

@dzhwinter (Contributor) commented May 18, 2017

An optimizer library for the parameter server side; it will be extended to cover both the trainer side and the parameter server side.

@dzhwinter force-pushed the optimizer_lib branch 2 times, most recently from 3515b2c to 0a03b9d on May 18, 2017 at 21:30
@@ -0,0 +1,15 @@
# gangliao's built in cmake function
Contributor:

I think this line does not provide much information; shall we remove it?

Contributor Author:

Changed it to a TODO, because it is not a CMake built-in function; it is based on gangliao's PR.

@@ -0,0 +1,15 @@
# gangliao's built in cmake function
Contributor:

This library is under paddle/lib. I think lib needs to be more expressive; if it's an optimizer, we can use optimizer as the folder name.

Currently, we are using this library only for the parameter server, so maybe we can put it under a place like paddle/pserver/optimizer.

Contributor Author:

Good point. 👍 Will we unify the optimizers on the parameter server side and the trainer side? If so, paddle/optimizer sounds better.

@@ -0,0 +1,13 @@
#ifndef PADDLE_FAKE_TENSOR_H_
Contributor:

We are following the Google C++ style guide, so please follow https://google.github.io/styleguide/cppguide.html#The__define_Guard for the header guard.
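For reference, a guard following that rule looks like the sketch below; the exact macro name depends on the file's path, so the one used here is illustrative only:

// Google style: <PROJECT>_<PATH>_<FILE>_H_
#ifndef PADDLE_OPTIMIZER_TENSOR_H_
#define PADDLE_OPTIMIZER_TENSOR_H_

// ... declarations ...

#endif  // PADDLE_OPTIMIZER_TENSOR_H_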

Contributor Author:

Fixed.

#include "optimizer.h"
#include "optimizer_private.h"

int32_t paddle_create_SGDOptimizer(paddle_optimizer* optimizer, double learning_rate) {
Contributor:

For snake case, should this function be named paddle_create_SGD_optimizer?

Contributor Author:

I treated SGDOptimizer as the full name of the optimizer class. Do you mean we should name it like this?

class SGD_optimizer


/*! \brief execute status code */
const int32_t LIB_SUCCESS = 0;
const int32_t LIB_WARNING = 1;
Contributor:

What should the client do when the return code is LIB_WARNING? To the best of my knowledge, libraries typically return ok or error (https://github.com/yahoo/ygloo-ymagine/blob/f52dfd67ac2944f7853427527571da50ee1463dc/jni/include/ymagine/ymagine.h#L40); I have not seen a library return a warning.

I think maybe PADDLE_SUCCESS / PADDLE_OK is better than LIB_SUCCESS? "lib" does not carry much information.

Contributor Author:

Agreed. These constants should be put in one central place, such as a GlobalConstant header; otherwise they will pop up here and there.
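A minimal sketch of such a central header, assuming the PADDLE_ prefix suggested above; the file name and values here are illustrative, not the PR's actual code:

// paddle_status.h (hypothetical): one home for the C-API status codes.
#ifndef PADDLE_OPTIMIZER_PADDLE_STATUS_H_
#define PADDLE_OPTIMIZER_PADDLE_STATUS_H_

#include <stdint.h>

const int32_t PADDLE_SUCCESS = 0;  /* operation completed */
const int32_t PADDLE_ERROR = -1;   /* operation failed */

#endif  // PADDLE_OPTIMIZER_PADDLE_STATUS_H_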

};

void applyGradientDescent_TEST() {

Contributor:

Maybe only check in finished tests :)

Contributor Author:

same as above

}
};

class MomentumOptimizer : public ParameterOptimizer {
Contributor:

I think MomentumOptimizer should not be in the file sgd_optimizer.h; it adds confusion.

Contributor Author:

Moved it to a new file.


public:
/*! \brief call the applyXX for example */
void update(Tensor<T> &parameter,
@helinwang (Contributor), May 18, 2017:

This function has a different signature than the parent's pure virtual update; maybe we need to change the parent update function's signature?

Contributor Author:

I don't think we should change the update function signature. Instead, fill the momentum vector from the OptimizerConfig in create_optimizer, and just call update(parameter, gradient, learning_rate) as the real update function.

We cannot put all the arguments into one signature, because the optimizers really take different arguments. For example, the RMSProp algorithm takes more arguments, which would make the signature more confusing. For RMSProp, see http://caffe.berkeleyvision.org/tutorial/solver.html
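A self-contained sketch of that idea, with simplified stand-in types rather than the PR's actual classes: the optimizer-specific hyperparameters are read from the config at construction time, so every subclass can keep the same uniform update signature:

#include <vector>

// Stand-ins for the real Tensor and OptimizerConfig, for illustration only.
using Tensor = std::vector<double>;
struct OptimizerConfig { double momentum = 0.9; };

class ParameterOptimizer {
public:
  virtual ~ParameterOptimizer() {}
  // One uniform signature shared by every optimizer.
  virtual void update(Tensor &parameter, const Tensor &gradient,
                      double learning_rate) = 0;
};

class MomentumOptimizer : public ParameterOptimizer {
public:
  MomentumOptimizer(const OptimizerConfig &config, size_t size)
      : mu_(config.momentum), velocity_(size, 0.0) {}
  void update(Tensor &parameter, const Tensor &gradient,
              double learning_rate) override {
    // Extra state (the velocity) lives inside the optimizer,
    // not in the update signature.
    for (size_t i = 0; i < parameter.size(); ++i) {
      velocity_[i] = mu_ * velocity_[i] - learning_rate * gradient[i];
      parameter[i] += velocity_[i];
    }
  }
private:
  double mu_;
  Tensor velocity_;
};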


#include <string>
#include <functional>
// #include <math/Tensor.h>
Contributor:

No commented code please :)

Contributor Author:

Fixed.

typedef std::function<void(Tensor&, const Tensor&)> UpdateHook;

static ParameterOptimizer *create();
virtual update(Tensor &parameter, const Tensor &gradient, double learning_rate) = 0;
@helinwang (Contributor), May 18, 2017:

The C API uses a void* buffer to pass in the parameter and gradient; can a Tensor be created from a raw buffer without a copy (not so familiar with C++ myself)?

Contributor Author:

Checked in more code for the FakeTensor implementation.
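To the raw-buffer question above: yes, a Tensor can be a non-owning view over the C API's buffer, with no copy. A minimal sketch, not the actual FakeTensor code:

#include <cstddef>

// A non-owning view over a raw buffer: no allocation, no copy.
// The caller keeps ownership and must keep the buffer alive while in use.
template <class T>
class TensorView {
public:
  TensorView(T *data, size_t size) : data_(data), size_(size) {}
  T &operator[](size_t i) { return data_[i]; }
  const T &operator[](size_t i) const { return data_[i]; }
  size_t size() const { return size_; }
private:
  T *data_;
  size_t size_;
};

// At the C boundary, wrap the incoming void* without copying:
// TensorView<float> param(static_cast<float*>(buffer), num_elements);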

}

int32_t paddle_release_optimizer(paddle_optimizer* optimizer) {
if(optimizer == nullptr)
Contributor:

Maybe clearer, as follows:

if (optimizer != nullptr)
    optimizer->impl->destroy();
return LIB_SUCCESS;

Contributor Author:

👍 got it!

@@ -0,0 +1,29 @@
include_directories(${CMAKE_CURRENT_BINARY_DIR})
Contributor:

repo/go is mostly for Go projects and their bindings. Maybe the optimizer is better located under repo/paddle/pserver?

Contributor Author:

That's a good point. In its final form, will the optimizer be used only on the Go pserver side, or on both the trainer library side and the Go pserver? If the latter, should we put it into /repo/paddle/optimizer?

Contributor:

That's a good point. In the final form we should share the optimizer when running locally and on the pserver. Let's move it to /repo/paddle/optimizer.

# TODO:remove link options
include_directories("/home/work/dongzhihong/github/Paddle/Paddle/third_party/install/glog/include")
link_directories("/home/work/dongzhihong/github/Paddle/Paddle/third_party/install/glog/lib")
# add_executable(optimizer_test optimizer_test.cpp)
Contributor:

Please no commented out code.

Contributor Author:

Fixed.

add_dependencies(optimizer gen_proto_cpp)

# TODO:remove link options
include_directories("/home/work/dongzhihong/github/Paddle/Paddle/third_party/install/glog/include")
Contributor:

Can you ask @gangliao on how to include glog properly?

Contributor Author:

done.


template<class T>
void SGDOptimizer<T>::set_weight(const Tensor<T> *p) {
// ParameterOptimizer::set_weight(p);
Contributor:

Please no commented out code.

Contributor Author:

Fixed. Thanks for your comment! I will double-check before sending a PR.

#include <string.h>

namespace paddle {
template <class T>
Contributor:

Since this tensor is only used by the optimizer for now, maybe add a namespace here, e.g., paddle::optimizer::Tensor.

Contributor Author:

👍 Fixed! I assumed we would port it to the majel tensor sooner or later; in fact, there may be a conflict between modules. Thanks for mentioning it.

};

template <class T>
class AdagradOptimizer : public ParameterOptimizer<T> {
Contributor:

I can't find implementations for AdagradOptimizer, Adam, or Adadelta. Maybe just remove these placeholders? It's confusing when something is only partially in the code base but not implemented.

Contributor Author:

Well, as an optimizer library it implements the four most commonly used optimization methods, which can cover 90% of applications...

template<class T>
double ParameterOptimzier<T>::get_learning_rate() {
if (config_.lr_type() == paddle::OptimizerConfig_LearningRateType_Linear) {
learning_rate = ::std::max(learning_rate - lr_decay_a * num_sample_passed, lr_decay_b);
Contributor:

I think the plain simple SGD optimizer has only a constant learning rate and does not contain a learning rate schedule. This learning rate schedule function should not be in the base class.

I think in v1 we don't need a learning rate schedule. And in the future, we may need to use composition instead of inheritance for the learning rate schedule. Reference: https://stackoverflow.com/a/49016/852385

Think of containment as a has a relationship. A car "has an" engine, a person "has a" name, etc.
Think of inheritance as an is a relationship. A car "is a" vehicle, a person "is a" mammal, etc.

Contributor Author:

This has been discussed; replaced with a constant learning rate value.
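For the record, a sketch of what the composition ("has-a") version could look like later, reusing the linear-decay formula from the diff above; the class names here are illustrative:

#include <algorithm>

// The optimizer would *have* a LearningRatePolicy instead of
// inheriting schedule logic from the base class.
class LearningRatePolicy {
public:
  virtual ~LearningRatePolicy() {}
  virtual double learning_rate(size_t num_sample_passed) const = 0;
};

class ConstLR : public LearningRatePolicy {
public:
  explicit ConstLR(double lr) : lr_(lr) {}
  double learning_rate(size_t) const override { return lr_; }
private:
  double lr_;
};

class LinearDecayLR : public LearningRatePolicy {
public:
  LinearDecayLR(double lr, double decay_a, double decay_b)
      : lr_(lr), decay_a_(decay_a), decay_b_(decay_b) {}
  double learning_rate(size_t num_sample_passed) const override {
    // Same rule as the diff: max(lr - decay_a * num_sample_passed, decay_b)
    return std::max(lr_ - decay_a_ * num_sample_passed, decay_b_);
  }
private:
  double lr_, decay_a_, decay_b_;
};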

}

template <class T>
void ParameterOptimizer<T>::set_weight(const Tensor<T> *p) {
Contributor:

Do we need set_weight other than during optimizer initialization? If not, can we put it into the constructor?

Contributor Author:

We reached an agreement after the meeting.

}

template<class T>
void SGDOptimizer<T>::destroy() {
Contributor:

I am not very familiar with C++, just curious: why do we need this destroy function when we already have a destructor? Is this a design pattern? Can you give me a link?

Contributor:

@dzhwinter Can you take a look at this?

Contributor Author:

Thanks for pointing that out! You are right, we do not need an explicit destroy; just calling the destructor via delete xxx is more elegant. I made a mistake here: I thought the C interface needed an explicit destroy call to pass GCC compilation, but in fact I had written the CMake script the wrong way. Fixed. Thanks for the reminder!

num_sample_passed += 1;
learning_rate = get_learning_rate();
for(size_t i=0; i<parameter_.size(); ++i) {
momentums_[i] = momentum * momentums_[i] - learning_rate*gradient[i] - decay*parameter_[i];
Contributor:

I thought SGD does not use momentum?

Contributor Author:

The momentum and nesterov options can be used as switch values. Other frameworks implement these three optimizers together, and those frameworks are popular with users, so I just followed their approach. E.g.:
Caffe: https://github.com/BVLC/caffe/blob/master/src/caffe/solvers/sgd_solver.cpp#L213
Keras (now the official frontend for TensorFlow and Theano)
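For reference, the combined rule under discussion is sketched below: with momentum set to 0 it reduces to plain SGD, and decay acts as the weight-decay term. This mirrors the loop in the diff above plus the natural parameter step, which the diff excerpt does not show; a Nesterov variant would additionally use the look-ahead velocity:

// Sketch of the combined update: momentum == 0.0 gives plain SGD.
for (size_t i = 0; i < parameter_.size(); ++i) {
  momentums_[i] = momentum * momentums_[i]
                  - learning_rate * gradient[i]
                  - decay * parameter_[i];
  parameter_[i] += momentums_[i];
}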

Contributor Author:

s.opt = newOptimizer(sgd, 0.01)
// TODO(helin): parse learning rate from config
s.opt = newOptimizer(config OptimizerConfig)
Contributor:

I have not seen the definition of OptimizerConfig on the Go side; does this code compile? I think we need to check in code that compiles.
I can do the hookup in the Go code for you if you want.

Contributor Author:

Fixed; changed it to a string.

@@ -26,7 +26,8 @@ const (
type Parameter struct {
Name string
ElementType ElementType
Content []byte
Size uint32
// Content []byte
Contributor:

No commented out code please.

Contributor Author:

Fixed.

opt *optimizer
paramMap map[string]Parameter
mu sync.Mutex
paramMap map[string]Parameter
Contributor:

Since there is already optimizerMap, and the optimizer owns the parameter, maybe we no longer need paramMap?

Contributor Author:

Agreed. Fixed the Go part.

@@ -135,7 +138,10 @@ func (s *Service) SendGrads(grads []Gradient, dummy *int) error {
errCh := make(chan error, count)
for _, g := range grads {
go func(p Parameter, g Gradient) {
@helinwang (Contributor), Jun 1, 2017:

This change introduces concurrent read to optimizerMap, map in Go is not thread safe. We can either use a mutex to protect it inside the function:

go func(p Parameter, g Gradient) {
  s.mu.Lock()
  defer s.mu.Unlock()
  opt, ok := s.optimizerMap[p.Name]
  if ok {
    errCh <- opt.UpdateParameter(p, g)
  }
}(p, g)

The above function holds the mutex until the optimization finishes, which is safe, since a concurrent update to an optimizer is a race condition. But performance will suffer, since there is only a single mutex per Service.
There are two ways to fix it:

  1. Introduce one mutex per parameter.
  2. Do not protect against concurrent updates to the same optimizer.

Due to the stochastic nature of SGD, we can tolerate this race condition. I think 2 is better, because the code is simpler and clearer (fewer bugs):
go func(o *Optimizer, g Gradient) {
  err := o.UpdateParameter(g)
  errCh <- err
}(s.optimizerMap[g.Name], g) // we are still protected by mutex when invoking s.optimizerMap[g.Name]

Contributor Author:

Thanks for the kind reminder! Agreed; I also think we don't need to protect against concurrent updates.
I remember there are some theoretical bounds on that topic, such as Eric Xing's latency-bounded SGD and some async SGD work published by Google. I'm not quite sure how it affects learning performance (maybe you are more familiar with that than I am).

@@ -135,7 +138,10 @@ func (s *Service) SendGrads(grads []Gradient, dummy *int) error {
errCh := make(chan error, count)
for _, g := range grads {
go func(p Parameter, g Gradient) {
err := s.opt.UpdateParameter(p, g)
opt, err := s.optimizerMap[p.Name]
Contributor:

I noticed the code here does not compile: err is a bool here, and you cannot compare a bool against nil (next line).

We should not check in code that does not compile. Maybe I can do the Go part for now, and you can get more familiar with Go by reviewing the PR?

Contributor Author:

Sorry, I hadn't written the CMake script the right way. I will check in code more carefully. Thanks for your Go lint editor plugin and your commit to the Go CMake script; now I can run this part. Thanks!


int paddle_release_optimizer(paddle_optimizer* o) {
if (o != nullptr)
delete o->impl;
Contributor:

We are following Google C++ coding style.

Short conditional statements may be written on one line if this enhances readability. You may use this only when the line is brief and the statement does not use the else clause.
if (x == kFoo) return new Foo();
if (x == kBar) return new Bar();

See: https://google.github.io/styleguide/cppguide.html#Conditionals

Contributor Author:

Fixed the style. Thanks a lot for such a careful review! I will check in my code more carefully and fix issues like the style and the Go compile problems by double-checking. Thanks!

virtual T *get_weight() const;
virtual void set_weight(const Tensor<T> *parameter);
// package optimizer config proto in runtime for saving checkpoint
virtual char* get_config_proto();
Contributor:

I thought the optimizer should not know about the config proto. Instead, the factory method should do all the parsing and call the constructor of the required optimizer.

Contributor Author:

Same as the last comment: I agree that the optimizer had better leave the parsing job to the create interface. But we need something to pack the training state back.

CHECK(config_valid(config) == 0) << "error : invalid optimizer config ";
ParameterOptimizer<T> *opt = nullptr;
switch (config.optimizer_name()) {
case "SGD" : opt = new SGDOptimizer<T>(config); break;
Contributor:

I think this factory method should just parse the whole config proto and call the constructor of the required optimizer. The optimizer should not know anything about protobuf.

Contributor Author:

Good suggestion; I agree that parsing protobuf inside the optimizer is bad coding style. However, I found that the optimizer needs a way to pack the training state and return it to the caller when we need a checkpoint. I thought the configuration proto needed to be sent back to the Go server side; that's why I put the config proto into the optimizer.

Contributor Author:

But creating with the required parameters seems clearer. Fixed.
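A sketch of that create interface, where only the factory touches the proto and the constructors receive plain values. The config accessors used here are hypothetical; note also that a C++ switch cannot dispatch on strings, so an if/else chain replaces the switch from the diff above:

template <class T>
ParameterOptimizer<T> *ParameterOptimizer<T>::create(
    const OptimizerConfig &config) {
  // Only the factory reads the proto; optimizers get plain arguments.
  const std::string &name = config.optimizer_name();
  if (name == "SGD")
    return new SGDOptimizer<T>(config.learning_rate(), config.momentum());
  if (name == "Adagrad")
    return new AdagradOptimizer<T>(config.learning_rate(), config.epsilon());
  return nullptr;  // unknown optimizer name
}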

}

template<class T>
double ParameterOptimzier<T>::get_learning_rate() {
Contributor:

As we discussed during yesterday's meeting, we plan to use a constant learning rate for now, and in the future we may use composition rather than inheritance. So we probably don't need this get_learning_rate function in the base class.

Contributor Author:

Fixed.

template<class T>
class L2Regularizer {
public:
void update(Tensor<T> &parameter);
Contributor:

Maybe we should not check in unimplemented classes, to avoid the confusion of people seeing L2Regularizer in the code and assuming it is implemented, only to find out after debugging that it is not.

Contributor Author:

I feel really sorry for so many mistakes. I will check in code more carefully!

err := s.opt.UpdateParameter(p, g)
opt, err := s.optimizerMap[p.Name]
if err != nil {
err := opt.UpdateParameter(p, g)
@helinwang (Contributor), Jun 1, 2017:

Do not use err := here; otherwise we create a new err variable, shadowing the err outside.
See: http://blog.charmes.net/2015/06/scope-and-shadowing-in-go.html
You can use tools to check for shadowing: https://github.com/alecthomas/gometalinter
The tool has an Emacs package; install example: https://github.com/helinwang/go-emacs/blob/master/init.el#L20

Contributor Author:

Sadly, I do my work on Baidu's dev machines, and for the last two days "jumbo" (a package management system) has been broken across all of Baidu, so the Go side did not compile.
Thanks for your lint package! I use Spacemacs, so that is good news for me!

@helinwang (Contributor) commented Jun 5, 2017:

@dzhwinter submitted a new PR, #2386, which is the new version of this PR, so I am closing this one.
Note to @dzhwinter: it's better to just force-push to this PR's branch after resolving merge conflicts, so that you don't need to open a new PR.
Keeping the old PR has the benefit that everyone can track the review progress and the old comments.

@helinwang helinwang closed this Jun 5, 2017
heavengate pushed a commit to heavengate/Paddle that referenced this pull request Aug 16, 2021