Design Doc for Regularization #4869
Conversation
LGTM. I definitely agree that we need the flexibility to add different forms and/or strengths of regularization to different variables.
The below comments may be more appropriate for the Python API discussion, but here goes:
In general, I think it would be good if each `Parameter` had a flexible, user-accessible dict/JSON object of attributes rather than just a few hard-coded flags like `trainable` plus some for regularization.
That is, make it easy for users to (a) add arbitrary flags/properties to Parameters during layer creation and (b) filter all Parameters according to those flags/properties. Then it will be very easy to write `apply_regularization` functions. It will be a natural part of the framework and won't have to be separately designed or implemented.
This is more generally useful. For example, people may want to tag some parameters with {'debug': True} and then write some code that prints min/max/np.any(np.isnan(.)) etc. for those Parameters regularly during training.
Such patterns are common and often a little cumbersome in TensorFlow code: People maintain separate lists of variables in global scope for each attribute, rather than just storing the attributes with the parameter.
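The idea above can be sketched in a few lines. This is a hypothetical illustration, not the actual Paddle API: the names `Parameter`, `attrs`, `find_parameters`, and `apply_regularization` are all invented here to show the attribute-dict pattern.

```python
# Hypothetical sketch (not Paddle's actual API): parameters carry a
# free-form attribute dict; helpers filter and act on them.

class Parameter:
    def __init__(self, name, value, **attrs):
        self.name = name
        self.value = value
        self.attrs = dict(attrs)  # arbitrary user-defined flags/properties

def find_parameters(params, **filters):
    """Return parameters whose attrs match all given key/value pairs."""
    return [p for p in params
            if all(p.attrs.get(k) == v for k, v in filters.items())]

def apply_regularization(params, penalty):
    """Sum a penalty term over the selected parameters."""
    return sum(penalty(p.value) for p in params)

# Usage: tag parameters at layer-creation time, filter them later.
params = [
    Parameter("fc_w", [0.5, -1.0], l2=1e-4, debug=True),
    Parameter("fc_b", [0.1], trainable=True),
]
debug_params = find_parameters(params, debug=True)     # -> [fc_w]
l2_total = apply_regularization(
    find_parameters(params, l2=1e-4),
    lambda v: sum(x * x for x in v))                   # -> 1.25
```

The same filtering helper serves both the regularization and the `{'debug': True}` use case, which is the generality argued for above.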
<img src="./images/l1_regularization.png" align="center"/><br/>
A much more detailed mathematical background of reguilarization can be found [here](http://www.deeplearningbook.org/contents/regularization.html).
typo: reguilarization => regularization
#### High-level API
In PaddlePaddle Python API, users will primarily rely on [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) to create neural network layers. Hence, we lso need to provide regularization functionality in layer functions. The design of these APIs can be postponed for later right now. A good reference for these APIs can be found in [Keras](https://keras.io/regularizers/) and also by looking at Tensorflow in [`tf.contrib.layers`](https://www.tensorflow.org/api_guides/python/contrib.layers).
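As a rough sketch of what such a layer-function API could look like, in the spirit of Keras's `kernel_regularizer` argument: the names `fc_layer`, `l2`, and the `losses` list below are hypothetical, not Paddle's actual API.

```python
# Illustrative only: a layer function that accepts a per-parameter
# regularizer callable and records the penalty it produces.
import numpy as np

def l2(coeff):
    """Return a callable computing coeff * sum(w**2)."""
    return lambda w: coeff * np.sum(w ** 2)

def fc_layer(x, w, regularizer=None, losses=None):
    """Fully connected layer that records its regularization penalty."""
    if regularizer is not None and losses is not None:
        losses.append(regularizer(w))
    return x @ w

# Usage: the framework would later add everything in `losses`
# to the training objective.
losses = []
x = np.ones((1, 2))
w = np.array([[1.0], [2.0]])
y = fc_layer(x, w, regularizer=l2(0.01), losses=losses)
```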
typo: 'lso' => 'also'
- L1_regularization_op
These ops can be like any other ops with their own CPU/GPU implementations either using Eigen or separate Cpu and GPU kernels. As the initial implementation, we can implement their kernels using Eigen following the abstraction pattern implemented for [Activation Ops](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/accuracy_op.h). This abstraction pattern can make it very easy to implement new regularization schemes. other than L1 and L2 norm penalties.
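For reference, the semantics such kernels would implement can be written out in numpy. The function names below are illustrative, not Paddle's actual operators; note that the L1 backward uses a subgradient, since |w| is not differentiable at 0.

```python
# Numpy reference semantics for the two proposed regularization ops.
import numpy as np

def l2_regularization_op(w, coeff):
    """Forward: coeff * sum(w^2); backward: 2 * coeff * w."""
    loss = coeff * np.sum(w ** 2)
    grad = 2.0 * coeff * w
    return loss, grad

def l1_regularization_op(w, coeff):
    """Forward: coeff * sum(|w|); backward: coeff * sign(w),
    a subgradient at w == 0."""
    loss = coeff * np.sum(np.abs(w))
    grad = coeff * np.sign(w)
    return loss, grad

w = np.array([1.0, -2.0, 0.0])
l2_loss, l2_grad = l2_regularization_op(w, 0.5)  # loss 2.5, grad [1, -2, 0]
l1_loss, l1_grad = l1_regularization_op(w, 0.5)  # loss 1.5, grad [0.5, -0.5, 0]
```

A new penalty would only need its own loss/grad pair, which is what makes the functor-style abstraction used by the Activation Ops attractive here.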
typo: "Cpu" => "CPU"
typo: "schemes. other than" => delete period
### Low-Level implementation
In the new design, we propose to create new operations for regularization. For now, we can add 2 ops thgat correspond to the most frequently used regularizations:
thgat ==> that
## How to do Regularization in PaddlePaddle
On surveying existing frameworks like Tensorflow, PyTorch, Caffe, etc, it can be seen that there are 2 common approaches of doing regularization:
Thanks for the comparison of Torch and TensorFlow!
Maybe the survey part can be put into an issue or wiki page, with a reference added at the bottom, e.g. as in the dependency engine doc.
My point is that the design doc should be simple enough for users who want to learn the design details. If they want more details about the design decisions, they should be able to find the related survey/discussion easily, but we should not put them into the design doc.
## Introduction to Regularization
A central problem in machine learning is how to design an algorithm that will perform well not just on the training data, but also on new data. Many strategies are used by machine learning practitioners to reduce the test error, possibly at the expense of increased training error. These strategies are collectively known as **regularization**.
Maybe we can introduce the overfitting problem first and then bring in regularization.
> Many strategies are used by machine learning practitioners to reduce the test error, possibly at the expense of increased training error.

This summarization of regularization does not seem very accurate at first glance. From Wikipedia:
> In general, regularization is a technique that applies to objective functions in ill-posed problems formulated as optimization problems.
#### High-level API
In PaddlePaddle Python API, users will primarily rely on [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) to create neural network layers. Hence, we lso need to provide regularization functionality in layer functions. The design of these APIs can be postponed for later right now. A good reference for these APIs can be found in [Keras](https://keras.io/regularizers/) and also by looking at Tensorflow in [`tf.contrib.layers`](https://www.tensorflow.org/api_guides/python/contrib.layers).
Not related to this PR, but I just want to remind everyone that our V2 API is a settled interface and we must stay compatible with it. I'm not sure whether the layer functions will take this job or the regularization functions also need to consider it.
Hi @dzhwinter, we might have to discuss this offline because I had this concern too. However, I saw that we are also changing the interface for the Python Optimizers.
I just saw this PR. As I understand it, the regularizers only need gradient operators. There is no need to create forward operators whose computations play no important role (they would only print a loss containing the regularizers to users). The regularizer can be calculated only when parameters are updated. The L2 regularizer can easily be implemented in the optimizer (the old concept in PaddlePaddle; maybe it has changed now), but L1 is very special.
Just some thoughts.
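The point about L1 being "special" when folded into the optimizer can be sketched as follows. All names here are illustrative, not PaddlePaddle's optimizer API: L2 becomes plain weight decay inside the SGD update, while a naive L1 sign step can overshoot zero, so a proximal (soft-threshold) update is one common alternative.

```python
# Sketch: L2-in-optimizer is just weight decay; L1 needs a proximal step.
import numpy as np

def sgd_update_l2(w, grad, lr, l2_coeff):
    """SGD with L2 folded in: w -= lr * (grad + 2 * l2_coeff * w)."""
    return w - lr * (grad + 2.0 * l2_coeff * w)

def sgd_update_l1_proximal(w, grad, lr, l1_coeff):
    """SGD step, then soft-thresholding toward zero for the L1 term.
    Weights whose magnitude falls below lr * l1_coeff become exactly 0."""
    w = w - lr * grad
    return np.sign(w) * np.maximum(np.abs(w) - lr * l1_coeff, 0.0)

w = np.array([0.05, -1.0])
w_l2 = sgd_update_l2(w, np.zeros_like(w), lr=0.1, l2_coeff=0.5)
w_l1 = sgd_update_l1_proximal(w, np.zeros_like(w), lr=0.1, l1_coeff=1.0)
# w_l2 shrinks both weights; w_l1 zeroes the small one exactly.
```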
@lcy-seso I agree with you that the L2 regularizer can be easily implemented in the optimizer. However, I believe that implementing it in the optimizer could be an added feature that Paddle supports, while having regularization ops is a more general way for us and Paddle users to implement custom regularization schemes. Let me clean this doc for typos and then we can vote/discuss on both options?
No problem. I am also curious about and interested in how we are going to implement the L1 regularization.
#### Creation of Regularization ops
There are two possibilities for creating the regularization ops:
1. We create these ops immediately while building the computation graph.
Maybe we should write down only the proposed option. For the same reason as the comment above, we can put these two options in an issue or somewhere else, then discuss or vote for the better choice.
And here it will definitely be option 2, because when we run a specific target, some operators should not be included: for example, GoogLeNet (the Inception model) runs targets optimized at different levels, and the same applies when doing a serving job such as the inference examples.
LGTM