rewrite backward graph #3963

Merged 3 commits on Sep 8, 2017.
34 changes: 17 additions & 17 deletions paddle/framework/backward.md

## Motivation

In a neural network, the backpropagation algorithm follows the chain rule, so we need to compose the fundamental gradient operators/expressions together according to the chain rule. Every forward network needs a backward network to construct the full computation graph; the backward pass of each operator/expression is generated with respect to its forward pass.

## Backward Operator Registry

A backward network is built up with several backward operators. Backward operators take the forward operators' inputs, outputs, and output gradients, and then calculate the corresponding input gradients.

| | forward operator | backward operator |
| ---------------------- | ---------------- |------------------------- |
| **Operator::inputs_** | Inputs | Inputs, Outputs, OutputGradients |
| **Operator::outputs_** | Outputs | InputGradients |

In most cases, there is a one-to-one correspondence between the forward and backward operators. These correspondences are recorded in a global hash map (`OpInfoMap`). To follow the philosophy of a minimal core and make operators pluggable, a registry mechanism is introduced.

For example, given a `mul_op`, we can register its information and its corresponding backward operator with the following macro:

```cpp
REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);
```
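
To make the correspondence in the table above concrete, here is a minimal, self-contained C++ sketch of the forward and backward descriptions for `mul`. The variable names (`X`, `Y`, `Out`) and the `@GRAD` suffix are illustrative assumptions for this sketch, not definitions taken from this document.

```cpp
#include <iostream>
#include <string>
#include <vector>

// A toy description of an operator: its type plus named input/output slots.
struct OpDescSketch {
  std::string type;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

int main() {
  // Forward operator: Out = X * Y.
  OpDescSketch mul_fwd{"mul", {"X", "Y"}, {"Out"}};

  // Backward operator: per the table above, it takes the forward inputs,
  // outputs and output gradients, and produces the input gradients.
  OpDescSketch mul_grad{"mul_grad",
                        {"X", "Y", "Out", "Out@GRAD"},
                        {"X@GRAD", "Y@GRAD"}};

  std::cout << mul_grad.type << ": " << mul_grad.inputs.size()
            << " inputs -> " << mul_grad.outputs.size() << " outputs\n";
  return 0;
}
```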

## Backward Operator Creating

Given a certain forward operator, we can get its corresponding backward operator by calling:

```cpp
OperatorBase* bwd_op = BuildGradOp(fwd_op);  // fwd_op is a const OperatorBase*
```

The function `BuildGradOp` will sequentially execute the following steps (a minimal sketch follows the list):

1. Get the `type_` of the given forward operator, and then get the corresponding backward operator's type by looking it up in the `OpInfoMap`.

2. Build two maps named `inputs` and `outputs` to temporarily store the backward operator's inputs and outputs. Copy the forward operator's `inputs_` and `outputs_` into the map `inputs`, except those that are not necessary for gradient computation.

3. Add the forward inputs' gradient variables into the map `outputs`, and add the forward outputs' gradient variables into the map `inputs`.
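
A minimal, self-contained sketch of these steps, using plain standard-library containers in place of the framework's real types; the `@GRAD` naming and the `kGradOpType` table are assumptions made for illustration, and pruning of variables unneeded for the gradient computation (step 2) is omitted.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

using VarList = std::vector<std::string>;
using VarMap = std::unordered_map<std::string, VarList>;

// A toy operator description: type plus named input/output slots.
struct OpSketch {
  std::string type;
  VarMap inputs;   // e.g. {"X": {"x0"}, "Y": {"y0"}}
  VarMap outputs;  // e.g. {"Out": {"out0"}}
};

// Stand-in for OpInfoMap (step 1): forward type -> backward type.
const std::unordered_map<std::string, std::string> kGradOpType = {
    {"mul", "mul_grad"}};

// Illustrative "@GRAD" suffix for a gradient variable name (an assumption).
std::string GradName(const std::string& name) { return name + "@GRAD"; }

OpSketch BuildGradOpSketch(const OpSketch& fwd) {
  OpSketch bwd;
  // Step 1: look up the backward operator's type.
  bwd.type = kGradOpType.at(fwd.type);

  // Step 2: copy the forward inputs and outputs into the backward `inputs`
  // map (pruning of variables unneeded for the gradient is omitted here).
  bwd.inputs = fwd.inputs;
  for (const auto& kv : fwd.outputs) bwd.inputs[kv.first] = kv.second;

  // Step 3: forward outputs' gradients become backward inputs, and
  // forward inputs' gradients become backward outputs.
  for (const auto& kv : fwd.outputs) {
    VarList grads;
    for (const auto& v : kv.second) grads.push_back(GradName(v));
    bwd.inputs[GradName(kv.first)] = grads;
  }
  for (const auto& kv : fwd.inputs) {
    VarList grads;
    for (const auto& v : kv.second) grads.push_back(GradName(v));
    bwd.outputs[GradName(kv.first)] = grads;
  }
  return bwd;
}
```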

## Backward Network Building

A backward network is a series of backward operators. The main idea of building a backward network is to create the backward operators in the inverted order of the forward pass and append them one by one.

In our design, the network itself is also a kind of operator, so the operators contained in a big network may themselves be smaller networks.

Given a forward network, it generates the backward network. We only care about the gradients: `OutputGradients` and `InputGradients`.

1. Op

When the input forward network is an Op, return its gradient operator immediately.

2. NetOp

When the input forward network is a NetOp, it needs to call the backward functions of its sub-NetOps/Operators recursively. During this process, we need to collect the `OutputGradients` names according to the forward NetOp (a sketch of the whole procedure is given at the end of this section).

**Shared variable**. As illustrated in the pictures below, two operators' output gradients will overwrite their shared input variable.

<p align="center">
<img src="./images/duplicate_op.png" width="70%" ><br/>
<img src="./images/duplicate_op.png" width="50%" ><br/>

1. Shared variable in two operators.

</p>

Sharing a variable between operators, or using the same input variable in multiple operators, leads to duplicate gradient variables. As the demo above shows, we need to rename the gradient names recursively and add a generic add operator to replace the overwriting links.

<p align="center">
<img src="images/duplicate_op2.png" width="90%" ><br/>
<img src="images/duplicate_op2.png" width="50%" ><br/>

2. Replace the shared variable's gradient with an `Add` operator.

</p>
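
Putting the pieces together, the following is a simplified, non-authoritative C++ sketch of the recursive building procedure: a plain Op returns its gradient operator directly, a NetOp recurses over its sub-operators in reverse order, and whenever two backward operators would write the same gradient variable, the later write is renamed and an `add` operator is appended to merge the duplicates. The toy `Op` struct and the `@GRAD`/`@RENAME@` suffixes are assumptions of this sketch, not the actual implementation.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// A toy operator: either a primitive op or a "net" containing sub-ops.
// (A std::vector of the enclosing, still-incomplete type needs C++17.)
struct Op {
  std::string type;
  std::vector<std::string> inputs;   // variable names the op reads
  std::vector<std::string> outputs;  // variable names the op writes
  std::vector<Op> sub_ops;           // non-empty => this is a NetOp
  bool IsNet() const { return !sub_ops.empty(); }
};

// Case 1: a primitive Op maps directly to its gradient op. Per the table in
// the registry section, it reads the forward outputs' gradients and writes
// the forward inputs' gradients ("@GRAD" naming is assumed).
Op MakeGradOp(const Op& fwd) {
  Op grad{fwd.type + "_grad", {}, {}, {}};
  for (const auto& o : fwd.outputs) grad.inputs.push_back(o + "@GRAD");
  for (const auto& i : fwd.inputs) grad.outputs.push_back(i + "@GRAD");
  return grad;
}

// Case 2: a NetOp recurses into its sub-ops in reverse order. Whenever two
// backward ops would write the same gradient variable, redirect the later
// write to a renamed variable and append an `add` op that merges the two.
Op Backward(const Op& fwd) {
  if (!fwd.IsNet()) return MakeGradOp(fwd);

  Op net{"net", {}, {}, {}};
  std::unordered_map<std::string, int> writes;  // gradient name -> write count
  for (auto it = fwd.sub_ops.rbegin(); it != fwd.sub_ops.rend(); ++it) {
    Op grad = Backward(*it);
    std::vector<Op> fixups;  // `add` ops that merge duplicated gradients
    for (auto& out : grad.outputs) {
      int n = writes[out]++;
      if (n > 0) {
        std::string renamed = out + "@RENAME@" + std::to_string(n);
        fixups.push_back(Op{"add", {out, renamed}, {out}, {}});
        out = renamed;  // the duplicate write goes to the renamed variable
      }
    }
    net.sub_ops.push_back(std::move(grad));
    for (auto& f : fixups) net.sub_ops.push_back(std::move(f));
  }
  return net;
}
```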

Binary file modified paddle/framework/images/duplicate_op2.graffle
Binary file modified paddle/framework/images/duplicate_op2.png