
[RELAY][PASS] CombineParallelConv2D #2089

Merged (15 commits) on Nov 22, 2018

Conversation

vinx13 (Member) commented Nov 12, 2018

This pass replaces convolutions that share the same input node and the same arguments (except that the number of output channels can differ) with a single convolution. The weight of the new 2D convolution is the concatenation of the original weights along the output-channel axis.
The original conv2d nodes are replaced with strided_slice ops that extract each branch's slice of the new conv2d's output.

This avoids launching multiple kernels in networks with parallel convolution branches, such as Inception blocks.

  • Use strided_slice instead of take
  • Fold following bias, relu, bn

Algorithm

  1. Find parallel branches like

           data
          /    \
      conv2d  conv2d
        |       |
        op      op
        |       |

     where op is an elemwise or broadcast op.
  2. Group branches by the kernel shape and attrs of the conv2d.
  3. Combine the conv2d ops within each group, and combine the subsequent ops where possible.
  4. Use strided_slice to split the output of the combined op (see the sketch below).
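
A minimal sketch of the end-to-end effect on a toy graph, assuming the pass is exposed as relay.transform.CombineParallelConv2D the way later TVM releases expose it (the exact Python entry point at the time of this PR may differ):

```python
import tvm
from tvm import relay

data = relay.var("data", shape=(1, 16, 32, 32))
w1 = relay.var("w1", shape=(32, 16, 3, 3))
w2 = relay.var("w2", shape=(64, 16, 3, 3))

# Two parallel conv2d branches: same input, same attrs, different
# numbers of output channels (32 vs. 64).
b1 = relay.nn.conv2d(data, w1, kernel_size=(3, 3), padding=(1, 1))
b2 = relay.nn.conv2d(data, w2, kernel_size=(3, 3), padding=(1, 1))
func = relay.Function([data, w1, w2], relay.Tuple([b1, b2]))

mod = tvm.IRModule.from_expr(func)
mod = relay.transform.InferType()(mod)
# The pass rewrites the two conv2d ops into one conv2d over the
# concatenated weights, followed by one strided_slice per branch.
mod = relay.transform.CombineParallelConv2D(min_num_branches=2)(mod)
print(mod)
```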

Please review @tqchen @jroesch @ZihengJiang @masahi

Review threads (resolved, outdated): src/relay/pass/expr_subst.h, src/relay/pass/fold_conv2d.cc
vinx13 force-pushed the feature/fold_conv2d_pass branch from 4d24076 to 0705a3f on November 12, 2018
tqchen (Member) commented Nov 12, 2018

@vinx13 can you also elaborate a bit on possible use-cases of this pass?

vinx13 force-pushed the feature/fold_conv2d_pass branch 2 times, most recently from 5c5c8dc to 2accc32, on November 12, 2018
masahi (Member) commented Nov 12, 2018

I guess a bigger kernel is better for performance.

merrymercy (Member):

TensorRT does this optimization: https://devblogs.nvidia.com/deploying-deep-learning-nvidia-tensorrt/#attachment_6827
It can benefit Inception-like networks.

masahi (Member) commented Nov 12, 2018

@vinx13 What happens if the convolutions are followed by elemwise ops (bias, batch norm, etc., which should be fused into a single op)? The elemwise ops that follow a convolution can be different for each branch.

Can the folded convolution still be fused with elemwise ops?

vinx13 (Member, Author) commented Nov 12, 2018

After this pass, the original convolutions are replaced with Take, and the elemwise ops in each branch then follow a different Take.
Take is an injective op, so we need a way to fuse convolution with take.

masahi (Member) commented Nov 12, 2018

Hmm, interesting. I understand that we can at least fuse the elemwise ops into each Take. But even if we allow fusing convolution with Take, I don't see how we can fuse multiple Take ops with a single folded conv op.

The code itself looks good, though; I understand how it works.

merrymercy (Member):

So we should also fold the following bias, relu, and bn to keep the final graph easy to fuse. These elementwise ops should be placed before the Take.
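
For concreteness, the combined form described above would look roughly like the hand-built sketch below; the pass constructs this automatically, and the shapes are illustrative rather than taken from this PR (the final pass splits with strided_slice rather than Take):

```python
import tvm
from tvm import relay

data = relay.var("data", shape=(1, 16, 32, 32))
w = relay.var("w", shape=(96, 16, 3, 3))  # concatenation of a 32- and a 64-channel kernel
b = relay.var("b", shape=(96,))           # concatenation of the two bias vectors

y = relay.nn.conv2d(data, w, kernel_size=(3, 3), padding=(1, 1))
y = relay.nn.bias_add(y, b)  # bias folded in before the split
y = relay.nn.relu(y)         # relu folded in before the split
# Each original branch becomes a channel slice of the combined output.
out1 = relay.strided_slice(y, begin=[0, 0, 0, 0], end=[1, 32, 32, 32])
out2 = relay.strided_slice(y, begin=[0, 32, 0, 0], end=[1, 96, 32, 32])
func = relay.Function([data, w, b], relay.Tuple([out1, out2]))
```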

masahi (Member) commented Nov 12, 2018

Yeah, but the elementwise ops that follow each convolution can differ from branch to branch (I'm not sure about the Inception network), so there may be less chance of folding.

Since there are multiple convolution branches, not everything can be fused anyway. If we fold the elementwise ops as well, we are left with multiple Take ops that cannot be fused with anything.

I think we might as well stop fusing (or "realize", in NNVM terms) at the folded convolution op, and let the Take ops be fused with the elementwise ops.

tqchen (Member) commented Nov 12, 2018

I am working on a new fusion pass for the new Relay IR; hopefully we can follow up on this topic there.

tqchen (Member) commented Nov 13, 2018

Some specific follow-up comments:

  • Let us also think a bit more about naming; usually "fold" means removing (e.g., constant folding). Maybe a more appropriate name is CombineParallelConv2D?
  • Let us avoid using take; instead, use a slice operator to slice the outputs.
  • Let us also fold the follow-up bias, relu, and bn when possible, as suggested by @merrymercy.

vinx13 force-pushed the feature/fold_conv2d_pass branch 2 times, most recently from 03891e8 to a4078ba, on November 13, 2018
tqchen changed the title from "[RELAY][PASS] Add FoldConv2D pass" to "[RELAY][PASS] CombineParallelConv2D" on Nov 13, 2018
@@ -135,6 +150,20 @@ inline Constant MakeConstantScalar(DataType dtype, T value) {
   return ConstantNode::make(arr);
 }
 
+template<typename T, typename = typename std::enable_if<std::is_integral<T>::value>::type>
Review comment (Member):
Let us use slice instead of take to get the final result
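
For context, a quick sketch contrasting the two ops on an NCHW tensor (illustrative, not code from this PR): take gathers arbitrary indices along an axis, while strided_slice extracts a contiguous range, which is a simpler pattern for downstream analysis and fusion.

```python
import numpy as np
import tvm
from tvm import relay

y = relay.var("y", shape=(1, 96, 32, 32))

# take: gather explicit channel indices along axis 1.
idx = relay.const(np.arange(0, 32).astype("int32"))
by_take = relay.take(y, idx, axis=1)

# strided_slice: extract the same channels as one contiguous range.
by_slice = relay.strided_slice(y, begin=[0, 0, 0, 0], end=[1, 32, 32, 32])
```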

tqchen (Member) commented Nov 13, 2018

The new set of changes LGTM. @vinx13 can you also add support for fusing the follow-up ops, plus a test case?

tqchen added the status: need update label Nov 13, 2018
tqchen (Member) commented Nov 14, 2018

strided_slice is added in #2094

vinx13 force-pushed the feature/fold_conv2d_pass branch 3 times, most recently from 2365049 to d0599e8, on November 14, 2018
tqchen added the status: need review label and removed the status: need update label Nov 20, 2018
tqchen (Member) commented Nov 21, 2018

@vinx13 please rebase against the master

vinx13 force-pushed the feature/fold_conv2d_pass branch 2 times, most recently from e2f2cd1 to a0f8331, on November 22, 2018
vinx13 force-pushed the feature/fold_conv2d_pass branch from a0f8331 to bed2900 on November 22, 2018
tqchen merged commit 53ac89e into apache:master on Nov 22, 2018
tqchen (Member) commented Nov 22, 2018

Thanks, @MarisaKirisame @masahi @vinx13 this is merged!

FrozenGene pushed a commit to FrozenGene/tvm that referenced this pull request Dec 27, 2018
@ZihengJiang ZihengJiang mentioned this pull request Feb 1, 2019
wweic pushed a commit to neo-ai/tvm that referenced this pull request Feb 20, 2019