
[Relay] Add new IR pass CombineParallelDense #3862

Merged (21 commits) on Sep 24, 2019

Conversation

@soiferj (Contributor) commented Aug 30, 2019

This IR pass is similar to CombineParallelConv2D, but with the Dense operator. The pass takes multiple parallel dense operators and combines them into one batch_matmul operator. Element-wise operations after dense are also stacked and fused.

The code is refactored so that CombineParallelConv2D and CombineParallelDense can share code, making it easier to write new "combine" passes later.

This pass runs at opt level 4. In another PR, we can add tuning to decide whether or not to apply it.

This will help models like BERT, which have parallel Dense + BiasAdd operations at the start of each layer.

Discussed in this RFC: https://discuss.tvm.ai/t/discussion-new-ir-pass-proposal-combineparalleldense/3813
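For reference, here is a minimal sketch of how the pass can be invoked directly (an illustration only; it assumes current Python API naming such as tvm.IRModule and relay.transform.CombineParallelDense, which may differ slightly across TVM versions):

```python
import tvm
from tvm import relay

# Two parallel dense branches sharing the same input.
x = relay.var("x", shape=(1, 16))
w1 = relay.var("w1", shape=(32, 16))
w2 = relay.var("w2", shape=(32, 16))
out = relay.Tuple([relay.nn.dense(x, w1), relay.nn.dense(x, w2)])
mod = tvm.IRModule.from_expr(relay.Function([x, w1, w2], out))

# Run the pass directly (it is also registered at opt level 4).
mod = relay.transform.InferType()(mod)
mod = relay.transform.CombineParallelDense(min_num_branches=2)(mod)
print(mod)  # the two dense calls should now be expressed as one batch_matmul
```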

@vinx13 @icemelon9 would you be able to take a look?

@jroesch (Member) commented Sep 1, 2019

@vinx13 I will add my review; can you take a look as well?

@jroesch requested a review from vinx13 on September 1, 2019 at 00:56
@jroesch (Member) commented Sep 1, 2019

@MarisaKirisame can you add a review as well?

@jroesch (Member) commented Sep 1, 2019

Overall LGTM, thanks for your contribution! My comments are mostly about documentation.

We are making a push this release cycle to improve the docs. I am trying to encourage everyone to grow them incrementally with each PR to passes, etc.

I will take another pass after other reviews come in, and then merge.

@MarisaKirisame (Contributor):

As most operators take batched input, it is not hard to imagine different versions of CombineXXX... Furthermore, if we do dynamic batching à la TensorFlow Fold (which I think we will), we would need a CombineXXX for every XXX operator. So, is it possible to merge the two combine passes into a generic pass?

@soiferj (Contributor, Author) commented Sep 3, 2019

@MarisaKirisame that's a really good idea. The only tricky part is that IsOpSupported and AreOpsCompatible may need to change for each op. For example, with Dense, we need to make sure that the units attribute is empty.

How about this: we create a new class, CombineParallelOpBatch, that has a default implementation of IsOpSupported which returns true, and a default implementation of AreOpsCompatible which returns true if the shapes and dtypes of both ops are the same. From there, we can have CombineParallelDense inherit from CombineParallelOpBatch and override IsOpSupported and AreOpsCompatible. The constructor can take the name of the op and the min number of branches to combine. This way, we can easily add new IR passes by adding a line like CombineParallelOpBatch('add', 3).

I'm not sure it's possible to merge CombineParallelConv2D and CombineParallelDense because CombineParallelConv2D concatenates the inputs rather than stacking. The implementations of most methods would have to change.
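For illustration only, here is a hypothetical Python sketch of the hierarchy described above (the actual pass is implemented in C++ under src/relay/pass/; the class names mirror the discussion, while the method signatures and attribute access are assumptions):

```python
class CombineParallelOpBatch:
    """Generic combiner: stack parallel calls to one op into a batched call."""

    def __init__(self, op_name, min_num_branches):
        self.op_name = op_name
        self.min_num_branches = min_num_branches

    def is_op_supported(self, call):
        # Default: assume the op can be batched as-is.
        return True

    def are_ops_compatible(self, a, b):
        # Default: combinable when shapes and dtypes match.
        return a.shape == b.shape and a.dtype == b.dtype


class CombineParallelDense(CombineParallelOpBatch):
    def __init__(self, min_num_branches=3):
        super().__init__("nn.dense", min_num_branches)

    def is_op_supported(self, call):
        # Dense can only be rewritten to batch_matmul when `units` is unset.
        return call.attrs.units is None


# Adding a new combiner then becomes a one-liner, e.g. CombineParallelOpBatch("add", 3).
```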

What do you think? @jroesch any thoughts?

@MarisaKirisame (Contributor):

@soiferj sounds good to me. Maybe the default should be False, because ops need manual batching to be supported (for example, changing dense to BatchMatMul). For the easy cases, we can override DeclareUnaryOp and DeclareBinaryOp, which will do the job.

@soiferj (Contributor, Author) commented Sep 3, 2019

@MarisaKirisame, sounds good. For now, I'll add support for this functionality. I think we can decide how we actually want to enable it for all ops in another PR. Is that alright?

@MarisaKirisame (Contributor):

@soiferj exactly.

@soiferj (Contributor, Author) commented Sep 3, 2019

@MarisaKirisame do you think that the new base class, CombineParallelOpBatch, should handle inputs with any number of dimensions and convert them to a batched version of ndim + 1 (for example, (2, 3, 4) -> (1, 2, 3, 4)), or should we just handle inputs with 2 dimensions and convert them to 3 dimensions (for example, (3, 4) -> (1, 3, 4))?

The reason why I ask is that broadcasting logic for element-wise ops that follow the main op can get tricky if we support an arbitrary number of dimensions.
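To make the two options concrete, here is a small NumPy illustration (not TVM code; shapes taken from the examples above):

```python
import numpy as np

# Option 1: batch inputs of any rank by prepending an axis, e.g. (2, 3, 4) -> (1, 2, 3, 4).
x_nd = np.zeros((2, 3, 4))
print(np.expand_dims(x_nd, axis=0).shape)  # (1, 2, 3, 4)

# Option 2: only handle 2-D inputs, producing 3-D batched inputs, e.g. (3, 4) -> (1, 3, 4).
x_2d = np.zeros((3, 4))
print(np.expand_dims(x_2d, axis=0).shape)  # (1, 3, 4)
```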

@MarisaKirisame (Contributor):

@soiferj both are completely OK with me.
The main scenario I am afraid of is this: you add CombineDense, someone adds CombineRelu next week, another adds CombineSoftmax, and so on... After one year we would see 10 separate passes that could have been unified. If we start with a better design, we save ourselves a big refactor down the road.
It doesn't need to support much (more dimensions, broadcasting, etc.); the most important part is that the infrastructure itself works and allows extension.

@jroesch (Member) commented Sep 4, 2019

@soiferj sounds good to me in general. I am always in favor of slowly layering in changes while setting yourself up for future success.

I think Marisa's point is that we should be able to hook just a few methods and leave the analysis/traversal code generic between the passes. For example, we could also overload how the branches are combined with another hook, i.e., concat vs. stack.

If I understand the goal correctly: given operators like op1(args, attrs) and op2(args, attrs), we should only need to add three methods, one to check whether op1 is supported (an easy one-liner), one to check that the args and attrs match, and one for how to combine the values, i.e., calling stack on a sequence of arguments.

@soiferj (Contributor, Author) commented Sep 4, 2019

Just pushed a new set of changes that introduces CombineParallelOpBatch. There is also a little bit of special logic for combining bias_add, since it needs to be broadcast. For now, I'm pretty happy with the code design. I think we can look at moving the stack and concatenate logic into the base class if we see a need in the future. Let me know what you think!
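As an aside, a rough NumPy sketch of why stacked bias_add needs explicit broadcasting (shapes are illustrative only, not taken from the PR):

```python
import numpy as np

# After batching, the combined matmul output has shape (branches, batch, units),
# while stacking each branch's bias of shape (units,) gives (branches, units).
out = np.zeros((2, 1, 32))                          # (branches, batch, units)
biases = np.stack([np.ones(32), 2 * np.ones(32)])   # (branches, units)
# Insert a batch axis so the stacked biases broadcast against the batched output.
out = out + biases[:, np.newaxis, :]                # (branches, 1, units) broadcasts
print(out.shape)                                    # (2, 1, 32)
```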

@soiferj (Contributor, Author) commented Sep 4, 2019

Also, I have a design question: after we support CombineParallelXXXX for many ops, is there still a purpose to fusing element-wise operators in these classes? If we can really combine most parallel ops into batched ops, the regular fusion pass should do this for us, right? That would clean up the code a lot.

@vinx13 (Member) commented Sep 10, 2019

@soiferj Likely some fused operations are not inlined. Can you check which sub-function after fusion raised this error during Relay compilation?

@soiferj (Contributor, Author) commented Sep 10, 2019

The error says it's coming from split_dev_host_funcs in codegen/build_module.cc. The line is

CHECK(ir::VerifyMemory(x, target->device_type)) << "Direct host side access to device memory is detected in " << x->func_name() << ". Did you forget to bind?";

@vinx13 (Member) commented Sep 10, 2019

We need to trace the Relay part of these functions (and see how their TOPI schedules are called). You can use the debug logging here to get more info:
https://github.com/dmlc/tvm/blob/master/python/tvm/relay/backend/_backend.py#L51-L53

@soiferj (Contributor, Author) commented Sep 10, 2019

Okay, I have the output. What exactly should I be looking for?

@vinx13 (Member) commented Sep 10, 2019

@soiferj After fusion the Relay program is split into several parts like

%x = fn(...) {
  dense
  relu
}

Each part is passed to tvm.build_module independently, so the goal is to find exactly which part is broken.

@soiferj (Contributor, Author) commented Sep 10, 2019

@vinx13, nevermind, I made a mistake on my testing branch. I created a new branch, merged my changes with master, and can run the model now. Sorry about that.

@soiferj (Contributor, Author) commented Sep 10, 2019

The CI failure doesn't look related to my changes... it is occurring in test_forward_conv_batch_norm. The output from the test actually looks correct, too. Is this a known issue? Edit: the issue has been resolved.

@soiferj (Contributor, Author) commented Sep 12, 2019

Thanks everyone for your help and feedback :) This PR has been approved for the past few days - would someone be able to take a look at merging when they get a chance?

@soiferj requested a review from icemelon on September 13, 2019 at 15:46
@soiferj requested a review from icemelon on September 16, 2019 at 17:24
@icemelon (Member):

@jroesch Could you take one more look at this PR?

@soiferj (Contributor, Author) commented Sep 20, 2019

@jroesch would you mind taking a look when you have a chance?

@jroesch (Member) commented Sep 24, 2019

Sorry for dropping the ball on this one; I've been busy with life and other work things :) I will merge tonight.

@jroesch (Member) commented Sep 24, 2019

Everything looks good to me. Thanks for the contribution, especially all the extra work generalizing the interface.

@jroesch merged commit ed9fdfb into apache:master on Sep 24, 2019
@soiferj deleted the soiferj/paralleldense branch on September 24, 2019 at 17:14
wweic pushed a commit to wweic/tvm that referenced this pull request Sep 30, 2019
* Refactor to create abstract ParallelOpCombiner

* First draft of CombineParallelDense

* Begin to work on tests

* Test

* Refactor to move out more common code

* Clean up

* Fix

* Remove statics

* fix wording

* Start to add combine_parallel_op_batch

* Resolve PR comments

* Resolve PR comments

* dummy change to retrigger CI

* Change special case from bias_add to add

* Revert special case change

* Ignore units check

* dummy change to retrigger CI

* dummy change to re-trigger CI

* Improve docs

* Update docs

* Update docs
wweic pushed a commit to wweic/tvm that referenced this pull request Sep 30, 2019
wweic pushed a commit to neo-ai/tvm that referenced this pull request Oct 1, 2019