[BYOC] Refine AnnotateTarget and MergeCompilerRegion Passes #5277

comaniac · 2020-04-08T06:58:29Z

Since we are working on TRT support with BYOC infra, we added some features and fixed some bugs in AnnotateTarget and MergeCompilerRegion passes. Here are the change logs:

Refactor 1: Both AnnotateTarget and MergeCompilerRegion passes output a graph with all ops annotated. Ops that will be handled by TVM will also be annotated with target "default".

Accordingly, we don't need AnnotateRestDefault anymore.
Accordingly, we don't remove default annotation at the end of MergeCompilerRegion.
A workaround that adds a temporary pass to remove all default annotations. A TODO was added to remind us improving partition graph pass later.
Fixed the issue that AnnotateRestDefault mis-annotated composite functions (because we integrate this pass into AnnotateTarget.)
Fixed the namespace of MergeCompilerRegion.
Fixed type propagation and tuple bug in AnnotateTarget pass (@trevor-m ).

Refactor 2: Support multiple runs for AnnotateTarget. Users may run AnnotateTarget multiple times because different targets may need the Relay graph at different stage (e.g., before or after QnnCanonicalize pass). This refactor makes sure AnnotateTarget pass can handle a Relay graph that has been annotated.

Refactor 3: Support Multiple Targets in AnnotateTarget pass. In case more than one targets can take the same Relay graph, we intend to run AnnotateTarget once to annotate all of them to open global optimization opportunities for future developments.

Now AnnotateTarget takes either a target string or a list of targets. AnnotateTarget will invoke target specific op-based rule functions in order.
Accordingly, now regions in AnnotateRegionSet also have target attribute.

cc @zhiics, @trevor-m, @mbaret, @manupa-arm

src/relay/transforms/annotate_target.cc

mbaret · 2020-04-08T09:24:24Z

Thanks for this, a few high level comments first.

Could we separate some of these changes out into separate PRs? In particular, separating bug fixes from features so we can get them in quickly.
Am I right to think that the intended flow here is to only run AnnotateTarget once? I'm not sure how we would use it in the case where we have two codegens, one that wants to see the graph with QNN ops and the other than wants the graph without them.
What's the advantage of this multi-target approach over running the entire partitioning pipeline multiple times?

comaniac · 2020-04-08T17:07:37Z

Thanks for this, a few high level comments first.

Could we separate some of these changes out into separate PRs? In particular, separating
bug fixes from features so we can get them in quickly.

It's hard to separate bug fixes to another PR because some fixes were done with refactoring.

Am I right to think that the intended flow here is to only run AnnotateTarget once? I'm not sure how we would use it in the case where we have two codegens, one that wants to see the graph with QNN ops and the other than wants the graph without them.

Yes I suppose the flow is intended to run AnnotateTarget only once, but I'm not sure I get your issue. If a user wants to have a QNN model then she will run the QNN pass first anyhow to generate QNN ops in the graph. Under this assumption, if a codegen doesn't want to see QNN ops, it should not have fannotate and AnnotateTarget will annotate QNN ops to other codegens or just default.

What's the advantage of this multi-target approach over running the entire partitioning pipeline multiple times?
I think I understand your concern more after this question. You're saying if a user wants to offload a graph to multiple targets then she will run AnnotateTarget->MergeCompilerRegion->PartitionGraph multiple times. IMHO, this is not an ideal flow due to the following reasons:

If the flow is designed for offloading only one target, it's hard to consider global optimization over multiple targets in the future. For example, 1st run with TargetA creates 2 subgraphs due to restriction set; 2nd run with TargetB creates 1 subgraph (so 3 subgraphs in total). However, assuming the second subgraph for TargetA can also be offloaded to TargetB, meaning that if we run TargetB first than we will get 2 subgraphs in total.
If the flow may take a graph that has been partitioned by another target in advance, all passes in this flow will need to deal with partitioned functions. It's hard to maintain and even harder to integrate global optimization in the above situation.
The position of PartitionGraph in the pass manager is meaningful. We guarantee external functions generated by this pass can be processed by following passes, but not others.

mbaret · 2020-04-08T17:19:05Z

To explain the point about QNN a bit better, a concrete example would be our Ethos-N NPU codegen paired up with something like DNNL. The NPU directly supports quantised convolutions and so wants to mark qnn.conv2d as supported. DNNL doesn't supported quantised convolutions but does support normal convolutions, so it wants to see nn.conv2d. Now the TFLite frontend will initially produce a graph containing qnn.conv2d's from a qnn model. If we choose to annotate here, only the Ethos-N codegen will be able to target the graph. If, however, we choose to run QnnCanonicalize, the qnn.conv2d's are lowered to nn.conv2d's but now only DNNL can annotate the graph.

To me the ideal solution here is to be able to run AnnotateTarget multiple times on different lowerings of the graph, because I agree that running the partitioning pipeline multiple times is not a good long term solution (although it probably is a good short term one).

zhiics · 2020-04-08T17:51:17Z

@mbaret The flow you mentioned makes sense to me. I think it is really case by case. Supporting annotation of multiple targets doesn't really prevent users from invoking annoate_target multiple times when needed. However, it gives users a globally view of the graph and provides them an opportunity to optimize the scheduling of ops. Annotating graph mutliple times looks reasonable to me, but we should avoid reverting the partitioned graph and repartitioning it as this could be very error-prone.

mbaret · 2020-04-08T17:55:00Z

By moving AnnotateRestDefault's functionality into AnnotateTarget, won't that prevent us from being able to run AnnotateTarget multiple times (because it will annotate defaults multiple times)?

comaniac · 2020-04-08T18:03:47Z

By moving AnnotateRestDefault's functionality into AnnotateTarget, won't that prevent us from being able to run AnnotateTarget multiple times (because it will annotate defaults multiple times)?

Even the original implementation has the similar issue because it didn't check if an argument is compiler end. I'm making a commit to only modify the annotation attribute for the current target if the annotation has already there.

mbaret · 2020-04-08T18:13:04Z

Has there been consideration of the proposal we made with the graph partitioning RFC (multiple independent annotation passes + a deconflict pass)?

comaniac · 2020-04-08T20:06:18Z

Has there been consideration of the proposal we made with the graph partitioning RFC (multiple independent annotation passes + a deconflict pass)?

I don't think this PR is against the RFC because this PR doesn't really change the flow you implemented, but just make the annotation pass more general by considering multiple targets. On the other hand, as @zhiics mentioned, you can still run the annotation pass multiple times followed by another deconflict pass if you really have to.

zhiics · 2020-04-08T21:16:40Z

@masahi please also help review this PR. It may have some conflict with your PR.

masahi · 2020-04-08T22:10:51Z

It may have some conflict with your PR

Yes there will be some conflict, since I refactored dnnl/codegen.cc quite a bit. But it seems we only need to copy paste the change in this PR to GenerateBody function in my PR, so I think resolving is easy. For me it is fine if this PR is merged first.

Reviewing this PR is a bit difficult for me since I don't know the context of this PR (I don't use either AnnotateTarget or MergeCompilerRegion in my work), and as @mbaret mentioned, there are different kinds of changes (bug fix, improvement etc). I also prefer breaking up into smaller PRs.

windclarion

composite function itself will not reach the branch, but any other OpNode will reach the branch, and if I only use composite function, I will not register target.xxxx op attr, so OpNode branch will report fail and abort.

src/relay/transforms/annotate_target.cc

comaniac · 2020-04-09T05:49:46Z

composite function itself will not reach the branch, but any other OpNode will reach the branch, and if I only use composite function, I will not register target.xxxx op attr, so OpNode branch will report fail and abort.

Ah I got your point. You're right that check is necessary. I'll add it back. Thanks.

comaniac · 2020-04-09T05:59:30Z

@masahi thanks for the suggestion. I've reverted the DNNL part and will file another PR for it.
@windclarion checker added.

zhiics

I think we should bring this PR in since it 1) fixes a bunch of bugs that block several ppl, 2) doesn't really change the flow the current annotation and partitioning flow.

src/relay/analysis/annotated_region_set.cc

src/relay/analysis/annotated_region_set.h

zhiics · 2020-04-10T15:33:58Z

Let's rebase and bring this in and then we can bring @masahi's PR in as well.

manupak · 2020-04-15T10:06:49Z

Sorry for the late comments ( I was out for a week ).

Among the bug fixes (e.g., type propogation in tuple bug) and new functionalities (e.g., supporting a list of targets in AnnotateTarget), this PR had technically moved (and improved) AnnotateRestDefault to AnnotateTarget and moved the removal of defaults to PartitionGraph.

IIRC, the defaults were a requirement of the MergeCompilerRegions pass and thats why they were inside MCR ? I just want to understand the rationale as to why you think they should moved as such and potentially exposing them ("defaults") in the graph that is being passed among passes?

PS : I agree with @zhiics that this pass do not change the flow and agree with the most of the fixes that are brought in alongside the moving of "default" related components, though ideally should have been better if they were seperate PRs.

comaniac · 2020-04-15T16:48:52Z

One reason that I exposed default between passes was to have a clear task division so that the flow would be more stable. For example, one bug about composite function reported in #5272 was because AnnotateTarget pass skipped composite functions but AnnotateRestDefault didn't, so the default annotations were mis-inserted and it results in segmentation fault when partitioning the graph. This may happen in the future if still we have more than one places doing similar tasks.

) * add target to region * refactor annotate_target * Make all unit test working * quick fix * enable BN, unit test failed * Fix vm test, unit test. Refactor annotate_target a bit. * quick fix fusion * revert fusion change * style fix * Refactor merge region pass * format * minor fix * Skip e2e test * lint * support AnnotateTarget multiple runs * Add HasAttr and revert DNNL codegen * address comment Co-authored-by: Zhi Chen <chzhi@amazon.com>

…m_data:master to master * commit 'cd0d52daa6942bdafa9363ff6cfa3d25fcd5b8d6': (824 commits) [Intrinsic] Add log1p, ldexp, atan2, hypot, nextafter, copysign (apache#5312) [Rust][CI] Restore Rust CI (apache#5137) Remove PrimExpr from String (apache#5311) [Requantize] Cleanup and Optimize Lowering (apache#5286) [IR][TRANSFORM] Enable CopyOnWrite for passes. (apache#5309) [PYTORCH]Abs, Arange, Softplus ops (apache#5295) [LLVM] Fix generation of LLVM intrinsics (apache#5282) [BYOC] Add example of Composite + Annotate for DNNL fused op (apache#5272) [Frontend][TensorFlow]Improve TensorFlow Static Shape Tensor Array (apache#5243) [RUNTIME] Introduce RValue reference(move) support to TypedPackedFunc (apache#5271) [RELAY][FRONTEND][CAFFE2] add Mul and ConvTranspose operator (apache#5302) [BYOC] Refine AnnotateTarget and MergeCompilerRegion Passes (apache#5277) [CI] Fix the hexagon string (apache#5304) [Arith] linear system and equation solver (apache#5171) [PYTORCH]Repeat, Reciprocal & Reshape Op support (apache#5280) [FRONTEND][TENSORFLOW] Fix gather_nd indices (apache#5279) Update device_annotation.cc (apache#5291) [REFACTOR][IR] Move to runtime::String (apache#5276) [NDArray] Set shape_ in NDArray::FromDLPack (apache#5301) [RUNTIME] Initial implementation of Hexagon runtime support (apache#5252) ...

comaniac mentioned this pull request Apr 8, 2020

[RELAY][BYOC] for composite function, FTVMAnnotateTarget maybe do not exist #5274

Closed

windclarion reviewed Apr 8, 2020

View reviewed changes

src/relay/transforms/annotate_target.cc Show resolved Hide resolved

comaniac force-pushed the re_ext_infra branch from 79bdb6f to dbd6301 Compare April 8, 2020 17:24

comaniac force-pushed the re_ext_infra branch 2 times, most recently from de9e85f to c952722 Compare April 8, 2020 20:42

windclarion reviewed Apr 9, 2020

View reviewed changes

src/relay/transforms/annotate_target.cc Show resolved Hide resolved

comaniac mentioned this pull request Apr 9, 2020

[BYOC] Add example of Composite + Annotate for DNNL fused op #5272

Merged

zhiics reviewed Apr 9, 2020

View reviewed changes

src/relay/analysis/annotated_region_set.cc Outdated Show resolved Hide resolved

src/relay/analysis/annotated_region_set.h Outdated Show resolved Hide resolved

zhiics approved these changes Apr 9, 2020

View reviewed changes

comaniac and others added 8 commits April 10, 2020 18:00

add target to region

333b215

refactor annotate_target

36d20da

Make all unit test working

ac040a0

quick fix

10add0d

enable BN, unit test failed

b89d8c2

quick fix fusion

0bd1128

Fix vm test, unit test. Refactor annotate_target a bit.

e8a41da

revert fusion change

5c9adff

zhiics and others added 9 commits April 10, 2020 18:00

style fix

8e3e014

Refactor merge region pass

cfb7608

format

9e869c4

minor fix

4515403

Skip e2e test

41147a9

lint

4a7bf3a

support AnnotateTarget multiple runs

3da42ed

Add HasAttr and revert DNNL codegen

e2b7a7e

address comment

13cfa83

comaniac force-pushed the re_ext_infra branch from 0b6fbcb to 13cfa83 Compare April 10, 2020 18:01

zhiics merged commit f506c8b into apache:master Apr 10, 2020

zhiics added the status: accepted label Apr 10, 2020

comaniac deleted the re_ext_infra branch April 10, 2020 21:33

ZihengJiang mentioned this pull request Sep 25, 2020

TVM v0.7 Release Note Candidate #6486

Closed

comaniac mentioned this pull request Oct 13, 2020

[BYOC] Added "include_non_call_ops" parameter to AnnotateTarget pass #6655

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BYOC] Refine AnnotateTarget and MergeCompilerRegion Passes #5277

[BYOC] Refine AnnotateTarget and MergeCompilerRegion Passes #5277

comaniac commented Apr 8, 2020 •

edited

Loading

mbaret commented Apr 8, 2020

comaniac commented Apr 8, 2020

mbaret commented Apr 8, 2020

zhiics commented Apr 8, 2020

mbaret commented Apr 8, 2020

comaniac commented Apr 8, 2020

mbaret commented Apr 8, 2020

comaniac commented Apr 8, 2020

zhiics commented Apr 8, 2020

masahi commented Apr 8, 2020 •

edited

Loading

windclarion left a comment

comaniac commented Apr 9, 2020

comaniac commented Apr 9, 2020

zhiics left a comment

zhiics commented Apr 10, 2020

manupak commented Apr 15, 2020

comaniac commented Apr 15, 2020

[BYOC] Refine AnnotateTarget and MergeCompilerRegion Passes #5277

[BYOC] Refine AnnotateTarget and MergeCompilerRegion Passes #5277

Conversation

comaniac commented Apr 8, 2020 • edited Loading

mbaret commented Apr 8, 2020

comaniac commented Apr 8, 2020

mbaret commented Apr 8, 2020

zhiics commented Apr 8, 2020

mbaret commented Apr 8, 2020

comaniac commented Apr 8, 2020

mbaret commented Apr 8, 2020

comaniac commented Apr 8, 2020

zhiics commented Apr 8, 2020

masahi commented Apr 8, 2020 • edited Loading

windclarion left a comment

Choose a reason for hiding this comment

comaniac commented Apr 9, 2020

comaniac commented Apr 9, 2020

zhiics left a comment

Choose a reason for hiding this comment

zhiics commented Apr 10, 2020

manupak commented Apr 15, 2020

comaniac commented Apr 15, 2020

comaniac commented Apr 8, 2020 •

edited

Loading

masahi commented Apr 8, 2020 •

edited

Loading