Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BYOC] Refine AnnotateTarget and MergeCompilerRegion Passes #5277

Merged
merged 17 commits into from
Apr 10, 2020

Conversation

comaniac
Copy link
Contributor

@comaniac comaniac commented Apr 8, 2020

Since we are working on TRT support with BYOC infra, we added some features and fixed some bugs in AnnotateTarget and MergeCompilerRegion passes. Here are the change logs:

Refactor 1: Both AnnotateTarget and MergeCompilerRegion passes output a graph with all ops annotated. Ops that will be handled by TVM will also be annotated with target "default".

  • Accordingly, we don't need AnnotateRestDefault anymore.
  • Accordingly, we don't remove default annotation at the end of MergeCompilerRegion.
  • A workaround that adds a temporary pass to remove all default annotations. A TODO was added to remind us improving partition graph pass later.
  • Fixed the issue that AnnotateRestDefault mis-annotated composite functions (because we integrate this pass into AnnotateTarget.)
  • Fixed the namespace of MergeCompilerRegion.
  • Fixed type propagation and tuple bug in AnnotateTarget pass (@trevor-m ).

Refactor 2: Support multiple runs for AnnotateTarget. Users may run AnnotateTarget multiple times because different targets may need the Relay graph at different stage (e.g., before or after QnnCanonicalize pass). This refactor makes sure AnnotateTarget pass can handle a Relay graph that has been annotated.

Refactor 3: Support Multiple Targets in AnnotateTarget pass. In case more than one targets can take the same Relay graph, we intend to run AnnotateTarget once to annotate all of them to open global optimization opportunities for future developments.

  • Now AnnotateTarget takes either a target string or a list of targets. AnnotateTarget will invoke target specific op-based rule functions in order.
  • Accordingly, now regions in AnnotateRegionSet also have target attribute.

cc @zhiics, @trevor-m, @mbaret, @manupa-arm

@mbaret
Copy link
Contributor

mbaret commented Apr 8, 2020

Thanks for this, a few high level comments first.

  • Could we separate some of these changes out into separate PRs? In particular, separating bug fixes from features so we can get them in quickly.
  • Am I right to think that the intended flow here is to only run AnnotateTarget once? I'm not sure how we would use it in the case where we have two codegens, one that wants to see the graph with QNN ops and the other than wants the graph without them.
  • What's the advantage of this multi-target approach over running the entire partitioning pipeline multiple times?

@comaniac
Copy link
Contributor Author

comaniac commented Apr 8, 2020

Thanks for this, a few high level comments first.

  • Could we separate some of these changes out into separate PRs? In particular, separating
    bug fixes from features so we can get them in quickly.

It's hard to separate bug fixes to another PR because some fixes were done with refactoring.

  • Am I right to think that the intended flow here is to only run AnnotateTarget once? I'm not sure how we would use it in the case where we have two codegens, one that wants to see the graph with QNN ops and the other than wants the graph without them.

Yes I suppose the flow is intended to run AnnotateTarget only once, but I'm not sure I get your issue. If a user wants to have a QNN model then she will run the QNN pass first anyhow to generate QNN ops in the graph. Under this assumption, if a codegen doesn't want to see QNN ops, it should not have fannotate and AnnotateTarget will annotate QNN ops to other codegens or just default.

  • What's the advantage of this multi-target approach over running the entire partitioning pipeline multiple times?
    I think I understand your concern more after this question. You're saying if a user wants to offload a graph to multiple targets then she will run AnnotateTarget->MergeCompilerRegion->PartitionGraph multiple times. IMHO, this is not an ideal flow due to the following reasons:
  1. If the flow is designed for offloading only one target, it's hard to consider global optimization over multiple targets in the future. For example, 1st run with TargetA creates 2 subgraphs due to restriction set; 2nd run with TargetB creates 1 subgraph (so 3 subgraphs in total). However, assuming the second subgraph for TargetA can also be offloaded to TargetB, meaning that if we run TargetB first than we will get 2 subgraphs in total.
  2. If the flow may take a graph that has been partitioned by another target in advance, all passes in this flow will need to deal with partitioned functions. It's hard to maintain and even harder to integrate global optimization in the above situation.
  3. The position of PartitionGraph in the pass manager is meaningful. We guarantee external functions generated by this pass can be processed by following passes, but not others.

@mbaret
Copy link
Contributor

mbaret commented Apr 8, 2020

To explain the point about QNN a bit better, a concrete example would be our Ethos-N NPU codegen paired up with something like DNNL. The NPU directly supports quantised convolutions and so wants to mark qnn.conv2d as supported. DNNL doesn't supported quantised convolutions but does support normal convolutions, so it wants to see nn.conv2d. Now the TFLite frontend will initially produce a graph containing qnn.conv2d's from a qnn model. If we choose to annotate here, only the Ethos-N codegen will be able to target the graph. If, however, we choose to run QnnCanonicalize, the qnn.conv2d's are lowered to nn.conv2d's but now only DNNL can annotate the graph.

To me the ideal solution here is to be able to run AnnotateTarget multiple times on different lowerings of the graph, because I agree that running the partitioning pipeline multiple times is not a good long term solution (although it probably is a good short term one).

@zhiics
Copy link
Member

zhiics commented Apr 8, 2020

@mbaret The flow you mentioned makes sense to me. I think it is really case by case. Supporting annotation of multiple targets doesn't really prevent users from invoking annoate_target multiple times when needed. However, it gives users a globally view of the graph and provides them an opportunity to optimize the scheduling of ops. Annotating graph mutliple times looks reasonable to me, but we should avoid reverting the partitioned graph and repartitioning it as this could be very error-prone.

@mbaret
Copy link
Contributor

mbaret commented Apr 8, 2020

By moving AnnotateRestDefault's functionality into AnnotateTarget, won't that prevent us from being able to run AnnotateTarget multiple times (because it will annotate defaults multiple times)?

@comaniac
Copy link
Contributor Author

comaniac commented Apr 8, 2020

By moving AnnotateRestDefault's functionality into AnnotateTarget, won't that prevent us from being able to run AnnotateTarget multiple times (because it will annotate defaults multiple times)?

Even the original implementation has the similar issue because it didn't check if an argument is compiler end. I'm making a commit to only modify the annotation attribute for the current target if the annotation has already there.

@mbaret
Copy link
Contributor

mbaret commented Apr 8, 2020

Has there been consideration of the proposal we made with the graph partitioning RFC (multiple independent annotation passes + a deconflict pass)?

@comaniac
Copy link
Contributor Author

comaniac commented Apr 8, 2020

Has there been consideration of the proposal we made with the graph partitioning RFC (multiple independent annotation passes + a deconflict pass)?

I don't think this PR is against the RFC because this PR doesn't really change the flow you implemented, but just make the annotation pass more general by considering multiple targets. On the other hand, as @zhiics mentioned, you can still run the annotation pass multiple times followed by another deconflict pass if you really have to.

@comaniac comaniac force-pushed the re_ext_infra branch 2 times, most recently from de9e85f to c952722 Compare April 8, 2020 20:42
@zhiics
Copy link
Member

zhiics commented Apr 8, 2020

@masahi please also help review this PR. It may have some conflict with your PR.

@masahi
Copy link
Member

masahi commented Apr 8, 2020

It may have some conflict with your PR

Yes there will be some conflict, since I refactored dnnl/codegen.cc quite a bit. But it seems we only need to copy paste the change in this PR to GenerateBody function in my PR, so I think resolving is easy. For me it is fine if this PR is merged first.

Reviewing this PR is a bit difficult for me since I don't know the context of this PR (I don't use either AnnotateTarget or MergeCompilerRegion in my work), and as @mbaret mentioned, there are different kinds of changes (bug fix, improvement etc). I also prefer breaking up into smaller PRs.

Copy link
Contributor

@windclarion windclarion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

composite function itself will not reach the branch, but any other OpNode will reach the branch, and if I only use composite function, I will not register target.xxxx op attr, so OpNode branch will report fail and abort.

src/relay/transforms/annotate_target.cc Show resolved Hide resolved
@comaniac
Copy link
Contributor Author

comaniac commented Apr 9, 2020

composite function itself will not reach the branch, but any other OpNode will reach the branch, and if I only use composite function, I will not register target.xxxx op attr, so OpNode branch will report fail and abort.

Ah I got your point. You're right that check is necessary. I'll add it back. Thanks.

@comaniac
Copy link
Contributor Author

comaniac commented Apr 9, 2020

@masahi thanks for the suggestion. I've reverted the DNNL part and will file another PR for it.
@windclarion checker added.

Copy link
Member

@zhiics zhiics left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should bring this PR in since it 1) fixes a bunch of bugs that block several ppl, 2) doesn't really change the flow the current annotation and partitioning flow.

src/relay/analysis/annotated_region_set.cc Outdated Show resolved Hide resolved
src/relay/analysis/annotated_region_set.h Outdated Show resolved Hide resolved
@zhiics
Copy link
Member

zhiics commented Apr 10, 2020

Let's rebase and bring this in and then we can bring @masahi's PR in as well.

@zhiics zhiics merged commit f506c8b into apache:master Apr 10, 2020
@comaniac comaniac deleted the re_ext_infra branch April 10, 2020 21:33
@manupak
Copy link
Contributor

manupak commented Apr 15, 2020

Sorry for the late comments ( I was out for a week ).

Among the bug fixes (e.g., type propogation in tuple bug) and new functionalities (e.g., supporting a list of targets in AnnotateTarget), this PR had technically moved (and improved) AnnotateRestDefault to AnnotateTarget and moved the removal of defaults to PartitionGraph.

IIRC, the defaults were a requirement of the MergeCompilerRegions pass and thats why they were inside MCR ? I just want to understand the rationale as to why you think they should moved as such and potentially exposing them ("defaults") in the graph that is being passed among passes?

PS : I agree with @zhiics that this pass do not change the flow and agree with the most of the fixes that are brought in alongside the moving of "default" related components, though ideally should have been better if they were seperate PRs.

@comaniac
Copy link
Contributor Author

One reason that I exposed default between passes was to have a clear task division so that the flow would be more stable. For example, one bug about composite function reported in #5272 was because AnnotateTarget pass skipped composite functions but AnnotateRestDefault didn't, so the default annotations were mis-inserted and it results in segmentation fault when partitioning the graph. This may happen in the future if still we have more than one places doing similar tasks.

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Apr 16, 2020
)

* add target to region

* refactor annotate_target

* Make all unit test working

* quick fix

* enable BN, unit test failed

* Fix vm test, unit test. Refactor annotate_target a bit.

* quick fix fusion

* revert fusion change

* style fix

* Refactor merge region pass

* format

* minor fix

* Skip e2e test

* lint

* support AnnotateTarget multiple runs

* Add HasAttr and revert DNNL codegen

* address comment

Co-authored-by: Zhi Chen <chzhi@amazon.com>
zhiics added a commit to neo-ai/tvm that referenced this pull request Apr 17, 2020
)

* add target to region

* refactor annotate_target

* Make all unit test working

* quick fix

* enable BN, unit test failed

* Fix vm test, unit test. Refactor annotate_target a bit.

* quick fix fusion

* revert fusion change

* style fix

* Refactor merge region pass

* format

* minor fix

* Skip e2e test

* lint

* support AnnotateTarget multiple runs

* Add HasAttr and revert DNNL codegen

* address comment

Co-authored-by: Zhi Chen <chzhi@amazon.com>
dpankratz pushed a commit to dpankratz/incubator-tvm that referenced this pull request Apr 24, 2020
)

* add target to region

* refactor annotate_target

* Make all unit test working

* quick fix

* enable BN, unit test failed

* Fix vm test, unit test. Refactor annotate_target a bit.

* quick fix fusion

* revert fusion change

* style fix

* Refactor merge region pass

* format

* minor fix

* Skip e2e test

* lint

* support AnnotateTarget multiple runs

* Add HasAttr and revert DNNL codegen

* address comment

Co-authored-by: Zhi Chen <chzhi@amazon.com>
monklof pushed a commit to monklof/incubator-tvm that referenced this pull request Jan 22, 2021
…m_data:master to master

* commit 'cd0d52daa6942bdafa9363ff6cfa3d25fcd5b8d6': (824 commits)
  [Intrinsic] Add log1p, ldexp, atan2, hypot, nextafter, copysign (apache#5312)
  [Rust][CI] Restore Rust CI (apache#5137)
  Remove PrimExpr from String (apache#5311)
  [Requantize] Cleanup and Optimize Lowering (apache#5286)
  [IR][TRANSFORM] Enable CopyOnWrite for passes. (apache#5309)
  [PYTORCH]Abs, Arange, Softplus ops (apache#5295)
  [LLVM] Fix generation of LLVM intrinsics (apache#5282)
  [BYOC] Add example of Composite + Annotate for DNNL fused op (apache#5272)
  [Frontend][TensorFlow]Improve TensorFlow Static Shape Tensor Array (apache#5243)
  [RUNTIME] Introduce RValue reference(move) support to TypedPackedFunc (apache#5271)
  [RELAY][FRONTEND][CAFFE2] add Mul and ConvTranspose operator (apache#5302)
  [BYOC] Refine AnnotateTarget and MergeCompilerRegion Passes (apache#5277)
  [CI] Fix the hexagon string (apache#5304)
  [Arith] linear system and equation solver (apache#5171)
  [PYTORCH]Repeat, Reciprocal & Reshape Op support (apache#5280)
  [FRONTEND][TENSORFLOW] Fix gather_nd indices (apache#5279)
  Update device_annotation.cc (apache#5291)
  [REFACTOR][IR] Move to runtime::String (apache#5276)
  [NDArray] Set shape_ in NDArray::FromDLPack (apache#5301)
  [RUNTIME] Initial implementation of Hexagon runtime support (apache#5252)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants