[BYOC] Refine AnnotateTarget and MergeCompilerRegion Passes #5277
Conversation
Thanks for this, a few high level comments first.
It's hard to split the bug fixes into another PR because some fixes were made as part of the refactoring.
Yes, I suppose the flow is intended to run AnnotateTarget only once, but I'm not sure I understand your issue. If a user wants a QNN model, she will run the QNN pass first anyway to generate QNN ops in the graph. Under this assumption, if a codegen doesn't want to see QNN ops, it should not register fannotate for them, and AnnotateTarget will annotate the QNN ops for other codegens or leave them as default.
To explain the point about QNN a bit better, a concrete example would be our Ethos-N NPU codegen paired up with something like DNNL. The NPU directly supports quantised convolutions and so wants to mark qnn.conv2d as supported. DNNL doesn't support quantised convolutions but does support normal convolutions, so it wants to see nn.conv2d. Now, the TFLite frontend will initially produce a graph containing qnn.conv2d ops from a QNN model. If we choose to annotate here, only the Ethos-N codegen will be able to target the graph. If, however, we choose to run QnnCanonicalize, the qnn.conv2d ops are lowered to nn.conv2d, but now only DNNL can annotate the graph. To me, the ideal solution is to be able to run AnnotateTarget multiple times on different lowerings of the graph, because I agree that running the partitioning pipeline multiple times is not a good long-term solution (although it is probably a good short-term one).
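The two-lowering scenario above can be modelled with a toy sketch. This is plain Python, not the TVM API; the op names, the "npu"/"dnnl" support sets, and the tuple-based graph representation are all illustrative assumptions:

```python
# Toy model of annotating a graph at two lowering levels.
# A "graph" is a list of (op_name, claimed_target_or_None) pairs.
SUPPORTED = {
    "npu":  {"qnn.conv2d"},        # hypothetical NPU: consumes quantised ops
    "dnnl": {"nn.conv2d", "add"},  # DNNL-like: consumes canonicalized ops
}

def annotate_target(graph, target):
    """Claim every supported, still-unclaimed op for `target`."""
    return [(op, tag if tag else (target if op in SUPPORTED[target] else None))
            for op, tag in graph]

def qnn_canonicalize(graph):
    """Lower unclaimed QNN ops to their non-quantised equivalents."""
    return [(op if tag else op.replace("qnn.", ""), tag) for op, tag in graph]

graph = [("qnn.conv2d", None), ("qnn.add", None)]  # as a TFLite frontend might emit
graph = annotate_target(graph, "npu")    # NPU claims qnn.conv2d before lowering
graph = qnn_canonicalize(graph)          # unclaimed qnn.add is lowered to add
graph = annotate_target(graph, "dnnl")   # DNNL claims the lowered op
# graph == [("qnn.conv2d", "npu"), ("add", "dnnl")]
```

Each codegen gets to see the graph at the lowering level it prefers, and a later annotation run never overwrites an earlier claim.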
@mbaret The flow you mentioned makes sense to me. I think it is really case by case. Supporting annotation of multiple targets doesn't prevent users from invoking annotate_target multiple times when needed. However, it gives users a global view of the graph and an opportunity to optimize the scheduling of ops. Annotating the graph multiple times looks reasonable to me, but we should avoid reverting the partitioned graph and repartitioning it, as this could be very error-prone.
By moving AnnotateRestDefault's functionality into AnnotateTarget, won't that prevent us from being able to run AnnotateTarget multiple times (because it will annotate defaults multiple times)?
Even the original implementation has a similar issue, because it didn't check whether an argument is a compiler_end. I'm making a commit to only modify the annotation attribute for the current target if the annotation is already there.
Has there been consideration of the proposal we made with the graph partitioning RFC (multiple independent annotation passes + a deconflict pass)?
I don't think this PR is against the RFC, because it doesn't really change the flow you implemented; it just makes the annotation pass more general by considering multiple targets. On the other hand, as @zhiics mentioned, you can still run the annotation pass multiple times followed by another deconflict pass if you really have to.
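For reference, the RFC-style alternative mentioned here could look like the following sketch. This is a hypothetical pure-Python illustration, not any existing TVM pass; the support sets and the priority-based deconflict policy are assumptions:

```python
# Sketch of "multiple independent annotation passes + a deconflict pass":
# each target records every op it could take, then one pass resolves overlaps.
def annotate_independently(graph, supported):
    """Each target claims every op it supports; claims may overlap."""
    return [(op, {t for t, ops in supported.items() if op in ops})
            for op in graph]

def deconflict(annotated, priority):
    """Resolve overlaps: highest-priority claimant wins, else 'default'."""
    def pick(claims):
        for t in priority:
            if t in claims:
                return t
        return "default"
    return [(op, pick(claims)) for op, claims in annotated]

supported = {"npu": {"nn.conv2d"}, "dnnl": {"nn.conv2d", "nn.dense"}}
annotated = annotate_independently(["nn.conv2d", "nn.dense", "nn.relu"], supported)
resolved = deconflict(annotated, priority=["npu", "dnnl"])
# resolved == [("nn.conv2d", "npu"), ("nn.dense", "dnnl"), ("nn.relu", "default")]
```

The deconflict policy here (a fixed priority list) is only one possible choice; a smarter pass could consider fusion opportunities or data-transfer cost instead.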
@masahi please also help review this PR. It may have some conflicts with your PR.
Yes, there will be some conflicts, since I refactored dnnl/codegen.cc quite a bit. But it seems we only need to copy-paste the change in this PR. Reviewing this PR is a bit difficult for me since I don't know its context (I don't use either AnnotateTarget or MergeCompilerRegion in my work), and, as @mbaret mentioned, there are different kinds of changes (bug fixes, improvements, etc.). I also prefer breaking it up into smaller PRs.
The composite function itself will not reach the branch, but any other OpNode will. If I only use a composite function, I will not register the target.xxxx op attribute, so the OpNode branch will report a failure and abort.
Ah, I get your point. You're right, that check is necessary. I'll add it back. Thanks.
@masahi thanks for the suggestion. I've reverted the DNNL part and will file another PR for it.
I think we should bring this PR in, since it 1) fixes a bunch of bugs that block several people, and 2) doesn't really change the current annotation and partitioning flow.
Let's rebase and bring this in, and then we can bring @masahi's PR in as well.
Sorry for the late comments (I was out for a week). Among the bug fixes (e.g., the type-propagation-in-tuple bug) and new functionality (e.g., supporting a list of targets in AnnotateTarget), this PR has effectively moved (and improved) AnnotateRestDefault into AnnotateTarget and moved the removal of defaults to PartitionGraph. IIRC, the defaults were a requirement of the MergeCompilerRegions pass, and that's why they lived inside MCR? I just want to understand the rationale for moving them as such and potentially exposing the "defaults" in the graph that is passed between passes. PS: I agree with @zhiics that this PR does not change the flow, and I agree with most of the fixes that are brought in alongside the moving of the "default"-related components, though ideally they would have been separate PRs.
One reason I exposed defaults between passes was to have a clear division of tasks so that the flow would be more stable. For example, one bug about composite functions reported in #5272 occurred because the AnnotateTarget pass skipped composite functions but AnnotateRestDefault didn't, so the default annotations were mis-inserted and resulted in a segmentation fault when partitioning the graph. This may happen again in the future if we still have more than one place doing similar tasks.
* add target to region
* refactor annotate_target
* Make all unit tests work
* quick fix
* enable BN, unit test failed
* Fix vm test, unit test. Refactor annotate_target a bit.
* quick fix fusion
* revert fusion change
* style fix
* Refactor merge region pass
* format
* minor fix
* Skip e2e test
* lint
* support AnnotateTarget multiple runs
* Add HasAttr and revert DNNL codegen
* address comment

Co-authored-by: Zhi Chen <chzhi@amazon.com>
Since we are working on TRT support with the BYOC infra, we added some features and fixed some bugs in the AnnotateTarget and MergeCompilerRegion passes. Here is the change log:
Refactor 1: Both the AnnotateTarget and MergeCompilerRegion passes now output a graph with all ops annotated. Ops that will be handled by TVM itself are annotated with the target "default".
Refactor 2: Support multiple runs of AnnotateTarget. Users may run AnnotateTarget multiple times because different targets may need the Relay graph at different stages (e.g., before or after the QnnCanonicalize pass). This refactor makes sure the AnnotateTarget pass can handle a Relay graph that has already been annotated.
Refactor 3: Support multiple targets in the AnnotateTarget pass. In case more than one target can take the same Relay graph, we intend to run AnnotateTarget once to annotate all of them, opening global optimization opportunities for future development.
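Refactors 2 and 3 combined can be illustrated with a small sketch. This is a toy pure-Python model, not the TVM implementation; the op names, support sets, and the `finalize` flag are illustrative assumptions:

```python
# Toy model of the two behaviours in the change log: one annotation call may
# take several targets at once, and a second run leaves existing claims
# untouched, only filling in "default" for ops TVM itself will handle.
SUPPORTED = {"trt": {"nn.conv2d"}, "dnnl": {"nn.dense"}}

def annotate_target(graph, targets, finalize=False):
    out = []
    for op, tag in graph:
        if tag is None:  # only ops not claimed by an earlier run are considered
            tag = next((t for t in targets if op in SUPPORTED[t]), None)
        if finalize and tag is None:
            tag = "default"  # remaining ops fall back to TVM
        out.append((op, tag))
    return out

g = [("nn.conv2d", None), ("nn.dense", None), ("nn.relu", None)]
g = annotate_target(g, ["trt", "dnnl"])          # one run, multiple targets
g = annotate_target(g, ["dnnl"], finalize=True)  # re-run: earlier claims are kept
# g == [("nn.conv2d", "trt"), ("nn.dense", "dnnl"), ("nn.relu", "default")]
```

The order of the target list acts as a priority: the first target that supports an op claims it, which is why annotating everything in one run opens scheduling opportunities that per-target runs would not see.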
cc @zhiics, @trevor-m, @mbaret, @manupa-arm