-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TVMC][microNPU] tvmc option for printing which operators are offloaded to Ethos-U #13212
[TVMC][microNPU] tvmc option for printing which operators are offloaded to Ethos-U #13212
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.
Generated by tvm-bot |
@leandron could you please check it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sergey-grovety, this looks like a very helpful addition for users to see how their model is partitioned! I took a quick look and had a couple of high-level questions.
The option for printing the operators currently seems very specific to the NPU, I'm wondering if we would see more benefit adding this as a generic option within TVMC without too many changes? Not only would it benefit other targets, it would make the option more robust and easier to find from a user POV. Its currently possible to save the partitioned graph in TVMC using --dump-code="relay"
, perhaps print_operators_offloading
could be called at a similar point (given a command line argument such as --dump-offloads
) rather than from within the NPU specific code, WDYT?
I'm also wondering how much information a user would be able to understand from the Relay output if they're unfamiliar with it. For example, if there was a TFLite graph consisting of a single CONV2D operation, it seems like the current output would display 4 operations being offloaded to the NPU (qnn.conv2d -> bias_add -> requantize -> clip), which might be a bit confusing for a non-experienced user. Linking back to the original TFLite operation might be tricky, but we have the NPU composite operations that have a similar relationship. Perhaps we could display this like below with indentation indicating Relay operations that make up the composite operation? Happy to hear other suggestions though :)
ethos-u <- ethos-u.qnn_conv2d
ethos-u <- %1 = nn.bias_add(%0, %v_param_2, axis=3);
ethos-u <- %2 = qnn.requantize(%1, meta[relay.Constant][1], 0, 0.0235294f, -128, axis=3, out_dtype="int8");
ethos-u <- %3 = clip(%2, a_min=-128f, a_max=127f);
Also cc @ekalda, @ashutosh-arm who may be interested
I agree with @lhutton1 here. The knob |
Hi @ashutosh-arm sorry for my late reply. As I see it, the main purpose of the new option is to show the correspondence between the operators from the original graph and the final operations offloading on the target. This is displayed as a sequential printout of the source relay's operations, with the composites from which they are derived and the target to which they are unloaded. Another point worth highlighting is the partitioned Relay, which is an output of --dump-code="relay", have relay operation's numbers ( %...) different from those in the initial Relay. Therefore, the new knob, which keeps the initial Relay's numbers, can be handy Here is an example output with the new option:
|
Hi @lhutton1, Do you propose to implement this function for all the targets? Or just add a general compiler option leaving the implementation currently only in the ethos-u backend? Here is an example how the output would look like if model is compiled for the target "llvm":
And for targets "ethos-u,cmsis-nn,c"
|
I see your point. Here are some points to consider:
|
Can you please clarify what do you mean by "the output from TVM"? |
Sorry for wrong wording. With this knob, TVM will produce a new debug output which would set an example for other backends. So, my suggestion was to discuss this upfront on the discuss forum. |
Thanks for the explanation @arina-grovety, yes I was thinking other backends like CMSIS-NN could make use of the same approach since the I see @ashutosh-arm's point that this feature intersects with the work in #13116, perhaps it would be useful to have a discussion with the authors to align on expectations. After a quick look I don’t believe this work takes into account offloaded operations, but I could be wrong. Just like #13116 it would be great to see the
@ashutosh-arm, I think this is okay as operators will still be wrapped in their respective 'composite' function where the "ethos-u.qnn_conv2d" name is stored. Supporting other methods of operator offloading such as https://github.com/apache/tvm/blob/main/python/tvm/relay/op/contrib/ethosn.py#L451 I feel are out of scope for this work for the time being. I agree though that we should get the community opinion on this before making such a change, as you rightly mention the alternative is to educate the user how to read the partitioned Relay graph in the tutorial. |
623aa1f
to
e9e8a68
Compare
Hello @lhutton1, we have pushed update to the PR, option "--target-ethos-u-dump_npu_functions_coverage" has been replaced by more generic "--dump-offloads" with the same meaning. |
Thanks for the updates @sergey-grovety @arina-grovety, looks great! I started reviewing this evening but didn't fully get through it, will pick up where I left off tomorrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the delay, I left some comments below, see what you think. Thanks for updates, its looking much better!
python/tvm/driver/tvmc/compiler.py
Outdated
---------- | ||
mod : tvm.ir.IRModule | ||
The IRModule that gets generated from a relay frontend. | ||
initial_relay_astext : list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I missed it, what's the reason for parsing the initial Relay as a string, rather than traversing a copy of the IRModule?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @lhutton1,
this was done to avoid copying entities that we consider unnecessary, since only the text representation of the Relay is used in this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that traversing the Relay IR itself here might simplify the logic below and decouple the implementation from the textual representation making it more robust to changes in the future. It seems like it would also remove the need to make changes such as https://github.com/apache/tvm/pull/13212/files#diff-237c52e4e68362990738b47cc97c81b5c84ec92dfbcb672e961f0e9887f436c0R378 which might require more motivation from the community. WDYT?
cc @ashutosh-arm @ekalda incase you have any other suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest the same thing as @lhutton1 did above. Text representation changes quite often. It is better to rely on the information available inside the module object and extract it using let's say ExprVisitor
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @ashutosh-arm,
sorry, there is a non-fixed comment,
now we pass the initial Relay as Relay IR itself, and then use "annotate" parameter of the astext() function
to add the desired annotations to the generated text, and then parsing our annotations from the formed text.
I will fix the comment string in the update to the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies if there might have been some confusion here, this question was more around the need to search over the relay IR as text for the compiler name, op name, func id, etc. The information could be extracted using a visitor pass (ExprVisitor
) that traverses the IR, making it more resilient to changes in the text format of the IR. Since this method is working and to move this forwards, we can pull this out into a separate follow-up
44de505
to
325b2ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the delay and thanks for the reminder. I think it's getting close, thanks for the hard work on this. I think my biggest concern still lies in extracting relevant information from the textual representation of the relay as it seems a bit more fragile. Is there a reason for doing it like this? Otherwise LGTM!
tests/python/contrib/test_ethosu/test_pass_operations_distribution.py
Outdated
Show resolved
Hide resolved
68ede35
to
d603dc9
Compare
…pan field of Call.
d603dc9
to
afe244b
Compare
@tvm-bot rerun |
tests/python/contrib/test_ethosu/test_pass_operations_distribution.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Span suffix part looks good to me. Thanks for help. :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updates on this @arina-grovety, @sergio-grovety, I just had one nit which I think was missed previously, otherwise LGTM. Thanks for the support with the spans @chunit-quic!
…perations_distribution.py. Fix tflite import in tests/python/contrib/test_ethosu/infra.py
@tvm-bot rerun |
1 similar comment
@tvm-bot rerun |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thanks @sergio-grovety @arina-grovety @chunit-quic @ashutosh-arm! This will be very helpful for users wanting to see how their models were offloaded, thanks for persisting with all the changes! |
Added an option to tvmc and Ethos-U for printing to console or to the file which operators from the initial graph are offloaded to Ethos-U and which aren't. It forms line-by-line output of initial model IR, indicating which operations ported to Ethos-U.
Compiler option "--target-ethos-u-dump_npu_functions_coverage" has been replaced by more generic "--dump-offloads" with the same meaning.
Usage
Example output:
...
Total number of operators and distribution by targets
Total: 211
target1: 198
target2: 10
generic: 3
'target1 <- target2.qnn_conv2d'
'target1 <- %0 = qnn.conv2d(%tfl.quantize, %v_param_1, ...'
'target1 <- %1 = nn.bias_add(%0, %v_param_2, axis=3);'
'target1 <- %2 = qnn.requantize(%1, meta[relay.Constant]...'
'target2 <- target2.reshape'
'target2 <- %3 = reshape(%2, newshape=[1, 1001]);'
'generic <- %4 = nn.pad(%3, -128f, pad_width=[[0, 0], [1, 1]...'
...
Usage
Example output:
...
ethos-u <- %1 = nn.bias_add(%0, %v_param_2, axis=3);
ethos-u <- %2 = qnn.requantize(%1, meta[relay.Constant][1], 0, 0.0235294f, -128, axis=3, out_dtype="int8");
ethos-u <- %3 = clip(%2, a_min=-128f, a_max=127f);
....