
[TVMC][microNPU] tvmc option for printing which operators are offloaded to Ethos-U #13212

Conversation

@sergio-grovety (Contributor) commented Oct 27, 2022

Added an option to tvmc and the Ethos-U backend for printing, to the console or to a file, which operators from the initial graph are offloaded to Ethos-U and which are not. It produces a line-by-line dump of the initial model IR, indicating which operations are ported to Ethos-U.

The compiler option "--target-ethos-u-dump_npu_functions_coverage" has been replaced by the more generic "--dump-offloads" with the same meaning.

Usage

# output to console:
tvmc compile --target=ethos-u,cmsis-nn,c \
    --dump-offloads=- \
    ........

# output to file:
tvmc compile --target=ethos-u,cmsis-nn,c \
    --dump-offloads=<file path> \
    ........

Example output:

...
Total number of operators and distribution by targets
Total: 211
target1: 198
target2: 10
generic: 3

'target1 <- target1.qnn_conv2d'
'target1 <- %0 = qnn.conv2d(%tfl.quantize, %v_param_1, ...'
'target1 <- %1 = nn.bias_add(%0, %v_param_2, axis=3);'
'target1 <- %2 = qnn.requantize(%1, meta[relay.Constant]...'
'target2 <- target2.reshape'
'target2 <- %3 = reshape(%2, newshape=[1, 1001]);'
'generic <- %4 = nn.pad(%3, -128f, pad_width=[[0, 0], [1, 1]...'
...
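For illustration, the summary header in the dump above can be reproduced mechanically from the per-operator annotation lines. The following is only a sketch, not TVMC's actual implementation; the function name `summarize_offloads` and the exact input format are assumptions:

```python
from collections import Counter

def summarize_offloads(annotated_lines):
    """Build the 'distribution by targets' summary from annotation lines.

    Each line is assumed to look like "target1 <- %0 = qnn.conv2d(...)".
    Composite header lines such as "target1 <- target1.qnn_conv2d" are
    skipped, since they name a group rather than a single Relay operator.
    """
    counts = Counter()
    for line in annotated_lines:
        target, _, op = line.partition(" <- ")
        target, op = target.strip(), op.strip()
        # Skip composite-function headers like "target1.qnn_conv2d".
        if op.startswith(target + "."):
            continue
        counts[target] += 1
    total = sum(counts.values())
    out = ["Total number of operators and distribution by targets",
           f"Total: {total}"]
    out += [f"{t}: {n}" for t, n in sorted(counts.items())]
    return "\n".join(out)
```

Feeding it the annotated lines of a dump would yield the same "Total: ..." block shown above.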

Previous usage (with the now-replaced option):

# output to console:
tvmc compile --target=ethos-u,cmsis-nn,c \
    --target-ethos-u-dump_npu_functions_coverage=- \
    ........

# output to file:
tvmc compile --target=ethos-u,cmsis-nn,c \
    --target-ethos-u-dump_npu_functions_coverage=<file path> \
    ........

Example output:

...
ethos-u <- %1 = nn.bias_add(%0, %v_param_2, axis=3);
ethos-u <- %2 = qnn.requantize(%1, meta[relay.Constant][1], 0, 0.0235294f, -128, axis=3, out_dtype="int8");
ethos-u <- %3 = clip(%2, a_min=-128f, a_max=127f);
....

@tvm-bot (Collaborator) commented Oct 27, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@sergio-grovety (Contributor, Author) commented

@leandron could you please check it?

@lhutton1 (Contributor) left a comment

Thanks @sergey-grovety, this looks like a very helpful addition for users to see how their model is partitioned! I took a quick look and had a couple of high-level questions.

The option for printing the operators currently seems very specific to the NPU; I'm wondering if we would see more benefit in adding this as a generic option within TVMC without too many changes? Not only would it benefit other targets, it would make the option more robust and easier to find from a user's POV. It's currently possible to save the partitioned graph in TVMC using --dump-code="relay", so perhaps print_operators_offloading could be called at a similar point (given a command line argument such as --dump-offloads) rather than from within the NPU-specific code, WDYT?

I'm also wondering how much information a user would be able to understand from the Relay output if they're unfamiliar with it. For example, if there was a TFLite graph consisting of a single CONV2D operation, it seems like the current output would display 4 operations being offloaded to the NPU (qnn.conv2d -> bias_add -> requantize -> clip), which might be a bit confusing for a non-experienced user. Linking back to the original TFLite operation might be tricky, but we have the NPU composite operations that have a similar relationship. Perhaps we could display this like below with indentation indicating Relay operations that make up the composite operation? Happy to hear other suggestions though :)

ethos-u    <-   ethos-u.qnn_conv2d
ethos-u    <-       %1 = nn.bias_add(%0, %v_param_2, axis=3);
ethos-u    <-       %2 = qnn.requantize(%1, meta[relay.Constant][1], 0, 0.0235294f, -128, axis=3, out_dtype="int8");
ethos-u    <-       %3 = clip(%2, a_min=-128f, a_max=127f);

Also cc @ekalda, @ashutosh-arm who may be interested

src/relay/backend/contrib/ethosu/compiler_attrs.cc (outdated review comment, resolved)
@ashutosh-arm (Contributor) commented Nov 2, 2022

I agree with @lhutton1 here. The knob --dump-code="relay" provides a way to visualize the post-partition Relay model. The main function in this Relay model lists the sequence of calls to partitioned functions with appropriate target annotations. Does the new knob print_operators_offloading serve any additional purpose that I might have missed @sergey-grovety? To be fair, I have only read the PR description 😅

@arina-grovety (Contributor) commented

> I agree with @lhutton1 here. The knob --dump-code="relay" provides a way to visualize the post-partition relay model. main function in this relay model lists sequence of calls to partitioned functions with appropriate target annotations. Does the new knob print_operators_offloading serve any additional purpose that I might have missed @sergey-grovety ? To be fair, I have only read the PR description 😅

Hi @ashutosh-arm, sorry for my late reply. As I see it, the main purpose of the new option is to show the correspondence between the operators from the original graph and the final operations offloaded to the target. This is displayed as a sequential printout of the source Relay's operations, with the composites from which they are derived and the target to which they are offloaded.

Another point worth highlighting is that the partitioned Relay, which is the output of --dump-code="relay", has Relay operation numbers (%...) that differ from those in the initial Relay. Therefore, the new knob, which keeps the initial Relay's numbering, can be handy.

Here is an example output with the new option:

'ethos-u    <- ethos-u.qnn_conv2d'
'ethos-u    <-        %204 = qnn.conv2d(%203, %v_param_105, -128, 0, 0.0235294f, ...'
'ethos-u    <-        %205 = nn.bias_add(%204, %v_param_106, axis=3);'
'ethos-u    <-        %206 = qnn.requantize(%205, meta[relay.Constant][105], 0, ...'
'ethos-u    <- ethos-u.reshape'
'ethos-u    <-        %207 = reshape(%206, newshape=[1, 1001]);'

@arina-grovety (Contributor) commented Nov 9, 2022

> The option for printing the operators currently seems very specific to the NPU, I'm wondering if we would see more benefit adding this as a generic option within TVMC without too many changes? Not only would it benefit other targets, it would make the option more robust and easier to find from a user POV. Its currently possible to save the partitioned graph in TVMC using --dump-code="relay", perhaps print_operators_offloading could be called at a similar point (given a command line argument such as --dump-offloads) rather than from within the NPU specific code, WDYT?

Hi @lhutton1,

Do you propose to implement this function for all targets, or just add a general compiler option, leaving the implementation currently only in the ethos-u backend?
Right now, this function is specific to ethos-u and is handled in the ethos-u backend.
As far as I can tell it won't be a problem to implement the function for all targets, but of course I could be wrong.

Here is an example of how the output would look if the model is compiled for the target "llvm":

   'generic    <-   %0 = qnn.conv2d(%tfl.quantize, %v_param_1, ...'
   'generic    <-   %1 = nn.bias_add(%0, %v_param_2, axis=3);'
   'generic    <-   %2 = qnn.requantize(%1, meta[relay.Constant]...'

And for targets "ethos-u,cmsis-nn,c"

    'ethos-u    <- ethos-u.qnn_conv2d'
    'ethos-u    <-        %204 = qnn.conv2d(%203, %v_param_105, -128, 0, 0.0235294f, ...'
    'ethos-u    <-        %205 = nn.bias_add(%204, %v_param_106, axis=3);'
    'ethos-u    <-        %206 = qnn.requantize(%205, meta[relay.Constant][105], 0, ...'
    'ethos-u    <- ethos-u.reshape'
    'ethos-u    <-        %207 = reshape(%206, newshape=[1, 1001]);'
    'cmsis-nn   <- cmsis-nn.qnn_softmax'
    'cmsis-nn   <-        %208 = qnn.dequantize(%207, 0.0775722f, -61);'
    'cmsis-nn   <-        %209 = nn.softmax(%208);'
    'cmsis-nn   <-        qnn.quantize(%209, 0.00390625f, -128, out_dtype="int8")'

@ashutosh-arm (Contributor) commented

> Hi @ashutosh-arm sorry for my late reply. As I see it, the main purpose of the new option is to show the correspondence between the operators from the original graph and the final operations offloading on the target. This is displayed as a sequential printout of the source relay's operations, with the composites from which they are derived and the target to which they are unloaded.
>
> Another point worth highlighting is the partitioned Relay, which is an output of --dump-code="relay", have relay operation's numbers ( %...) different from those in the initial Relay. Therefore, the new knob, which keeps the initial Relay's numbers, can be handy
>
> Here is an example output with the new option:
>
> 'ethos-u    <- ethos-u.qnn_conv2d'

I see your point. Here are some points to consider:

  1. print_operators_offloading would print the mapping of microNPU operators to the original Relay operators. Each target that uses partitioning has its own way of defining these mappings. Some of them make use of MergeCompilerRegions, which clubs multiple operators into a single partitioned function. Would it be possible to support additional targets, given that the knob name implies generic support in TVM?
  2. Another thing in pipe for better debug from frontends down to TIR is Compiler Explorer: [Tracking Issue] TVM Explorer Infrastructure #13116. At some point in future, this should provide more transparency to the operator mappings.
  3. In the meantime, with a little education for the end user of the microNPU, learning to interpret partitioned functions could be made easy. Just a suggestion: this knowledge could be shared via TVM docs and/or the microNPU demo.
  4. Another reason for thinking twice is that this knob changes the output from TVM. We should seek more opinions from community maybe?

@arina-grovety (Contributor) commented

> Another reason for thinking twice is that this knob changes the output from TVM. We should seek more opinions from community maybe?

Can you please clarify what you mean by "the output from TVM"?

@ashutosh-arm (Contributor) commented Nov 11, 2022

> Can you please clarify what do you mean by "the output from TVM"?

Sorry for the wrong wording. With this knob, TVM will produce a new debug output, which would set an example for other backends. So my suggestion was to discuss this upfront on the discuss forum.

@lhutton1 (Contributor) commented Nov 11, 2022

> The option for printing the operators currently seems very specific to the NPU, I'm wondering if we would see more benefit adding this as a generic option within TVMC without too many changes? Not only would it benefit other targets, it would make the option more robust and easier to find from a user POV. Its currently possible to save the partitioned graph in TVMC using --dump-code="relay", perhaps print_operators_offloading could be called at a similar point (given a command line argument such as --dump-offloads) rather than from within the NPU specific code, WDYT?
>
> Hi @lhutton1,
>
> Do you propose to implement this function for all the targets? Or just add a general compiler option leaving the implementation currently only in the ethos-u backend? Right now, this function is specific to ethos-u and is handled in the ethos-u backend. As far as I'm concerned it won't be a problem to implement the function for all targets, but of course I could be wrong.
>
> Here is an example how the output would look like if model is compiled for the target "llvm":
>
>    'generic    <-   %0 = qnn.conv2d(%tfl.quantize, %v_param_1, ...'
>    'generic    <-   %1 = nn.bias_add(%0, %v_param_2, axis=3);'
>    'generic    <-   %2 = qnn.requantize(%1, meta[relay.Constant]...'
>
> And for targets "ethos-u,cmsis-nn,c"
>
>     'ethos-u    <- ethos-u.qnn_conv2d'
>     'ethos-u    <-        %204 = qnn.conv2d(%203, %v_param_105, -128, 0, 0.0235294f, ...'
>     'ethos-u    <-        %205 = nn.bias_add(%204, %v_param_106, axis=3);'
>     'ethos-u    <-        %206 = qnn.requantize(%205, meta[relay.Constant][105], 0, ...'
>     'ethos-u    <- ethos-u.reshape'
>     'ethos-u    <-        %207 = reshape(%206, newshape=[1, 1001]);'
>     'cmsis-nn   <- cmsis-nn.qnn_softmax'
>     'cmsis-nn   <-        %208 = qnn.dequantize(%207, 0.0775722f, -61);'
>     'cmsis-nn   <-        %209 = nn.softmax(%208);'
>     'cmsis-nn   <-        qnn.quantize(%209, 0.00390625f, -128, out_dtype="int8")'

Thanks for the explanation @arina-grovety, yes I was thinking other backends like CMSIS-NN could make use of the same approach since AnalyzeOperationsDistribution already seems quite generic. Where possible we can add the composite function names, and if they are not found we can fall back to just printing the Relay, exactly as you described. If another backend has a different method of offloading operations, this could simply be added to the pass in the future as and when needed.
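The composite-name-with-fallback display discussed here can be sketched in a few lines. This is an illustrative sketch only, not code from the PR; the function name `format_offloads` and the (target, composite, relay_line) input shape are assumptions:

```python
def format_offloads(ops):
    """Render a dump in the style shown in the examples: a composite
    header line followed by its indented member operators, falling back
    to a plain line when an operator has no composite name recorded.

    `ops` is an assumed list of (target, composite_or_None, relay_line).
    """
    out, last_composite = [], None
    for target, composite, relay_line in ops:
        if composite and composite != last_composite:
            # Emit the composite header once per run of member operators.
            out.append(f"{target:<10} <- {composite}")
        last_composite = composite
        indent = "       " if composite else ""
        out.append(f"{target:<10} <- {indent}{relay_line}")
    return "\n".join(out)
```

With no composite names available (e.g. a plain "llvm" compile), every line takes the un-indented fallback form, matching the "generic" example above.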

I see @ashutosh-arm's point that this feature intersects with the work in #13116, perhaps it would be useful to have a discussion with the authors to align on expectations. After a quick look I don’t believe this work takes into account offloaded operations, but I could be wrong. Just like #13116 it would be great to see the ethos-u.qnn_conv2d annotations relate back to the input graph format (e.g. in TFLite: CONV2D) to make it easy for the user to relate their compiled operations to their original graph, but this seems a bit involved for now.

> print_operators_offloading would print the mapping of MicroNPU operators to the original Relay operators. Each target that uses partitioning has its own ways of defining these mappings. Some of them make use of MergeCompilerRegions that clubs multiple operators into a single partitioned function. Would it be possible to support additional targets given the knob name implies generic support in TVM?

@ashutosh-arm, I think this is okay as operators will still be wrapped in their respective 'composite' function where the "ethos-u.qnn_conv2d" name is stored. Supporting other methods of operator offloading such as https://github.com/apache/tvm/blob/main/python/tvm/relay/op/contrib/ethosn.py#L451 I feel are out of scope for this work for the time being. I agree though that we should get the community opinion on this before making such a change, as you rightly mention the alternative is to educate the user how to read the partitioned Relay graph in the tutorial.

@sergio-grovety force-pushed the tvmc-ethosu-dump-npu-functions-coverage-option branch from 623aa1f to e9e8a68 on November 28, 2022
@arina-grovety (Contributor) commented

> Thanks for the explanation @arina-grovety, yes I was thinking other backends like CMSIS-NN could make use of the same approach since the AnalyzeOperationsDistribution already seems quite generic. Where possible we can add the composite function names and if they are not found we can fallback to just printing the Relay - exactly as you described. If another backend has a different method of offloading operations this could simply be added to the pass in the future as and when needed.

Hello @lhutton1, we have pushed an update to the PR; the option "--target-ethos-u-dump_npu_functions_coverage" has been replaced by the more generic "--dump-offloads" with the same meaning.

@lhutton1 (Contributor) commented

Thanks for the updates @sergey-grovety @arina-grovety, looks great! I started reviewing this evening but didn't fully get through it; I will pick up where I left off tomorrow.

@lhutton1 (Contributor) left a comment

Apologies for the delay. I left some comments below, see what you think. Thanks for the updates, it's looking much better!

Outdated review comments (resolved) on: python/tvm/driver/tvmc/compiler.py (3), python/tvm/relay/analysis/operations_distribution.py, python/tvm/relay/frontend/common.py
Excerpt under review (docstring fragment):

    ----------
    mod : tvm.ir.IRModule
        The IRModule that gets generated from a relay frontend.
    initial_relay_astext : list
@lhutton1 (Contributor) commented:

Perhaps I missed it, what's the reason for parsing the initial Relay as a string, rather than traversing a copy of the IRModule?


Hi @lhutton1, this was done to avoid copying entities that we consider unnecessary, since only the text representation of the Relay is used in this function.

@lhutton1 (Contributor) commented:

I was thinking that traversing the Relay IR itself here might simplify the logic below and decouple the implementation from the textual representation making it more robust to changes in the future. It seems like it would also remove the need to make changes such as https://github.com/apache/tvm/pull/13212/files#diff-237c52e4e68362990738b47cc97c81b5c84ec92dfbcb672e961f0e9887f436c0R378 which might require more motivation from the community. WDYT?

cc @ashutosh-arm @ekalda in case you have any other suggestions

@ashutosh-arm (Contributor) commented:

I would suggest the same thing as @lhutton1 did above. The text representation changes quite often. It is better to rely on the information available inside the module object and extract it using, let's say, ExprVisitor.


Hello @ashutosh-arm, sorry, there is an out-of-date comment here.

We now pass the initial Relay as Relay IR itself, then use the "annotate" parameter of the astext() function to add the desired annotations to the generated text, and then parse our annotations from the formed text.

I will fix the comment string in the update to the PR.

@lhutton1 (Contributor) commented:

Apologies if there has been some confusion here; this question was more about the need to search over the Relay IR as text for the compiler name, op name, func id, etc. The information could be extracted using a visitor pass (ExprVisitor) that traverses the IR, making it more resilient to changes in the text format of the IR. Since this method is working, and to move this forward, we can pull this out into a separate follow-up.
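The visitor-based alternative suggested here can be sketched without TVM installed. In this toy sketch the `Call` class is a hypothetical stand-in for a Relay call node (not TVM's class), and `OffloadCollector` mimics what a tvm.relay.ExprVisitor subclass would do: walk the IR and collect (composite, op) pairs directly, instead of parsing the printed text:

```python
class Call:
    """Minimal stand-in for a Relay call node (illustrative, not TVM's)."""
    def __init__(self, op, args=(), composite=None):
        self.op = op                # operator name, e.g. "qnn.conv2d"
        self.args = list(args)      # child call nodes
        self.composite = composite  # composite function name, if any

class OffloadCollector:
    """ExprVisitor-style traversal: record (composite, op) pairs in
    post-order, so operators appear in dataflow order."""
    def __init__(self):
        self.ops = []

    def visit(self, node):
        for arg in node.args:
            self.visit(arg)
        self.ops.append((node.composite, node.op))

# A tiny graph mirroring the conv2d -> bias_add -> reshape examples above.
conv = Call("qnn.conv2d", composite="ethos-u.qnn_conv2d")
bias = Call("nn.bias_add", [conv], composite="ethos-u.qnn_conv2d")
resh = Call("reshape", [bias], composite="ethos-u.reshape")

collector = OffloadCollector()
collector.visit(resh)
```

Because the information is read from node attributes rather than scraped from astext() output, changes to the IR's text format would not break the collection logic.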

tests/python/driver/tvmc/test_compiler.py (outdated review comment, resolved)
@sergio-grovety force-pushed the tvmc-ethosu-dump-npu-functions-coverage-option branch 4 times, most recently from 44de505 to 325b2ef on January 23, 2023
@lhutton1 (Contributor) left a comment

Apologies for the delay, and thanks for the reminder. I think it's getting close; thanks for the hard work on this. My biggest concern still lies in extracting the relevant information from the textual representation of the Relay, as it seems a bit fragile. Is there a reason for doing it this way? Otherwise LGTM!

Outdated review comments (resolved) on: python/tvm/driver/tvmc/compiler.py, python/tvm/relay/frontend/common.py
@sergio-grovety force-pushed the tvmc-ethosu-dump-npu-functions-coverage-option branch 2 times, most recently from 68ede35 to d603dc9 on March 17, 2023
@sergio-grovety force-pushed the tvmc-ethosu-dump-npu-functions-coverage-option branch from d603dc9 to afe244b on March 19, 2023
@sergio-grovety (Contributor, Author) commented

@tvm-bot rerun

@sergio-grovety sergio-grovety requested review from chunit-quic and lhutton1 and removed request for chunit-quic and lhutton1 March 20, 2023 13:27
@chunit-quic (Contributor) left a comment

The span suffix part looks good to me. Thanks for the help. :D

@lhutton1 (Contributor) left a comment

Thanks for the updates on this @arina-grovety, @sergio-grovety. I just had one nit which I think was missed previously, otherwise LGTM. Thanks for the support with the spans @chunit-quic!

Commit: …perations_distribution.py. Fix tflite import in tests/python/contrib/test_ethosu/infra.py
@arina-grovety (Contributor) commented

@tvm-bot rerun

@lhutton1 (Contributor) commented

@tvm-bot rerun

@arina-grovety (Contributor) commented

@tvm-bot rerun

Hello @lhutton1, thank you!

@sergio-grovety sergio-grovety requested review from lhutton1 and removed request for lhutton1 March 27, 2023 13:01
@lhutton1 (Contributor) left a comment

LGTM!

@lhutton1 lhutton1 merged commit da83353 into apache:main Mar 27, 2023
@lhutton1 (Contributor) commented

Thanks @sergio-grovety @arina-grovety @chunit-quic @ashutosh-arm! This will be very helpful for users wanting to see how their models were offloaded, thanks for persisting with all the changes!

6 participants