
[TVMC][TRANSFORMS] ToMixedPrecision transform support with custom options enabled #14010

Merged: 5 commits into apache:main from tvmc-preprocess on Mar 9, 2023

Conversation

@srkreddy1238 (Contributor) commented Feb 16, 2023

Adds new command line options:

  • --mixed-precision - Enable mixed precision conversion
  • --mixed-precision-ops - List of operators to be converted to mixed precision
  • --mixed-precision-calculation-type - Calculation precision type
  • --mixed-precision-acc-type - Accumulator precision type

And:

  • --desired-layout-ops - The list of operators to be transformed with desired layout.
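
For illustration, a hypothetical invocation combining these flags could look like the following (model.onnx is a placeholder, and the exact argument syntax for the op lists may differ slightly from the merged implementation):

tvmc compile --target=opencl --mixed-precision --mixed-precision-ops nn.conv2d nn.dense --mixed-precision-calculation-type float16 --mixed-precision-acc-type float32 --desired-layout NCHW --desired-layout-ops nn.conv2d model.onnx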

@tvm-bot (Collaborator) commented Feb 16, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@leandron (Contributor) left a comment:

Thanks @srkreddy1238, this looks very interesting. Can you improve the PR description with an example command showing how it is used?

Can you also add a few tests to make sure this doesn't regress in the future? Thanks.

@Mousius (Member) commented Feb 16, 2023

@srkreddy1238 is there a reason you're not doing this via something like RelayToTIR or similar? It seems you could create a Target hook that would do this whenever it encounters specific parameters, rather than having to add it manually to the tvmc command line?

Inline review comments on python/tvm/driver/tvmc/compiler.py and python/tvm/driver/tvmc/composite_target.py (outdated, resolved).
@srkreddy1238 (Contributor, Author) commented:

> @srkreddy1238 is there a reason you're not doing this via something like RelayToTIR or similar? It seems you could create a Target hook that would do this whenever it encounters specific parameters, rather than having to add it manually to the tvmc command line?

@Mousius thanks for the review.

Initially I thought of an approach where I create an external codegen and extend REGISTERED_CODEGEN as shown below (code embed truncated here):

REGISTERED_CODEGEN = {

This way tvmc calls the pass_pipeline and we can do whatever we want over the module.

These hooks are essentially small personalizations over existing transforms. In this example, for instance, we customize the Mixed Precision pass for Adreno. Essentially we are helping developers with simple reusable utilities. These hooks are optional features on an existing target, and they may grow over time; creating a new target for each such feature would result in multiple near-identical target definitions.

I thought hooks might be a better approach to keep them separate from the existing real external codegens.

What do you think?

@Mousius (Member) commented Feb 16, 2023

> @srkreddy1238 is there a reason you're not doing this via something like RelayToTIR or similar? It seems you could create a Target hook that would do this whenever it encounters specific parameters, rather than having to add it manually to the tvmc command line?
>
> @Mousius thanks for the review.
>
> Initially I thought of an approach where I create an external codegen and extend REGISTERED_CODEGEN as shown below.
>
> REGISTERED_CODEGEN = {
>
> This way tvmc calls the pass_pipeline and we can do whatever we want over the module.
>
> These hooks are essentially small personalizations over existing transforms. In this example, for instance, we customize the Mixed Precision pass for Adreno. Essentially we are helping developers with simple reusable utilities. These hooks are optional features on an existing target, and they may grow over time; creating a new target for each such feature would result in multiple near-identical target definitions.
>
> I thought hooks might be a better approach to keep them separate from the existing real external codegens.
>
> What do you think?

I think there are two different thoughts here:

Rather than wrapping this in tvmc, it'd be good to use Target hooks and process the Target within the hook to decide any additional lowering passes; there's also pre-processing logic you can apply. I gather this is harder because it's going through some GPU target? I'm a bit nervous about creating a whole hook registry for this, as it seems like we should be pushing that into the Target rather than working around it in tvmc. From a user's point of view it should be something like:

tvmc --target=opencl --target-opencl-mixed-precision

Else, do you agree it's a bit odd to have to essentially build your ML pipeline in CLI arguments?

The other thing is that you mentioned this being a reusable utility. I like that approach more, and I'm curious whether we can add a --mixed-precision option so as to benefit all Targets?

@srkreddy1238 (Contributor, Author) commented:

Adreno currently shares the OpenCL target; we should probably consider defining an exclusive Target for Adreno to support customizations.

I like the idea of tvmc directly supporting --mixed-precision style options for target-independent preprocessing. The target-dependent ones can definitely go into target options. The current mixed-precision requirement can certainly be used across all targets.

I'm not sure about tvmc accommodating future preprocessing needs that come from vendors and are developer friendly but not critical for Targets.

Coming to post-build hooks, the intention here is to allow altering the compiled library module. One use case is importing additional modules before export/save. Any thoughts here?

@srkreddy1238 (Contributor, Author) commented:

@Mousius

Another thought here: how about using packed functions instead of an inventory of hooks?

A hook is defined in the user application or the respective contrib module as:

@tvm.register_func("adreno.mixed_precision_fp16")
def mixed_precision_fp16(mod, params, args):
    ...
    return mod

It is invoked as --pre-build-hooks "adreno.mixed_precision_fp16"; tvmc looks for this packed function and invokes it.

This could be a good starting point where tvmc stays out of vendor-specific hooks, and going forward, when we see APIs used across many targets, we could bring them into the tvmc infrastructure.
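
A minimal sketch of how tvmc could resolve such a hook, assuming the name comes straight from the proposed --pre-build-hooks option (illustration only, not code from this PR):

import tvm

# Hypothetical resolution of a --pre-build-hooks value; the hook name
# "adreno.mixed_precision_fp16" is the one registered above.
hook_name = "adreno.mixed_precision_fp16"
hook = tvm.get_global_func(hook_name, allow_missing=True)
if hook is None:
    raise RuntimeError("pre-build hook '%s' is not registered" % hook_name)
mod = hook(mod, params, args)  # the hook returns the transformed module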

@Mousius (Member) commented Feb 20, 2023

> Adreno currently shares the OpenCL target; we should probably consider defining an exclusive Target for Adreno to support customizations.
>
> I like the idea of tvmc directly supporting --mixed-precision style options for target-independent preprocessing. The target-dependent ones can definitely go into target options. The current mixed-precision requirement can certainly be used across all targets.
>
> I'm not sure about tvmc accommodating future preprocessing needs that come from vendors and are developer friendly but not critical for Targets.

The advice I'd give here is that we don't always need a generic solution to a specific problem: the --mixed-precision parameter provides benefits to other Targets and doesn't require us to create a new plugin system within tvmc. We have similar functionality in the --alter-layout parameter.

> Coming to post-build hooks, the intention here is to allow altering the compiled library module. One use case is importing additional modules before export/save. Any thoughts here?

What is the use case for this? I can't think of a use case for generically bundling things into the eventual output unless I'm missing something. Specific Targets would likely benefit from it, and they'd better understand the shape of their own dependencies, which suggests it should be done in some Target hook (thinking of the example with CMSIS-NN, where we may use this kind of hook to bundle the CMSIS kernels or API descriptions).

> @Mousius
>
> Another thought here: how about using packed functions instead of an inventory of hooks?
>
> A hook is defined in the user application or the respective contrib module as:
>
> @tvm.register_func("adreno.mixed_precision_fp16")
> def mixed_precision_fp16(mod, params, args):
>     ...
>     return mod
>
> It is invoked as --pre-build-hooks "adreno.mixed_precision_fp16"; tvmc looks for this packed function and invokes it.
>
> This could be a good starting point where tvmc stays out of vendor-specific hooks, and going forward, when we see APIs used across many targets, we could bring them into the tvmc infrastructure.

Sounds a bit dangerous to me: generically calling a packed function against a module also means you have access to a plethora of other registered functions. Do we want to expose them all to the audience of tvmc? I'd also still push for things that are more specific to a given Target to be attached to the Target itself, so the scripting API and CLI remain aligned.

@srkreddy1238 (Contributor, Author) commented Feb 21, 2023

> The advice I'd give here is that we don't always need a generic solution to a specific problem: the --mixed-precision parameter provides benefits to other Targets and doesn't require us to create a new plugin system within tvmc. We have similar functionality in the --alter-layout parameter.

To implement the complete functionality of mixed precision, we need to pass the list of ops to be converted to mixed precision, the precision type, and the other configurable options of the mixed-precision pass. All of these are use-case dependent. If we are good with enabling the mixed-precision pass together with its configurable options, I am happy to go this way.
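
For context, a rough sketch of what these options map onto at the Relay level, assuming mod is an existing Relay IRModule. The registration helper and ToMixedPrecision below are the existing AMP machinery; how tvmc would wire the CLI values into them, and the example op list, are assumptions here:

import tvm
from tvm import relay
from tvm.relay.op import register_mixed_precision_conversion

# Example per-op policy: always convert, accumulate in float32,
# output in the chosen mixed-precision type.
def always_convert(call_node, mixed_precision_type):
    return [0, "float32", mixed_precision_type]  # 0 == MIXED_PRECISION_ALWAYS

for op_name in ["nn.conv2d", "nn.dense"]:  # example op list from the CLI
    register_mixed_precision_conversion(op_name, always_convert, level=11)

with tvm.transform.PassContext(opt_level=3):
    mod = relay.transform.ToMixedPrecision(mixed_precision_type="float16")(mod)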

> What is the use case for this? I can't think of a use case for generically bundling things into the eventual output unless I'm missing something. Specific Targets would likely benefit from it, and they'd better understand the shape of their own dependencies, which suggests it should be done in some Target hook (thinking of the example with CMSIS-NN, where we may use this kind of hook to bundle the CMSIS kernels or API descriptions).

My use case is about embedding an additional tuning cache and target-precompiled OpenCL binary source into the module itself. The tuning cache and clBinary blobs are generated by running the compiled modules on a real device via RPC. I don't think we currently encourage doing this from the core compilation flow itself. IMHO, such first-time requirements should be enabled outside the core compiler and absorbed later if more similar use cases appear.
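
A hedged illustration of that post-build idea, where lib and extra_mod are placeholder names for the compiled library and the module to be embedded (not code from this PR):

# Attach an extra runtime module to the compiled library before export.
lib.import_module(extra_mod)      # tvm.runtime.Module.import_module
lib.export_library("model.so")    # exported artifact now carries both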

> Sounds a bit dangerous to me: generically calling a packed function against a module also means you have access to a plethora of other registered functions. Do we want to expose them all to the audience of tvmc? I'd also still push for things that are more specific to a given Target to be attached to the Target itself, so the scripting API and CLI remain aligned.

tvmc is a great tool for newcomers because of its simplicity and functionality. In my opinion, tvmc is a good starting point for a newbie to quickly run the entire process from the CLI. On the other side, the packed-function system is not something we want to hide from users; we encourage users to register their own packed functions into the TVM compiler flow as well as to use TVM's packed functions.

From a newbie's point of view, the option --pre-build-hooks "adreno.mixed_precision_fp16" is just a command line option; they don't even need to know about packed functions or that a global registry exists. Without tvmc support, the user needs to learn and switch to the Python interface for more functionality.

We are talking about only two options:

  • Add the functionality directly into the TVM compiler core (accessible via tvmc).
  • Use the Python interface for advanced feature access.

I am talking about a staging option: a feature that needs some feedback from developers/users before it becomes a mainline compiler feature (because of its implementation complexity in the mainline compiler, or because it is unclear whether it has a wide enough audience).

As I see it, we have two perspectives to consider here: tool integrity and user experience. Staging helps tool integrity, as we don't need to take every feature into tvmc on day one. On the other hand, these options definitely result in a good user experience, as all the functionality can be reached through the CLI interface.

@Mousius (Member) commented Feb 21, 2023

> To implement the complete functionality of mixed precision, we need to pass the list of ops to be converted to mixed precision, the precision type, and the other configurable options of the mixed-precision pass. All of these are use-case dependent. If we are good with enabling the mixed-precision pass together with its configurable options, I am happy to go this way.

I think this makes sense; a naive first attempt would be something like this?

--mixed-precision // Enables
--mixed-precision-ops=nn.conv2d,nn.dense
--mixed-precision-input=float16
--mixed-precision-output=float32

> My use case is about embedding an additional tuning cache and target-precompiled OpenCL binary source into the module itself. The tuning cache and clBinary blobs are generated by running the compiled modules on a real device via RPC. I don't think we currently encourage doing this from the core compilation flow itself. IMHO, such first-time requirements should be enabled outside the core compiler and absorbed later if more similar use cases appear.

That definitely looks specific to OpenCL and loading precompiled OpenCL binaries? Are all of these modules generated ahead of TVM compilation? I don't think it necessarily has to be deep in the core compiler, but it would make sense for such options to live near Target-specific logic. It's a bit tricky to work through this without an example, though; it'd be good to raise it outside this PR, else I think this will take a lot longer.

> tvmc is a great tool for newcomers because of its simplicity and functionality. In my opinion, tvmc is a good starting point for a newbie to quickly run the entire process from the CLI. On the other side, the packed-function system is not something we want to hide from users; we encourage users to register their own packed functions into the TVM compiler flow as well as to use TVM's packed functions.

One of the reasons tvmc is simpler is that it doesn't expose many of the inner workings of TVM, such as the packed-function API; it abstracts many things to give a simpler user experience 😸 A large part of that is easily printing a help message and having arguments describe what they do. We may understand --pre-build-hooks "adreno.mixed_precision_fp16", but --pre-build-hooks would be hard to explain to a user.

I don't think we're disagreeing on the way forward for mixed precision, though? Shall we try to get that resolved and look at the OpenCL blobs at the same time?

@srkreddy1238 (Contributor, Author) commented:

Thanks for your time on this @Mousius.

Yes, we are good with --mixed-precision, and I will enhance this PR accordingly.

The OpenCL blobs feature will involve a major change in the TVM compiler, and the TVMC changes are the cherry on top. We can revisit it with an RFC and more details.

@srkreddy1238 force-pushed the tvmc-preprocess branch 5 times, most recently from a9ac14b to 4b62c12, on February 23, 2023 05:38
[TVMC][TRANSFORMS] ToMixedPrecision transform support with custom options enabled

Adds new command line options
 --mixed-precision
 --mixed-precision-ops
 --mixed-precision-input
 --mixed-precision-output

and --desired-layout-ops

This PR also enhances the Python interface by replacing alter_layout with transform_args.
transform_args is a dict with all transform-related options, including the existing desired_layout / alter_layout option.
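
For illustration, such a transform_args dict could look roughly like the sketch below; the key names mirror the CLI options from the PR description and are assumptions here, not necessarily the exact merged API:

# Hypothetical transform_args passed through the tvmc Python interface.
transform_args = {
    "desired_layout": "NCHW",                      # formerly alter_layout
    "desired_layout_ops": ["nn.conv2d"],
    "mixed_precision": True,
    "mixed_precision_ops": ["nn.conv2d", "nn.dense"],
    "mixed_precision_calculation_type": "float16",
    "mixed_precision_acc_type": "float32",
}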
@srkreddy1238 changed the title from "[TVMC][HOOKS] Target specific pre and post processing hooks" to "[TVMC][TRANSFORMS] ToMixedPrecision transform support with custom options enabled" on Feb 23, 2023
@elvin-n (Contributor) commented Feb 23, 2023

> That definitely looks specific to OpenCL and loading precompiled OpenCL binaries? Are all of these modules generated ahead of TVM compilation? I don't think it necessarily has to be deep in the core compiler, but it would make sense for such options to live near Target-specific logic. It's a bit tricky to work through this without an example, though; it'd be good to raise it outside this PR, else I think this will take a lot longer.

Probably I missed the initial proposal, and I don't feel that I fully understand the proposal to live near Target-specific logic. But conversion to FP16 and execution in low precision is specific neither to OpenCL nor to Adreno. This is generic for many platforms. We have to support such conversion for

  • any OpenCL
  • Arm CPU
  • Hexagon
  • CUDA
  • Vulkan
  • Metal
  • ROCm

If we add such a feature dedicated only to Adreno, we will have to extend it to each of the above targets. That would be very undesirable.

Inline review comments on python/tvm/driver/tvmc/transform.py (outdated, resolved).
@srkreddy1238 (Contributor, Author) commented:

> That definitely looks specific to OpenCL and loading precompiled OpenCL binaries? Are all of these modules generated ahead of TVM compilation? I don't think it necessarily has to be deep in the core compiler, but it would make sense for such options to live near Target-specific logic. It's a bit tricky to work through this without an example, though; it'd be good to raise it outside this PR, else I think this will take a lot longer.

> Probably I missed the initial proposal, and I don't feel that I fully understand the proposal to live near Target-specific logic. But conversion to FP16 and execution in low precision is specific neither to OpenCL nor to Adreno. This is generic for many platforms. We have to support such conversion for
>
>   • any OpenCL
>   • Arm CPU
>   • Hexagon
>   • CUDA
>   • Vulkan
>   • Metal
>   • ROCm
>
> If we add such a feature dedicated only to Adreno, we will have to extend it to each of the above targets. That would be very undesirable.

@elvin-n The discussion on this PR had two contexts a while back: pre-compile hooks (primarily to support FP16) and post-build hooks (to allow additional processing of the compiled module, specifically importing additional modules such as OpenCL cache/binary programs, etc.). Finally, we decided to confine this PR to FP16 support, and the proposal is now to support it generically for any target (which is in line with your recommendation :)).

@Mousius (Member) left a comment:

This looks really cool @srkreddy1238 😸 hopefully only a few things to resolve and we can get this feature in!

Inline review comments on python/tvm/driver/tvmc/autotuner.py, python/tvm/driver/tvmc/transform.py, tests/python/driver/tvmc/test_transform.py, and python/tvm/driver/tvmc/compiler.py (resolved).
@srkreddy1238 force-pushed the tvmc-preprocess branch 3 times, most recently from 1949d26 to e55faaf, on March 2, 2023 12:26
@srkreddy1238 requested review from Mousius and elvin-n on March 2, 2023 17:37
python/tvm/driver/tvmc/transform.py Show resolved Hide resolved
python/tvm/driver/tvmc/autotuner.py Show resolved Hide resolved
@srkreddy1238 requested a review from elvin-n on March 4, 2023 05:30
Inline review comment on python/tvm/driver/tvmc/compiler.py (outdated, resolved).
@Mousius (Member) left a comment:

I think this is fine, pending linting - @leandron could you take another look?

@srkreddy1238 (Contributor, Author) commented:

@leandron can you take a look at this? I have some dependencies on it for TVMCon next week.

@Mousius dismissed leandron's stale review on March 9, 2023 09:53:

Comments addressed, given sufficient time to re-review, should there be an issue let me know :-)

@Mousius merged commit 52292cf into apache:main on Mar 9, 2023
@srkreddy1238 (Contributor, Author) commented:

Thanks @echuraev & @Mousius. Happy to address any concerns later in a different PR.

lhutton1 pushed a commit that referenced this pull request Mar 16, 2023
This aims to make the `--desired-layout` argument more powerful, based on the previously merged changes from #14010, by introducing two new features:

1. Allow passing multiple arguments to `--desired-layout` instead of only one, to specify one layout per transformed operator given in `--desired-layout-ops`. (The number of arguments has to be either 1 or match the number of transformed operators.)
2. Optionally, you can now specify a non-default kernel layout as follows: `NHWC:HWIO`

Example usage: `tvmc compile … --desired-layout NCHW NHWC:HWIO --desired-layout-ops nn.max_pool2d qnn.conv2d`

I also added unit tests for the new use-cases.

### Known Limitations:
* It would make sense to specify individual kernel layouts for regular convolutions and depthwise ones. However, since both are usually implemented as a generalized `nn.conv2d`, we cannot transform them individually. Are there any good workarounds for this?
* The arguments of `--desired-layout` were previously checked for validity during command-line parsing (e.g. only NCHW and NHWC were allowed), which is no longer possible. Should I add a regular expression for that?
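
For reference, the per-operator mapping these options describe resembles the configuration of Relay's ConvertLayout pass; a hedged sketch, assuming mod is an existing Relay IRModule:

from tvm import relay, transform

# Per-op desired layouts: data layout only, or [data_layout, kernel_layout].
desired_layouts = {
    "nn.max_pool2d": ["NCHW"],
    "qnn.conv2d": ["NHWC", "HWIO"],
}
seq = transform.Sequential(
    [
        relay.transform.CanonicalizeOps(),
        relay.transform.ConvertLayout(desired_layouts),
    ]
)
with transform.PassContext(opt_level=3):
    mod = seq(mod)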