
[TVMC][TRANSFORMS] ToMixedPrecision transform support with custom options enabled #14010

Merged: 5 commits into apache:main from tvmc-preprocess on Mar 9, 2023

Conversation

@srkreddy1238 (Contributor) commented Feb 16, 2023

Adds new command line options:

  • --mixed-precision - Enable mixed precision conversion
  • --mixed-precision-ops - List of operators to be converted to mixed precision
  • --mixed-precision-calculation-type - Calculation precision type
  • --mixed-precision-acc-type - Accumulator precision type

And:

  • --desired-layout-ops - The list of operators to be transformed with desired layout.
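
For illustration, a hypothetical invocation combining these flags could look like the following (model.onnx is a placeholder, and the exact argument syntax for the op lists may differ slightly from the merged implementation):

tvmc compile --target=opencl --mixed-precision --mixed-precision-ops nn.conv2d nn.dense --mixed-precision-calculation-type float16 --mixed-precision-acc-type float32 --desired-layout NCHW --desired-layout-ops nn.conv2d model.onnx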

@tvm-bot (Collaborator) commented Feb 16, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@leandron (Contributor) left a comment:

Thanks @srkreddy1238, this looks very interesting. Can you improve the PR description with an example command showing how it is used?

Can you also add a few tests to make sure this doesn't regress in the future? Thanks.

@Mousius (Member) commented Feb 16, 2023

@srkreddy1238 is there a reason you're not doing this via something like RelayToTIR or similar? It seems you could create a Target hook that would do this whenever it encounters specific parameters, rather than having to add it manually to the tvmc command line?

Inline review comments on python/tvm/driver/tvmc/compiler.py and python/tvm/driver/tvmc/composite_target.py (outdated, resolved).
@srkreddy1238 (Contributor, Author) commented:

> @srkreddy1238 is there a reason you're not doing this via something like RelayToTIR or similar? It seems you could create a Target hook that would do this whenever it encounters specific parameters, rather than having to add it manually to the tvmc command line?

@Mousius thanks for the review.

Initially I thought of an approach where I create an external codegen and extend REGISTERED_CODEGEN as shown below (code embed truncated here):

REGISTERED_CODEGEN = {

This way tvmc calls the pass_pipeline and we can do whatever we want over the module.

These hooks are essentially small personalizations over existing transforms. In this example, for instance, we customize the Mixed Precision pass for Adreno. Essentially we are helping developers with simple reusable utilities. These hooks are optional features on an existing target, and they may grow over time; creating a new target for each such feature would result in multiple near-identical target definitions.

I thought hooks might be a better approach to keep them separate from the existing real external codegens.

What do you think?

@Mousius (Member) commented Feb 16, 2023

> @srkreddy1238 is there a reason you're not doing this via something like RelayToTIR or similar? It seems you could create a Target hook that would do this whenever it encounters specific parameters, rather than having to add it manually to the tvmc command line?
>
> @Mousius thanks for the review.
>
> Initially I thought of an approach where I create an external codegen and extend REGISTERED_CODEGEN as shown below.
>
> REGISTERED_CODEGEN = {
>
> This way tvmc calls the pass_pipeline and we can do whatever we want over the module.
>
> These hooks are essentially small personalizations over existing transforms. In this example, for instance, we customize the Mixed Precision pass for Adreno. Essentially we are helping developers with simple reusable utilities. These hooks are optional features on an existing target, and they may grow over time; creating a new target for each such feature would result in multiple near-identical target definitions.
>
> I thought hooks might be a better approach to keep them separate from the existing real external codegens.
>
> What do you think?

I think there are two different thoughts here:

Rather than wrapping this in tvmc, it'd be good to use Target hooks and process the Target within the hook to decide any additional lowering passes; there's also pre-processing logic you can apply. I gather this is harder because it's going through some GPU target? I'm a bit nervous about creating a whole hook registry for this, as it seems like we should be pushing that into the Target rather than working around it in tvmc. From a user's point of view it should be something like:

tvmc --target=opencl --target-opencl-mixed-precision

Else, do you agree it's a bit odd to have to essentially build your ML pipeline in CLI arguments?

The other thing is that you mentioned this being a reusable utility. I like that approach more, and I'm curious whether we can add a --mixed-precision option so as to benefit all Targets?

@srkreddy1238 (Contributor, Author) commented:

Adreno currently shares the OpenCL target; we should probably consider defining an exclusive Target for Adreno to support customizations.

I like the idea of tvmc directly supporting --mixed-precision style options for target-independent preprocessing. The target-dependent ones can definitely go into target options. The current mixed-precision requirement can certainly be used across all targets.

I'm not sure about tvmc accommodating future preprocessing needs that come from vendors and are developer friendly but not critical for Targets.

Coming to post-build hooks, the intention here is to allow altering the compiled library module. One use case is importing additional modules before export/save. Any thoughts here?

@srkreddy1238 (Contributor, Author) commented:

@Mousius

Another thought here: how about using packed functions instead of an inventory of hooks?

A hook is defined in the user application or the respective contrib module as:

@tvm.register_func("adreno.mixed_precision_fp16")
def mixed_precision_fp16(mod, params, args):
    ...
    return mod

It is invoked as --pre-build-hooks "adreno.mixed_precision_fp16"; tvmc looks for this packed function and invokes it.

This could be a good starting point where tvmc stays out of vendor-specific hooks, and going forward, when we see APIs used across many targets, we could bring them into the tvmc infrastructure.
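
A minimal sketch of how tvmc could resolve such a hook, assuming the name comes straight from the proposed --pre-build-hooks option (illustration only, not code from this PR):

import tvm

# Hypothetical resolution of a --pre-build-hooks value; the hook name
# "adreno.mixed_precision_fp16" is the one registered above.
hook_name = "adreno.mixed_precision_fp16"
hook = tvm.get_global_func(hook_name, allow_missing=True)
if hook is None:
    raise RuntimeError("pre-build hook '%s' is not registered" % hook_name)
mod = hook(mod, params, args)  # the hook returns the transformed module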

@Mousius (Member) commented Feb 20, 2023

> Adreno currently shares the OpenCL target; we should probably consider defining an exclusive Target for Adreno to support customizations.
>
> I like the idea of tvmc directly supporting --mixed-precision style options for target-independent preprocessing. The target-dependent ones can definitely go into target options. The current mixed-precision requirement can certainly be used across all targets.
>
> I'm not sure about tvmc accommodating future preprocessing needs that come from vendors and are developer friendly but not critical for Targets.

The advice I'd give here is that we don't always need a generic solution to a specific problem: the --mixed-precision parameter provides benefits to other Targets and doesn't require us to create a new plugin system within tvmc. We have similar functionality in the --alter-layout parameter.

> Coming to post-build hooks, the intention here is to allow altering the compiled library module. One use case is importing additional modules before export/save. Any thoughts here?

What is the use case for this? I can't think of a use case for generically bundling things into the eventual output unless I'm missing something. Specific Targets would likely benefit from it, and they'd better understand the shape of their own dependencies, which suggests it should be done in some Target hook (thinking of the example with CMSIS-NN, where we may use this kind of hook to bundle the CMSIS kernels or API descriptions).

> @Mousius
>
> Another thought here: how about using packed functions instead of an inventory of hooks?
>
> A hook is defined in the user application or the respective contrib module as:
>
> @tvm.register_func("adreno.mixed_precision_fp16")
> def mixed_precision_fp16(mod, params, args):
>     ...
>     return mod
>
> It is invoked as --pre-build-hooks "adreno.mixed_precision_fp16"; tvmc looks for this packed function and invokes it.
>
> This could be a good starting point where tvmc stays out of vendor-specific hooks, and going forward, when we see APIs used across many targets, we could bring them into the tvmc infrastructure.

Sounds a bit dangerous to me: generically calling a packed function against a module also means you have access to a plethora of other registered functions. Do we want to expose them all to the audience of tvmc? I'd also still push for things that are more specific to a given Target to be attached to the Target itself, so the scripting API and CLI remain aligned.

@srkreddy1238 (Contributor, Author) commented Feb 21, 2023

> The advice I'd give here is that we don't always need a generic solution to a specific problem: the --mixed-precision parameter provides benefits to other Targets and doesn't require us to create a new plugin system within tvmc. We have similar functionality in the --alter-layout parameter.

To implement the complete functionality of mixed precision, we need to pass the list of ops to be converted to mixed precision, the precision type, and the other configurable options of the mixed-precision pass. All of these are use-case dependent. If we are good with enabling the mixed-precision pass together with its configurable options, I am happy to go this way.
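
For context, a rough sketch of what these options map onto at the Relay level, assuming mod is an existing Relay IRModule. The registration helper and ToMixedPrecision below are the existing AMP machinery; how tvmc would wire the CLI values into them, and the example op list, are assumptions here:

import tvm
from tvm import relay
from tvm.relay.op import register_mixed_precision_conversion

# Example per-op policy: always convert, accumulate in float32,
# output in the chosen mixed-precision type.
def always_convert(call_node, mixed_precision_type):
    return [0, "float32", mixed_precision_type]  # 0 == MIXED_PRECISION_ALWAYS

for op_name in ["nn.conv2d", "nn.dense"]:  # example op list from the CLI
    register_mixed_precision_conversion(op_name, always_convert, level=11)

with tvm.transform.PassContext(opt_level=3):
    mod = relay.transform.ToMixedPrecision(mixed_precision_type="float16")(mod)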

> What is the use case for this? I can't think of a use case for generically bundling things into the eventual output unless I'm missing something. Specific Targets would likely benefit from it, and they'd better understand the shape of their own dependencies, which suggests it should be done in some Target hook (thinking of the example with CMSIS-NN, where we may use this kind of hook to bundle the CMSIS kernels or API descriptions).

My use case is about embedding an additional tuning cache and target-precompiled OpenCL binary source into the module itself. The tuning cache and clBinary blobs are generated by running the compiled modules on a real device via RPC. I don't think we currently encourage doing this from the core compilation flow itself. IMHO, such first-time requirements should be enabled outside the core compiler and absorbed later if more similar use cases appear.
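
A hedged illustration of that post-build idea, where lib and extra_mod are placeholder names for the compiled library and the module to be embedded (not code from this PR):

# Attach an extra runtime module to the compiled library before export.
lib.import_module(extra_mod)      # tvm.runtime.Module.import_module
lib.export_library("model.so")    # exported artifact now carries both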

> Sounds a bit dangerous to me: generically calling a packed function against a module also means you have access to a plethora of other registered functions. Do we want to expose them all to the audience of tvmc? I'd also still push for things that are more specific to a given Target to be attached to the Target itself, so the scripting API and CLI remain aligned.

tvmc is a great tool for newcomers because of its simplicity and functionality. In my opinion, tvmc is a good starting point for a newbie to quickly run the entire process from the CLI. On the other side, the packed-function system is not something we want to hide from users; we encourage users to register their own packed functions into the TVM compiler flow as well as to use TVM's packed functions.

From a newbie's point of view, the option --pre-build-hooks "adreno.mixed_precision_fp16" is just a command line option; they don't even need to know about packed functions or that a global registry exists. Without tvmc support, the user needs to learn and switch to the Python interface for more functionality.

We are talking about only two options:

  • Add the functionality directly into the TVM compiler core (accessible via tvmc).
  • Use the Python interface for advanced feature access.

I am talking about a staging option: a feature that needs some feedback from developers/users before it becomes a mainline compiler feature (because of its implementation complexity in the mainline compiler, or because it is unclear whether it has a wide enough audience).

As I see it, we have two perspectives to consider here: tool integrity and user experience. Staging helps tool integrity, as we don't need to take every feature into tvmc on day one. On the other hand, these options definitely result in a good user experience, as all the functionality can be reached through the CLI interface.

@Mousius (Member) commented Feb 21, 2023

> To implement the complete functionality of mixed precision, we need to pass the list of ops to be converted to mixed precision, the precision type, and the other configurable options of the mixed-precision pass. All of these are use-case dependent. If we are good with enabling the mixed-precision pass together with its configurable options, I am happy to go this way.

I think this makes sense; a naive first attempt would be something like this?

--mixed-precision // Enables
--mixed-precision-ops=nn.conv2d,nn.dense
--mixed-precision-input=float16
--mixed-precision-output=float32

> My use case is about embedding an additional tuning cache and target-precompiled OpenCL binary source into the module itself. The tuning cache and clBinary blobs are generated by running the compiled modules on a real device via RPC. I don't think we currently encourage doing this from the core compilation flow itself. IMHO, such first-time requirements should be enabled outside the core compiler and absorbed later if more similar use cases appear.

That definitely looks specific to OpenCL and loading precompiled OpenCL binaries? Are all of these modules generated ahead of TVM compilation? I don't think it necessarily has to be deep in the core compiler, but it would make sense for such options to live near Target-specific logic. It's a bit tricky to work through this without an example, though; it'd be good to raise it outside this PR, else I think this will take a lot longer.

> tvmc is a great tool for newcomers because of its simplicity and functionality. In my opinion, tvmc is a good starting point for a newbie to quickly run the entire process from the CLI. On the other side, the packed-function system is not something we want to hide from users; we encourage users to register their own packed functions into the TVM compiler flow as well as to use TVM's packed functions.

One of the reasons tvmc is simpler is that it doesn't expose many of the inner workings of TVM, such as the packed-function API; it abstracts many things to give a simpler user experience 😸 A large part of that is easily printing a help message and having arguments describe what they do. We may understand --pre-build-hooks "adreno.mixed_precision_fp16", but --pre-build-hooks would be hard to explain to a user.

I don't think we're disagreeing on the way forward for mixed precision, though? Shall we try to get that resolved and look at the OpenCL blobs at the same time?

@srkreddy1238 (Contributor, Author) commented:

Thanks for your time on this @Mousius.

Yes, we are good with --mixed-precision, and I will enhance this PR accordingly.

The OpenCL blobs feature will involve a major change in the TVM compiler, and the TVMC changes are the cherry on top. We can revisit it with an RFC and more details.

@srkreddy1238 force-pushed the tvmc-preprocess branch 5 times, most recently from a9ac14b to 4b62c12, on February 23, 2023 05:38
[TVMC][TRANSFORMS] ToMixedPrecision transform support with custom options enabled

Adds new command line options
 --mixed-precision
 --mixed-precision-ops
 --mixed-precision-input
 --mixed-precision-output

and --desired-layout-ops

This PR also enhances the Python interface by replacing alter_layout with transform_args.
transform_args is a dict with all transform-related options, including the existing desired_layout / alter_layout option.
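
For illustration, such a transform_args dict could look roughly like the sketch below; the key names mirror the CLI options from the PR description and are assumptions here, not necessarily the exact merged API:

# Hypothetical transform_args passed through the tvmc Python interface.
transform_args = {
    "desired_layout": "NCHW",                      # formerly alter_layout
    "desired_layout_ops": ["nn.conv2d"],
    "mixed_precision": True,
    "mixed_precision_ops": ["nn.conv2d", "nn.dense"],
    "mixed_precision_calculation_type": "float16",
    "mixed_precision_acc_type": "float32",
}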
@srkreddy1238 changed the title from "[TVMC][HOOKS] Target specific pre and post processing hooks" to "[TVMC][TRANSFORMS] ToMixedPrecision transform support with custom options enabled" on Feb 23, 2023
@elvin-n (Contributor) commented Feb 23, 2023

> That definitely looks specific to OpenCL and loading precompiled OpenCL binaries? Are all of these modules generated ahead of TVM compilation? I don't think it necessarily has to be deep in the core compiler, but it would make sense for such options to live near Target-specific logic. It's a bit tricky to work through this without an example, though; it'd be good to raise it outside this PR, else I think this will take a lot longer.

Probably I missed the initial proposal, and I don't feel that I fully understand the proposal to live near Target-specific logic. But conversion to FP16 and execution in low precision is specific neither to OpenCL nor to Adreno. This is generic for many platforms. We have to support such conversion for

  • any OpenCL
  • Arm CPU
  • Hexagon
  • CUDA
  • Vulkan
  • Metal
  • ROCm

If we add such a feature dedicated only to Adreno, we will have to extend it to each of the above targets. That would be very undesirable.

Inline review comments on python/tvm/driver/tvmc/transform.py (outdated, resolved).
@srkreddy1238 (Contributor, Author) commented:

> That definitely looks specific to OpenCL and loading precompiled OpenCL binaries? Are all of these modules generated ahead of TVM compilation? I don't think it necessarily has to be deep in the core compiler, but it would make sense for such options to live near Target-specific logic. It's a bit tricky to work through this without an example, though; it'd be good to raise it outside this PR, else I think this will take a lot longer.

> Probably I missed the initial proposal, and I don't feel that I fully understand the proposal to live near Target-specific logic. But conversion to FP16 and execution in low precision is specific neither to OpenCL nor to Adreno. This is generic for many platforms. We have to support such conversion for
>
>   • any OpenCL
>   • Arm CPU
>   • Hexagon
>   • CUDA
>   • Vulkan
>   • Metal
>   • ROCm
>
> If we add such a feature dedicated only to Adreno, we will have to extend it to each of the above targets. That would be very undesirable.

@elvin-n The discussion on this PR had two contexts a while back: pre-compile hooks (primarily to support FP16) and post-build hooks (to allow additional processing of the compiled module, specifically importing additional modules such as OpenCL cache/binary programs, etc.). Finally, we decided to confine this PR to FP16 support, and the proposal is now to support it generically for any target (which is in line with your recommendation :)).

@Mousius (Member) left a comment:

This looks really cool @srkreddy1238 😸 hopefully only a few things to resolve and we can get this feature in!

Inline review comments on python/tvm/driver/tvmc/autotuner.py, python/tvm/driver/tvmc/transform.py, tests/python/driver/tvmc/test_transform.py, and python/tvm/driver/tvmc/compiler.py (resolved).
@srkreddy1238 force-pushed the tvmc-preprocess branch 3 times, most recently from 1949d26 to e55faaf, on March 2, 2023 12:26
@srkreddy1238 requested review from Mousius and elvin-n on March 2, 2023 17:37
python/tvm/driver/tvmc/transform.py Show resolved Hide resolved
python/tvm/driver/tvmc/autotuner.py Show resolved Hide resolved
@srkreddy1238 requested a review from elvin-n on March 4, 2023 05:30
Inline review comment on python/tvm/driver/tvmc/compiler.py (outdated, resolved).
@Mousius (Member) left a comment:

I think this is fine, pending linting - @leandron could you take another look?

@srkreddy1238 (Contributor, Author) commented:

@leandron can you take a look at this? I have some dependencies on it for TVMCon next week.

@Mousius dismissed leandron's stale review on March 9, 2023 09:53:

Comments addressed, given sufficient time to re-review, should there be an issue let me know :-)

@Mousius merged commit 52292cf into apache:main on Mar 9, 2023
@srkreddy1238 (Contributor, Author) commented:

Thanks @echuraev & @Mousius. Happy to address any concerns later in a different PR.

lhutton1 pushed a commit that referenced this pull request Mar 16, 2023
This aims to make the `--desired-layout` argument more powerful, based on the previously merged changes from #14010, by introducing two new features:

1. Allow passing multiple arguments to `--desired-layout` instead of only one, to specify one layout per transformed operator given in `--desired-layout-ops`. (The number of arguments has to be either 1 or match the number of transformed operators.)
2. Optionally, you can now specify a non-default kernel layout as follows: `NHWC:HWIO`

Example usage: `tvmc compile … --desired-layout NCHW NHWC:HWIO --desired-layout-ops nn.max_pool2d qnn.conv2d`

I also added unit tests for the new use-cases.

### Known Limitations:
* It would make sense to specify individual kernel layouts for regular convolutions and depthwise ones. However, since both are usually implemented as a generalized `nn.conv2d`, we cannot transform them individually. Are there any good workarounds for this?
* The arguments of `--desired-layout` were previously checked for validity during command-line parsing (e.g. only NCHW and NHWC were allowed), which is no longer possible. Should I add a regular expression for that?
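
For reference, the per-operator mapping these options describe resembles the configuration of Relay's ConvertLayout pass; a hedged sketch, assuming mod is an existing Relay IRModule:

from tvm import relay, transform

# Per-op desired layouts: data layout only, or [data_layout, kernel_layout].
desired_layouts = {
    "nn.max_pool2d": ["NCHW"],
    "qnn.conv2d": ["NHWC", "HWIO"],
}
seq = transform.Sequential(
    [
        relay.transform.CanonicalizeOps(),
        relay.transform.ConvertLayout(desired_layouts),
    ]
)
with transform.PassContext(opt_level=3):
    mod = seq(mod)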