[UMA] UMA v1.0 #12087
Conversation
@MichaelJKlaiber, @cgerum, @PaulPalomeroBernardo
Congratulations on the wonderful work and the outstanding idea toward modularization!
- Suggested a small nit for cmake completeness
- Please rebase against apache:main HEAD; it looks out of sync
- Please Cc more reviewers (e.g., those from the UMA RFC PR)
Cc @driazati may help with the missing
Force-pushed from 8b917d7 to abb2976 (Compare)
Thank you for the great contribution toward better accelerator support! 🥳
I enjoyed reading the code and learning the design.
Overall, the implementation looks good to me.
Here are a few high-level comments:
- Could you add more documentation in general (e.g., explain what the parameters are)?
- It would be great to have end-to-end model-level testing (e.g., run BERT with UMA on ARM EthosU).
Please see below for more comments. Thanks!
def _register_c_codegen(
    self,
    includes: Callable[[], str] = None,
Could you add a docstring for the parameters?
Also, can includes be Optional[str]?
Doc-string added.
@cgerum : can you answer the other question?
To generate dynamic strings more easily, we would be in favor of passing a Callable.
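A hypothetical, simplified sketch of why a Callable is convenient here: the include block can be assembled lazily at codegen time instead of being a fixed string. The class, method body, and header name below are illustrative stand-ins, not the actual UMABackend implementation.

```python
from typing import Callable, Optional


def gen_includes() -> str:
    # Built when codegen runs, so it can depend on backend options.
    return '#include "my_ai_hw.h"\n'


class ExampleBackend:
    def _register_c_codegen(self, includes: Optional[Callable[[], str]] = None) -> None:
        """Register C-codegen hooks.

        Parameters
        ----------
        includes : Optional[Callable[[], str]]
            Callback that returns the #include block emitted into the
            generated C source file.
        """
        self._includes = includes


backend = ExampleBackend()
backend._register_c_codegen(includes=gen_includes)
```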
@PaulPalomeroBernardo @cgerum
A personal view on the first (non-code-related) part of @sunggg's requests, of course just IMHO:
Some personal efforts (e.g., OLIMP / e-verest) envisioned similar out-of-tree approaches before UMA, which just landed as a super nice "out-of-tree" topping. I hope that we as a community will one day be able to jointly leverage the complex hardware-design space within TVM's unique compiler space. Beyond the code-related requests below, I halt my review here with an LGTM and a go for the UMA v1.0 contribution!
@cbalint13, appreciate sharing your thoughts! I also think this is a great out-of-tree approach and look forward to its actual usage.
I agree that the overall docstring structure (e.g., Vanilla, Strawberry & Chocolate) would be a great introduction/tutorial (Truly appreciate the authors!).
I see. Thank you for the clarification! Maybe my example was not relevant. I just thought it would be nice to have some model-level test cases since we seem to have all the necessary components (UMA Partitioner, UMA Pipeline, UMA Lower, UMA Codegen) in this PR. I just wanted to see if the current PR also covers a case where we lower some parts with UMA while handling other unsupported parts via the conventional pipeline.
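A rough sketch of the mixed scenario described above, under the assumption that the UMA tutorial's Vanilla example is available: patterns supported by the UMA backend are offloaded, while everything else goes through the regular TVM pipeline. The import path of VanillaAcceleratorBackend, the "vanilla_accelerator" target name, and the build options follow the UMA example code and may differ in the final PR.

```python
import tvm
from tvm import relay
from tvm.relay.testing import mobilenet
from backend import VanillaAcceleratorBackend  # shipped with the UMA Vanilla example app

uma_backend = VanillaAcceleratorBackend()
uma_backend.register()

mod, params = mobilenet.get_workload(batch_size=1)
mod = uma_backend.partition(mod)  # only supported patterns become "Compiler" functions

target = tvm.target.Target("vanilla_accelerator", host=tvm.target.Target("c"))
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)  # unsupported parts use the default flow
```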
Thanks @MichaelJKlaiber, I've given this a look over. I think it's generally in the right direction; I've suggested a couple more comments and a few minor changes here and there. Overall, though, the approach looks great. I'd love it if we could explore integrating this further with tvmc since we are registering a custom Target. Happy to discuss further!
* "Compiler" attribute. Functions that do not match this condition remain | ||
* unaltered. | ||
*/ | ||
class OutlineCompilerFunctionsMutator : public MixedModeMutator { |
i think this already exists in ethosu, can we factor it out instead of duplicating?
This is in fact just a copy of the ethosu code. I think it is a generally useful pass for accelerator flows, so +1 to making this a generic transform pass, if that is an option.
@areusch, should we move this to a separate PR then?
since we're duplicating it here, i'd prefer if we could factor it out now
ok, i'm fine to defer this to a follow-on PR if @manupa-arm is ok with that.
Maybe with a TODO/issue then?
@areusch,
We have an example testcase for running MobileNet (created in Relay) with UMA and the Vanilla Accelerator in place. Not sure if it should be added to this PR or a later one, as this PR is already a lot to discuss. I'd be in favor of moving this to the next UMA PR.
Thanks for the great work!
I've done an initial round of review -- my concerns are mostly around the structure and the UMA CLI. It would be better to separate the latter into another PR, IMO, given the content in this PR.
    AOTTestRunner,
    generate_ref_data,
    compile_and_run,
)
This feels like a test source. Should we move this to tests?
@manupa-arm, this is not supposed to be test code. This is example code for the tutorial. We will replace AOTTestRunner with the microTVM Project API.
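For context, a rough sketch (not the PR's actual tutorial code) of how the AOT test utilities imported above are typically used; exact import paths and signatures may differ between TVM versions, and the tiny ReLU module stands in for a UMA-partitioned model.

```python
import numpy as np
import tvm
from tvm import relay
from tvm.testing.aot import AOTTestModel, AOTTestRunner, generate_ref_data, compile_and_run

# Tiny Relay function standing in for a UMA-partitioned module.
x = relay.var("x", shape=(1, 4), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))

inputs = {"x": np.random.uniform(-1, 1, (1, 4)).astype("float32")}
ref_outputs = generate_ref_data(mod, inputs)  # reference outputs from the host flow

compile_and_run(
    AOTTestModel(module=mod, inputs=inputs, outputs=ref_outputs),
    runner=AOTTestRunner(),
    interface_api="c",
    use_unpacked_api=True,
)
```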
* "Compiler" attribute. Functions that do not match this condition remain | ||
* unaltered. | ||
*/ | ||
class OutlineCompilerFunctionsMutator : public MixedModeMutator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @MichaelJKlaiber, this is an exciting addition! Just wanted to chip in with a few more suggestions / nits.
Since the code in python/tvm/relay/backend/contrib/uma/* is a developer-facing API, it might be worth enabling mypy type checking in task_mypy.sh.
],
Optional[int],
]
] = []
Would it make sense to break these type annotations into smaller named variables, e.g. like PatternTable in partitioner.py?
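A minimal sketch of the idea: name the pieces of the nested annotation instead of inlining it, in the spirit of PatternTable in partitioner.py. The alias names and element types below are hypothetical, not the actual UMA definitions.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical aliases; the real element types live in the UMA backend code.
RelayPassEntry = Tuple[str, Callable[..., object], Optional[int]]
RelayPassTable = List[RelayPassEntry]

_relay_passes: RelayPassTable = []
```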
just noting i think this one isn't done yet
We discussed this in the Community Meeting yesterday. Here are notes on the discussion:
UMAv1.0 updates according to comments and suggestions from PR apache#12087
Thanks @MichaelJKlaiber! A few last things:
* "Compiler" attribute. Functions that do not match this condition remain | ||
* unaltered. | ||
*/ | ||
class OutlineCompilerFunctionsMutator : public MixedModeMutator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we're duplicating it here, i'd prefer if we could factor it out now
#######################################################################
# Relay to TIR function registration
#######################################################################
self._register_tir_pass(PassPhase.TIR_PHASE_0, MyAiHwConv2dPass())
cc @mbaret one thing we've been discussing is that it might make sense for these enums to roughly correspond to "guarantees about the form of the IRModule" (e.g. "all the primitive Functions are lowered" or "Buffers are defined"). To that end I think it would be great to avoid referencing e.g. TIR_PHASE_n in the enum unless those are distinct guarantees.
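To illustrate the suggestion, a hypothetical sketch of phase names that describe guarantees about the IRModule rather than TIR phase numbers. The enum members below are illustrative only, not the actual UMA PassPhase definition.

```python
from enum import Enum, auto


class PassPhase(Enum):
    PRE_PARTITIONING = auto()     # full Relay module, nothing outlined yet
    POST_PARTITIONING = auto()    # "Compiler"-attributed functions outlined
    TIR_AFTER_LOWERING = auto()   # all primitive functions lowered to TIR
    TIR_BUFFERS_DEFINED = auto()  # buffers defined/flattened, ready for codegen


# A backend would then register passes against a guarantee, e.g.:
# self._register_tir_pass(PassPhase.TIR_AFTER_LOWERING, MyAiHwConv2dPass())
```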
The CI pipeline still fails for the i386 target, see https://ci.tlcpack.ai/blue/rest/organizations/jenkins/pipelines/tvm/branches/PR-12087/runs/12/nodes/360/steps/1207/log/?start=0 and tests/scripts/task_config_build_i386.sh. What are your thoughts? Does anyone of you understand what is different between i386 and build_cpu? Does it have to do with:
Force-pushed from 31b6107 to cac3060 (Compare)
Thanks to all the reviewers for the thorough reviews and the great suggestions and comments that improved UMA! We integrated the vast majority of the requests from this review into the PR.
This PR has been open for quite a long time now; it has improved steadily and we feel that it is ready. Please let us know your thoughts. There are a couple of (small) things we left out of this PR, and we would ask for your opinion on moving them to (very near future) PRs:
For further steps, see the current tracking issue #11260. It is also great that we as a community identified items where further discussion is required, e.g.:
Thanks to everyone contributing to UMA, especially @cgerum @PaulPalomeroBernardo
Thanks @MichaelJKlaiber! I found a few small, mostly mechanical fixes that would make this PR perfect, but overall it looks good to me. Thanks for your hard work on this (and that of everyone else involved)!
* "Compiler" attribute. Functions that do not match this condition remain | ||
* unaltered. | ||
*/ | ||
class OutlineCompilerFunctionsMutator : public MixedModeMutator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, i'm fine to defer this to a follow-on PR if @manupa-arm is ok with that.
Thank you for the exciting contribution and for addressing my comments!
Just a super minor nit :)
Thanks for all the hard work here and for all the changes.
I had a deeper look into the PR. A few nits and a concern with the "includes" function for UMABackend.
 *
 */
int
my_ai_hw_conv2dnchw(float* ifmap, float* weights, float* result, int oc, int iw, int ih, int ic,
This source looks as if it was not linted. Maybe we don't lint apps...
Anyhow, would you be able to run clang-format on this file?
I agree that it doesn't look linted. It is what ./docker/lint.sh -i clang_format gives me.
* "Compiler" attribute. Functions that do not match this condition remain | ||
* unaltered. | ||
*/ | ||
class OutlineCompilerFunctionsMutator : public MixedModeMutator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe with a TODO/issue then ?
I am happy with AOTTestRunner, as we envision that example/test to be useful for an embedded dev who wants to integrate a few models into their own embedded project. Also, IIRC, this was @areusch's suggestion :) At the same time, it might also be valuable to add a Project API for users who want to quickly spin up a project to test the inference. Therefore, I would say not "replace" but maybe just extend/add? -- Also happy to push it to another PR.
…line due to missing CLANG for i386
Force-pushed from 1878882 to 0f0b1bf (Compare)
Thanks @MichaelJKlaiber for all the hard work here!
* Add minimal working structure for generic interface
* Separate target definition from codegen
* Update file structure to support multiple NPU targets
* Add scheduling and pass support to codegen
* Update schedule function and pass registration
* Add generic partitioner for relay graph partitioning
* Add pattern-based relay graph partitioning and AOT codegen
* Update API
* Add UltraTrail relay passes and schedule function
* Update UltraTrail relay passes
* Add tir_to_runtime hook for UltraTrail
* Add operator strategy registration to lowering
* Add option to pass constants as attributes
* Refactor naming: Generic to UMA
* Change API to single user-facing backend class UMABackend
* Add initial codegen API
* [UMA] add a generic packed function to register targets
* Restructure files and add initial codegen
* Minor code cleanup
* Add UMA config and MergeCompilerRegion example
* Move UMA configuration to init parameters
* Add python hooks for C-codegen. Still has known restrictons
* Fix relay_to_tir hook to keep virtual device in main function
* Remove register schedules, scheduling is moved to passes for now
* Remove extract constants since non-scalar constants are now supported by TVM
* API documentation and some code fixes and cleanup
* Fix typo
* Fix UMA lowering
* Prototype for UMA-based target attribute registration
* Add default option and type deduction to register_target_attr
* Change pass phases to enum
* [Relay] Plumb external codegen target via Target.current() for all external codegen paths (See https://discuss.tvm.apache.org/t/byoc-supporting-cutlass-byoc-with-collage/12796/6 for context, which in turn is part of Collage (https://github.com/apache/tvm-rfcs/blob/main/rfcs/0062-collage.md). We want both old-style (via relay.ext.$toolchain) and new-style (via "RelayToTIR" Pass attribute on target kind) external codegen to be able to access the current 'external codegen' Target instance via Target.current(). - For old-style, plumb the true Target through TEComplier and push it on the context stack before calling relay.ext.$toolchain. - For new-style, pass the CompilationConfig to the RelayToTIRTargetHook pass, make the jump from "Compiler" attribute value to Target via the new CompilationConfig::FindPrimitiveTargetForKind method, and push on the stack before invoking the custom "RelayToTIR" pass. While working on this discovered RelayToTIRTargetHook was incompatible with the VM's compilation flow since RelayToTIRTargetHook assumes all "Compiler" attributed functions are inlined. Generalize it to support both inline and global function styles. Extend Target::IsExternalCodegen to recognize target kinds with "RelayToTIR" attributes as external. Update target hooks unit test to exercise new support for outline-style, picking up the current target, and compiling via the VM.
* Use current target in lowering
* Use attr:kRelayToTIR
* Remove erronousely commited quick fix
* Towards test cases for uma
* Add test_uma
* Initial UMA structure for version 1
* [UMA]: conv2d unit test
* [UMA] update of tutorial
* [UMA] update of pass format, still issue with conv2d c code
* [UMA] refactoring of test_uma_lowering_with_umalower.py
* [UMA] refactoring of test_uma_lowering_with_umalower.py
* [UMA] Adding backend, codegen, patterns, strategies and run file for MyAiHw
* [UMA] update towards my_ai_hw usecase
* [UMA] working testcase for conv2d with uma
* [UMA] testcase
* [UMA] uma lower.py: replaced outdated function create_prim_func_from_outputs to be compatible withe latest content of "main"
* UMA: Move torch import to top to avoid free(): invalid pointer error
* Add stub files for targets
* Add tests for ultratrail codegen
* Adopt my_ai_hw accelerator for new target definition
* Add unit test for target attributes
* Test string arguments
* Extend target test
* [UMA] tutorial first versin
* [UMA] moved unit tests to contrib
* [UMA] renaming interfaces
* Fix umalower_tests in ci
* make uma a python module
* [UMA] Update of UMAv1 API + added testcases + tutorialV1
* [UMA] UMAv1
* [UMA] cmake file updated
* AOT test infrastructure adapted
* UMA: add __init__.py for uma.api
* Finish uma tests
* Use upstream version of dmlc-core
* [UMA] tir_to_runtime documentation update
* [UMA] cleanup
* [UMA] fix for test_partition
* [UMA] lint fix
* [UMA] lint fix
* [UMA] lint fix
* [UMA] lint fix
* [UMA] fix of build scripts for arm and i386
* Fix remaining linter errors
* [UMA] CMakeLists.txt added UMA tvm_option
* [UMA] added UMA tvm_option
* [UMA] guard against multiple registrations
* [UMA] fixed comments as pointed out in PR 12087
* [UMA] fixed comments as pointed out in PR 12087
* [UMA] skip uma tests if uma is not available
* [UMA] added UMA rst
* [UMA] Moved tutorial to RST file in gallery
* [UMA] moved uma cli to apps
* [UMA] change requests according to PR-12087
* [UMA] update and sync of uma_cli and tutorial
* [UMA] update of template passe: remove Pad block of Conv2D
* [UMA] lint updates
* [UMA] Test updates
* [UMA] fixes according to comments from PR 12087 discussion
* [UMA] lint updates
* [UMA] moved UMA _template file to apps
* [UMA] lint
* [UMA] Remove exceptions when dispatching over targets
* [UMA] vanilla pattern update
* [UMA] added mobilenet integration test
* [UMA] clang lint
* Remove tir to runtime
* [UMA] Use sequential for UMA relay passes
* Use comparison against BYOC flow in test_partition
* [UMA] tutorial update: moved code blocks to RST
* [UMA] tutorial update and lint fixes
* [UMA] removing UMA from i386 build, as there is a fail in the CI pipeline due to missing CLANG for i386
* [BYOC-DNNL] covered case for sum node without attr
* [UMA] pylint
* [UMA] pylint
* [UMA] aot fix
* [UMA] Changes PR review
* [UMA] cc lint
* [UMA] cc lint
* Use better function name for te_lowering and annotate current target at TE functions

Co-authored-by: Paul Palomero Bernardo <paulpb@outlook.com>
Co-authored-by: Christoph Gerum <christoph.gerum@uni-tuebingen.de>
Co-authored-by: mbs-octoml <mbs@octoml.ai>
Co-authored-by: Christoph Gerum <gerum@informatik.uni-tuebingen.de>
UMA v1: