[TIR] Enable Host Func Attribute for PrimFunc #14020

zxybazh · 2023-02-17T07:06:18Z

This PR adds a new attribute, kIsHostFunc, to ensure that certain PrimFuncs run on the CPU, for example a shape_func that computes shape information dynamically. With the new attribute, the PrimFunc is skipped in the verification pass and the split host device pass. A unit test is added.

CC: @sunggg @YuchenJin @tqchen

tvm-bot · 2023-02-17T07:06:22Z

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

cc @Hzfengsy, @junrushao, @quic-sanirudh, @shingjan (see #10317 for details)

Generated by tvm-bot

junrushao · 2023-02-17T07:13:11Z

Hey Xiyou,

My understanding of VerifyMemory is that it checks for illegal memory access, for example CPU code directly accessing GPU memory.

I probably don't have much context on this but was curious:

Q1. In which case does VerifyMemory fail on a shape function that runs on CPU, given that shape functions take no GPU arrays as PrimFunc inputs?
Q2. Given that the target is already attached to the PrimFunc as an attribute, in which case do we need a new attribute because the target is not enough?

zxybazh · 2023-02-17T07:51:48Z

Hi, thanks for checking my PR this late! Very good questions!

Let me share some of the context here. We are trying to support a dynamic-shape operator on CUDA. Its shape function is generated during a Relax pass called VMShapeLower, which is part of the Relax build. The pass generates a PrimFunc like the following:

```python
from tvm.script import tir as T


@T.prim_func
def shape_func(H: T.Buffer((T.int64(3),), "int64")):
    T.func_attr({"global_symbol": "shape_func"})
    H[T.int64(2)] = T.int64(4) * H[T.int64(0)] * H[T.int64(1)]
```

It is supposed to run on the CPU, i.e., on the host instead of the device. However, since this pass doesn't have access to the target information, the generated function doesn't include the target in its attributes. Therefore, we would like to add an attribute that automatically binds it to the target host in BindTarget and prevents it from being split into device code.
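To make the arithmetic concrete, here is a plain-Python sketch of what the generated shape function computes. It is not TVM code: the list `H` stands in for the int64 buffer, and `shape_func_reference` is a hypothetical name used only for illustration.

```python
def shape_func_reference(H):
    """Plain-Python stand-in for the generated PrimFunc (hypothetical helper).

    H is a 3-element list standing in for the int64 buffer. The function
    reads H[0] and H[1] and writes the result into H[2], in place.
    """
    if len(H) != 3:
        raise ValueError("expected a buffer of exactly 3 elements")
    H[2] = 4 * H[0] * H[1]
    return H


H = [2, 3, 0]
shape_func_reference(H)
print(H)  # [2, 3, 24]
```

The work is a few scalar multiplications on shape values, which is the kind of computation the discussion here wants bound to the host.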

For Q1, it does not fail, because this pass runs after BindTarget. Thanks for the tip! ~~Will remove this change in verify memory pass.~~ I have reverted the change.
For Q2, given the context, IMHO if we can access target information in the pass and do target binding, it's possible to avoid this new attribute. I'm not sure whether adding the target as an argument to a particular pass is expected.

junrushao · 2023-02-17T07:58:37Z

> For Q2, given the context, IMHO if we can access target information in the pass and do target binding, it's possible to avoid this new attribute. I'm not sure whether adding the target as an argument to a particular pass is expected.

I'm not sure if I'm missing anything, but I do think the VerifyMemory pass assumes the target information is always available:

tvm/src/tir/analysis/verify_memory.cc

Lines 180 to 185 in d7253fb

    
```cpp
auto target = func->GetAttr<Target>(tvm::attr::kTarget);
ICHECK(target.defined()) << "VerifyMemory: Require the target attribute";
VLOG(1) << "verifying memory for target '" << target.value()->str()
        << "' for primitive:" << std::endl
        << func;
```

zxybazh · 2023-02-17T08:02:34Z

Yes, it's available in the VerifyMemory pass, and I've reverted the change in that pass. For Q2, I was referring to the VMShapeLower pass in the Relax build, where this PrimFunc is generated.

tqchen · 2023-02-17T14:19:22Z

Thanks @zxybazh. After looking at the discussion, especially the input from Junru, I think it would be great to clarify that we want is_host_func and the target attr to be mutually exclusive, with is_host_func used only by BindTarget.

Once an explicit target is attached to the function, the attr is no longer necessary and can be a source of duplication.

So we only need changes to BindTarget here, and possibly a UT for that pass only.
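The mutual-exclusivity rule can be pictured as a small check over a function's attribute dictionary. This is a sketch under assumptions, not TVM code: plain dicts stand in for PrimFunc attributes, `check_mutually_exclusive` is a hypothetical helper, and the string keys `"tir.is_host_func"` and `"target"` are assumed to be the values behind `kIsHostFunc` and `kTarget`.

```python
IS_HOST_FUNC = "tir.is_host_func"  # assumed string key behind kIsHostFunc
TARGET = "target"                  # assumed string key behind tvm::attr::kTarget


def check_mutually_exclusive(attrs):
    """Raise if attrs carry both the host flag and an explicit target.

    Hypothetical helper: attrs is a plain dict standing in for the
    attribute map of a PrimFunc.
    """
    if IS_HOST_FUNC in attrs and TARGET in attrs:
        raise ValueError(
            f"'{IS_HOST_FUNC}' and '{TARGET}' must not both be set; "
            "the flag is only meaningful before a target is bound"
        )
    return attrs


# Either attribute alone is fine.
check_mutually_exclusive({IS_HOST_FUNC: 1})
check_mutually_exclusive({TARGET: "llvm"})
```

Under this reading, the flag is a request that exists only until a target is bound, and the explicit target replaces it afterwards.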

junrushao · 2023-02-17T18:00:05Z

Thanks @zxybazh @tqchen for the clarification. This is much clearer to me now! Let's make the following changes:

- In BindTarget, if the tvm::tir::attr::kIsHostFunc flag is set, bind the target host instead and remove the kIsHostFunc flag at the same time.
- Remove the logic in SplitHostDevice.
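The proposed BindTarget behavior can be sketched in plain Python. This is a toy model, not the TVM implementation: `ToyTarget` and `bind_target` are hypothetical names, a dict stands in for the PrimFunc attributes, the string keys are assumed, and falling back to `llvm` when the target has no host is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

IS_HOST_FUNC = "tir.is_host_func"  # assumed string key behind kIsHostFunc
TARGET = "target"                  # assumed string key behind tvm::attr::kTarget


@dataclass(frozen=True)
class ToyTarget:
    """Hypothetical stand-in for a target with an optional host."""
    kind: str
    host: Optional["ToyTarget"] = None


def bind_target(attrs, target, default_host=ToyTarget("llvm")):
    """Return a copy of attrs with a target bound.

    If the host flag is set, the flag is removed and the target's host
    (or default_host when the target has none) is bound instead.
    """
    new_attrs = dict(attrs)
    if new_attrs.get(IS_HOST_FUNC) == 1:
        del new_attrs[IS_HOST_FUNC]
        new_attrs[TARGET] = target.host if target.host is not None else default_host
    else:
        new_attrs[TARGET] = target
    return new_attrs


cuda = ToyTarget("cuda", host=ToyTarget("llvm"))
bound = bind_target({"global_symbol": "shape_func", IS_HOST_FUNC: 1}, cuda)
print(bound[TARGET].kind)     # llvm
print(IS_HOST_FUNC in bound)  # False
```

Because the flag is consumed in the same step that binds the target, later passes only ever need to look at the target attribute.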

zxybazh · 2023-02-18T00:00:53Z

src/tir/transforms/primfunc_utils.cc

```diff
@@ -30,6 +30,12 @@ namespace tir {
 namespace transform {
 transform::Pass BindTarget(Target target) {
   auto fpass = [target](tir::PrimFunc f, IRModule m, transform::PassContext ctx) {
+    if (f->GetAttr<Integer>(tvm::tir::attr::kIsHostFunc) == 1) {
+      return WithAttrs(std::move(f), Map<String, ObjectRef>{
+                       {tvm::attr::kTarget, target->host.value_or(Target("llvm"))},
```

I'm not sure this is the best option when the target host is not available. It reflects my understanding of the default target host.

zxybazh · 2023-02-18T00:02:12Z

Thanks for the careful review and discussion. I've removed the duplicate changes and created a unit test that checks the target and host func attributes. Please take another look :)

tqchen

One final nit

junrushao · 2023-02-18T05:10:12Z

src/tir/transforms/primfunc_utils.cc

```diff
@@ -30,6 +30,10 @@ namespace tir {
 namespace transform {
 transform::Pass BindTarget(Target target) {
   auto fpass = [target](tir::PrimFunc f, IRModule m, transform::PassContext ctx) {
+    if (f->GetAttr<Integer>(tvm::tir::attr::kIsHostFunc) == 1) {
+      return WithAttr(std::move(WithoutAttr(std::move(f), tvm::tir::attr::kIsHostFunc)),
+                      tvm::attr::kTarget, target->host.value_or(Target("llvm")));
```

Is there a case where the target host is None?

zxybazh · 2023-02-18

Yes, when we use target tags like nvidia/geforce-rtx-3070, the default target host is None.

junrushao · 2023-02-18

Got it. That makes sense!

junrushao

LGTM. Feel free to merge it in once the CI is green!

Update to use the `tvm::tir::IsHostFunc` utility function, rather than the `kIsHostFunc` attribute. Per discussion on apache#14020, the `kIsHostFunc` attribute should only be used in `BindTarget`, and should not be re-introduced in `SplitHostDevice`.

This PR refactors SplitHostDevice into three separate transformations. Previously, SplitHostDevice would replace device regions with a builtin::tvm_call_packed() node to replace the extracted region. After this PR, this process is performed in three separate steps.

- AnnotateDeviceRegions: Annotate the regions that should be executed on another target.
- SplitHostDevice: Extract the annotated region into an independent PrimFunc, with a GlobalVar to represent the call into the new subroutine.
- LowerDeviceKernelLaunch: For any subroutine call where the caller and callee are on different devices, replace with a device kernel launch.

* PR#14915 [TVMScript] Allow T.target("device", host="host") in TVMScript

  Prior to this commit, the `TargetNode::host` could be specified in TVMScript as part of the config dictionary, under the key `"host"`. However, this required all other device parameters to be explicitly specified, rather than using any of the short-hand string representations. This commit forwards the `host` argument from TVMScript's `T.target` method to `tvm.target.Target`, allowing both the device and host to be specified using the shorthand string representation.

  ```python
  @T.prim_func
  def before_this_commit():
      T.func_attr(
          {
              "target": T.target(
                  {
                      "arch": "sm_86",
                      "host": {"keys": ["cpu"], "kind": "llvm", "tag": ""},
                      "keys": ["cuda", "gpu"],
                      "kind": "cuda",
                      "max_num_threads": 1024,
                      "tag": "",
                      "thread_warp_size": 32,
                  }
              )
          }
      )
      T.evaluate(0)


  @T.prim_func
  def after_this_commit():
      T.func_attr({"target": T.target("cuda", host="llvm")})
      T.evaluate(0)
  ```

* [Target] Added WithoutHost method

* [TIR] SplitHostDevice, handle missing kGlobalSymbol

  Previously, the symbol name of the extracted compute kernel was defined based on the `kGlobalSymbol` attribute, which was required to be present. This commit updates `SplitHostDevice` to generate the symbol name using `kGlobalSymbol` if present, and to fall back to the name of the `tvm::GlobalVar` for internal functions.

* [TIR] Refactor SplitHostDevice into three separate passes

  First pass, `AnnotateDeviceRegions`. This pass decides which portions of a PrimFunc should be run on the device, and annotates them with `kTarget` attribute, indicating which target should be used for later lowering steps.

  Second pass, `SplitHostDevice`. This pass extracts the annotated region into an independent PrimFunc. The `kTarget` attribute of the extracted kernel is defined by the `kTarget` annotation inserted by `AnnotateDeviceRegions`. The host function is marked by the `tvm::tir::attr::kIsHostFunc` attribute, allowing it to be recognized by later host-only lowering passes.

  Third pass, `LowerDeviceKernelLaunch`. This pass identifies subroutine calls that call into device kernels, and rewrites them into `T.tvm_call_packed`.

* Add unit tests specifically for SplitHostDevice behavior

* Added unit test specifically for AnnotateDeviceRegions

* Added unit tests for LowerDeviceKernelLaunch

* Minor cleanup, moved all kernel launch collection into one spot

  Previously, the SplitHostDevice pass added the `tir::attr::kKernelLaunchParams` attribute, and the LowerDeviceKernelLaunch pass filled in the values for it. This cleanup makes the kernel launch params be the sole responsibility of LowerDeviceKernelLaunch.

* Updated unit tests for LowerWarpMemory

* Updated unit tests for ThreadSync

* Updated unit test for inject ptx async copy

* [Bugfix] Avoid symbol conflicts in MakePackedAPI/MakeUnpackedAPI

  PRs #14913 and #14914 made analogous changes to `MakePackedAPI` and `MakeUnpackedAPI` to handle subroutine calls. Both PRs introduced the same symbol, `tvm::tir::SubroutineCallRewriter`, a local utility to update internal calls to a modified function. While each PR passed CI individually, and was therefore able to merge, having both changes caused a duplicate symbol. This commit updates `MakePackedAPI` and `MakeUnpackedAPI` to place their local utilities into anonymous namespaces, avoiding the conflict.

* Maintain "tir.is_global_func" attr in device-side entry point

* SplitHostDevice, update the host-side target to be the host

* [TIR] Update LowerDeviceKernelLaunch to avoid kIsHostFunc

  Update to use the `tvm::tir::IsHostFunc` utility function, rather than the `kIsHostFunc` attribute. Per discussion on #14020, the `kIsHostFunc` attribute should only be used in `BindTarget`, and should not be re-introduced in `SplitHostDevice`.

* Remove is_host_func from SplitHostDevice tests

zxybazh added 2 commits February 16, 2023 18:10

- Support kIsHostFunc. (e0a687c)
- Add unit test. (de3513b)
- Revert verify memory pass. (8df9ff6)

tqchen reviewed Feb 17, 2023: src/tir/transforms/split_host_device.cc

tqchen reviewed Feb 17, 2023: tests/python/unittest/test_tir_host_func.py

zxybazh added 2 commits February 17, 2023 15:40

- Address comments. (7415590)
- Make sure is cleared. (ca9e97b)

zxybazh commented Feb 18, 2023

- Fix linting. (235f214)

tqchen reviewed Feb 18, 2023: tests/python/unittest/test_tir_host_func.py

tqchen approved these changes Feb 18, 2023

tqchen reviewed Feb 18, 2023: tests/python/unittest/test_tir_host_func.py

tqchen reviewed Feb 18, 2023: include/tvm/tir/function.h

tqchen requested changes Feb 18, 2023: tests/python/unittest/test_tir_host_func.py

zxybazh added 3 commits February 17, 2023 19:01

- Remove target attribute. (54dab40)
- Make attributes mutually exclusive. (d6dea2d)
- Remove unnecessary attribute. (54cb5a3)

tqchen approved these changes Feb 18, 2023

junrushao reviewed Feb 18, 2023

junrushao approved these changes Feb 18, 2023

zxybazh merged commit 8613c79 into apache:main Feb 18, 2023

zxybazh mentioned this pull request Feb 22, 2023

[Unity][Relax] Set Shape Function to Be Host Function #14090

Merged

yongwww pushed a commit to yongwww/tvm that referenced this pull request Feb 27, 2023

[TIR] Enable Host Func Attribute for PrimFunc (apache#14020)

caf7aa1

ysh329 mentioned this pull request Apr 17, 2023

[Release] v0.12.0 Release Candidate Notes #14645

Closed
