Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TIR] Enable Host Func Attribute for PrimFunc #14020

Merged
merged 9 commits into from
Feb 18, 2023

Conversation

zxybazh
Copy link
Member

@zxybazh zxybazh commented Feb 17, 2023

This PR enables a new attribute kIsHostFunc to ensure certrain prim func is run on CPU, for example shape_func that computes shape information dynamically. With the new attribute, the primfunc will be skipped in verification pass and split host device pass. A unit test is added.

CC: @sunggg @YuchenJin @tqchen

@tvm-bot
Copy link
Collaborator

tvm-bot commented Feb 17, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@junrushao
Copy link
Member

Hey Xiyou,

My understanding of VerifyMemory is that it checks the illegal memory access, for example, CPU code directly access GPU memory.

I probably don't have much context on this but was curious:

  • Q1. in which case, VerifyMemory fails on a shape function that is run on CPU, given there is no GPU array as inputs of the PrimFunc in shape functions?
  • Q2. in which case, given the target has been attached in PrimFunc as attributes, do we have to add a new attribute because target is not enough?

@zxybazh
Copy link
Member Author

zxybazh commented Feb 17, 2023

Hi, thanks for checking my PR this late! Very good questions!

Let me share some of the context here. We are trying to support a dynamic shape operator on Cuda. This function is generated during a relax pass called VMShapeLower which is part of the relax build. And it will generate a primfunc as follows:

@T.prim_func
def shape_func(H: T.Buffer((T.int64(3),), "int64")):
    T.func_attr({"global_symbol": "shape_func"})
    H[T.int64(2)] = T.int64(4) * H[T.int64(0)] * H[T.int64(1)]

Apparently, it's supposed to be running on CPU, i.e, the host instead of the device. However, since this pass doesn't have access to the target information, when the function is generated it doesn't include the target in its attribute. Therefore, we would like to add an attribute to automatically bind it to the target host in BindTarget and avoid it being splited into device code.

For Q1, It does not fail because this pass is after BindTarget. Thanks for the tip! Will remove this change in verify memory pass. Has reverted the change.
For Q2, given the context, IMHO if we can access target information in the pass and do target binding, it's possible to avoid this new attribute. I'm not quite sure if it's expected to add target as argument for certain pass.

@junrushao
Copy link
Member

For Q2, given the context, IMHO if we can access target information in the pass and do target binding, it's possible to avoid this new attribute. I'm not quite sure if it's expected to add target as argument for certain pass.

I'm not sure if I'm missing anything, but I do think the VerifyMemory pass assumes the target information is always available:

auto target = func->GetAttr<Target>(tvm::attr::kTarget);
ICHECK(target.defined()) << "VerifyMemory: Require the target attribute";
VLOG(1) << "verifying memory for target '" << target.value()->str()
<< "' for primitive:" << std::endl
<< func;

@zxybazh
Copy link
Member Author

zxybazh commented Feb 17, 2023

Yes, it's available in VerifyMemory pass and I've reverted the change in this pass. For Q2 I was refering to VMShapeLower pass in relax build, where this primfunc is generated.

@tqchen
Copy link
Member

tqchen commented Feb 17, 2023

Thanks @zxybazh . After looking at the discussions especially inputs from junru. I think it would be great to clarify that we want is_host_func and target attr to be mutually exclusive to each other and only being used by BindTarget.

After explicit target being attached to the function then such attr is no longer necessary and can be a source of duplication.

So we only need changes for BindTarget here and possibly a UT that pass only

@junrushao
Copy link
Member

Thanks @zxybazh @tqchen for the clarification - this has been much clear to me now! Let's do the following change:

  • In BindTarget, if we have tvm::tir::attr::kIsHostFunc flag set up, bind the target host instead and also remove the kIsHostFunc flag in the meantime.
  • Remove the logic in SplitHostDevice

@@ -30,6 +30,12 @@ namespace tir {
namespace transform {
transform::Pass BindTarget(Target target) {
auto fpass = [target](tir::PrimFunc f, IRModule m, transform::PassContext ctx) {
if (f->GetAttr<Integer>(tvm::tir::attr::kIsHostFunc) == 1) {
return WithAttrs(std::move(f), Map<String, ObjectRef>{
{tvm::attr::kTarget, target->host.value_or(Target("llvm"))},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is the best option when target host is not available. This is my impression on the default target host.

@zxybazh
Copy link
Member Author

zxybazh commented Feb 18, 2023

Thanks for the careful review and discussion. I've removed duplicate changes and created a unittest that checks target and host func attribute. Please take another look :)

Copy link
Member

@tqchen tqchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One final nit

@@ -30,6 +30,10 @@ namespace tir {
namespace transform {
transform::Pass BindTarget(Target target) {
auto fpass = [target](tir::PrimFunc f, IRModule m, transform::PassContext ctx) {
if (f->GetAttr<Integer>(tvm::tir::attr::kIsHostFunc) == 1) {
return WithAttr(std::move(WithoutAttr(std::move(f), tvm::tir::attr::kIsHostFunc)),
tvm::attr::kTarget, target->host.value_or(Target("llvm")));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a case where the target host is None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, when we use target tags like nvidia/geforce-rtx-3070 the default target host is None.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it. that makes sense!

Copy link
Member

@junrushao junrushao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Feel free to merge it in once the CI is green!

yongwww pushed a commit to yongwww/tvm that referenced this pull request Feb 27, 2023
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request May 24, 2023
Update to use the `tvm::tir::IsHostFunc` utility function, rather than
the `kIsHostFunc` attribute.  Per discussion on
apache#14020, the `kIsHostFunct` attribute
should only be used in `BindTarget`, and should not be re-introduced
in `SplitHostDevice`.
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request May 24, 2023
Per discussion on apache#14020, the
`kIsHostFunct` attribute should only be used in `BindTarget`, and
should not be re-introduced in `SplitHostDevice`.
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request May 25, 2023
Update to use the `tvm::tir::IsHostFunc` utility function, rather than
the `kIsHostFunc` attribute.  Per discussion on
apache#14020, the `kIsHostFunct` attribute
should only be used in `BindTarget`, and should not be re-introduced
in `SplitHostDevice`.
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request May 25, 2023
Per discussion on apache#14020, the
`kIsHostFunct` attribute should only be used in `BindTarget`, and
should not be re-introduced in `SplitHostDevice`.
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request May 25, 2023
Update to use the `tvm::tir::IsHostFunc` utility function, rather than
the `kIsHostFunc` attribute.  Per discussion on
apache#14020, the `kIsHostFunct` attribute
should only be used in `BindTarget`, and should not be re-introduced
in `SplitHostDevice`.
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request May 25, 2023
Per discussion on apache#14020, the
`kIsHostFunct` attribute should only be used in `BindTarget`, and
should not be re-introduced in `SplitHostDevice`.
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request May 25, 2023
Update to use the `tvm::tir::IsHostFunc` utility function, rather than
the `kIsHostFunc` attribute.  Per discussion on
apache#14020, the `kIsHostFunct` attribute
should only be used in `BindTarget`, and should not be re-introduced
in `SplitHostDevice`.
csullivan pushed a commit that referenced this pull request May 26, 2023
This PR refactors SplitHostDevice into three separate transformations. Previously, SplitHostDevice would replace device regions with a builtin::tvm_call_packed() node to replace the extracted region. After this PR, this process is performed in three separate steps.

AnnotateDeviceRegion: Annotate the regions that should be executed on another target.
SplitHostDevice: Extract the annotated region into an independent PrimFunc, with a GlobalVar to represent the call from into the new subroutine.
LowerDeviceKernelLaunch: For any subroutine call where the caller and callee are on different devices, replace with a device kernel launch.

* PR#14915 [TVMScript] Allow T.target("device", host="host") in TVMScript

Prior to this commit, the `TargetNode::host` could be specified in
TVMScript as part of the config dictionary, under the key `"host"`.
However, this required all other device parameters to be explicitly
specified, rather than using any of the short-hand string
representations.  This commit forwards the `host` argument from TVMScript's
`T.target` method to `tvm.target.Target`, allowing both the device and
host to be specified using the shorthand string representation.

```python
@T.prim_func
def before_this_commit():
    T.func_attr(
        {
            "target": T.target(
                {
                    "arch": "sm_86",
                    "host": {"keys": ["cpu"], "kind": "llvm", "tag": ""},
                    "keys": ["cuda", "gpu"],
                    "kind": "cuda",
                    "max_num_threads": 1024,
                    "tag": "",
                    "thread_warp_size": 32,
                }
            )
        }
    )
    T.evaluate(0)

@T.prim_func
def after_this_commit():
    T.func_attr({"target": T.target("cuda", host="llvm")})
    T.evaluate(0)
```

* [Target] Added WithoutHost method

* [TIR] SplitHostDevice, handle missing kGlobalSymbol

Previously, the symbol name of the extracted compute kernel was
defined based on the `kGlobalSymbol` attribute, which was required to
be present.  This commit updates `SplitHostDevice` to generate the
symbol name using `kGlobalSymbol` if present, and to fall back to the
name of the `tvm::GlobalVar` for internal functions.

* [TIR] Refactor SplitHostDevice into three separate passes

First pass, `AnnotateDeviceRegions`.  This pass decides which portions
of a PrimFunc should be run on the device, and annotates them with
`kTarget` attribute, indicating which target should be used for later
lowering steps.

Second pass, `SplitHostDevice`.  This pass extracts the annotated
region into an independent PrimFunc.  The `kTarget` attribute of the
extracted kernel is defined by the `kTarget` annotation inserted by
`AnnotateDeviceRegions`.  The host function is marked by the
`tvm::tir::attr::kIsHostFunc` attribute, allowing it to be recognized
by later host-only lowering passes.

Third pass, `LowerDeviceKernelLaunch`.  This pass identifies
subroutine calls that call into device kernels, and rewrites them into
`T.tvm_call_packed`.

* Add unit tests specifically for SplitHostDevice behavior

* Added unit test specifically for AnnotateDeviceRegions

* Added unit tests for LowerDeviceKernelLaunch

* Minor cleanup, moved all kernel launch collection into one spot

Previously, the SplitHostDevice pass added the
`tir::attr::kKernelLaunchParams` attribute, and the
LowerDeviceKernelLaunch pass filled in the values for it.  This
cleanup makes the kernel launch params be the sole responsibility of
LowerDeviceKernelLaunch.

* Updated unit tests for LowerWarpMemory

* Updated unit tests for ThreadSync

* Updated unit test for inject ptx async copy

* [Bugfix] Avoid symbol conflicts in MakePackedAPI/MakeUnpackedAPI

PRs #14913 and
#14914 made analogous changes to
`MakePackedAPI` and `MakeUnpackedAPI` to handle subroutine calls.
Both PRs introduced the same symbol,
`tvm::tir::SubroutineCallRewriter`, a local utility to update internal
calls to a modified function.  While each PR passed CI individually,
and was therefore able to merge, having both changes caused a
duplicate symbol.

This commit updates `MakePackedAPI` and `MakeUnpackedAPI` to place
their local utilities into anonymous namespaces, avoiding the
conflict.

* Maintain "tir.is_global_func" attr in device-side entry point

* SplitHostDevice, update the host-side target to be the host

* [TIR] Update LowerDeviceKernelLaunch to avoid kIsHostFunc

Update to use the `tvm::tir::IsHostFunc` utility function, rather than
the `kIsHostFunc` attribute.  Per discussion on
#14020, the `kIsHostFunct` attribute
should only be used in `BindTarget`, and should not be re-introduced
in `SplitHostDevice`.

* Remove is_host_func from SplitHostDevice tests
mei-ye pushed a commit to mei-ye/tvm that referenced this pull request Jun 1, 2023
This PR refactors SplitHostDevice into three separate transformations. Previously, SplitHostDevice would replace device regions with a builtin::tvm_call_packed() node to replace the extracted region. After this PR, this process is performed in three separate steps.

AnnotateDeviceRegion: Annotate the regions that should be executed on another target.
SplitHostDevice: Extract the annotated region into an independent PrimFunc, with a GlobalVar to represent the call from into the new subroutine.
LowerDeviceKernelLaunch: For any subroutine call where the caller and callee are on different devices, replace with a device kernel launch.

* PR#14915 [TVMScript] Allow T.target("device", host="host") in TVMScript

Prior to this commit, the `TargetNode::host` could be specified in
TVMScript as part of the config dictionary, under the key `"host"`.
However, this required all other device parameters to be explicitly
specified, rather than using any of the short-hand string
representations.  This commit forwards the `host` argument from TVMScript's
`T.target` method to `tvm.target.Target`, allowing both the device and
host to be specified using the shorthand string representation.

```python
@T.prim_func
def before_this_commit():
    T.func_attr(
        {
            "target": T.target(
                {
                    "arch": "sm_86",
                    "host": {"keys": ["cpu"], "kind": "llvm", "tag": ""},
                    "keys": ["cuda", "gpu"],
                    "kind": "cuda",
                    "max_num_threads": 1024,
                    "tag": "",
                    "thread_warp_size": 32,
                }
            )
        }
    )
    T.evaluate(0)

@T.prim_func
def after_this_commit():
    T.func_attr({"target": T.target("cuda", host="llvm")})
    T.evaluate(0)
```

* [Target] Added WithoutHost method

* [TIR] SplitHostDevice, handle missing kGlobalSymbol

Previously, the symbol name of the extracted compute kernel was
defined based on the `kGlobalSymbol` attribute, which was required to
be present.  This commit updates `SplitHostDevice` to generate the
symbol name using `kGlobalSymbol` if present, and to fall back to the
name of the `tvm::GlobalVar` for internal functions.

* [TIR] Refactor SplitHostDevice into three separate passes

First pass, `AnnotateDeviceRegions`.  This pass decides which portions
of a PrimFunc should be run on the device, and annotates them with
`kTarget` attribute, indicating which target should be used for later
lowering steps.

Second pass, `SplitHostDevice`.  This pass extracts the annotated
region into an independent PrimFunc.  The `kTarget` attribute of the
extracted kernel is defined by the `kTarget` annotation inserted by
`AnnotateDeviceRegions`.  The host function is marked by the
`tvm::tir::attr::kIsHostFunc` attribute, allowing it to be recognized
by later host-only lowering passes.

Third pass, `LowerDeviceKernelLaunch`.  This pass identifies
subroutine calls that call into device kernels, and rewrites them into
`T.tvm_call_packed`.

* Add unit tests specifically for SplitHostDevice behavior

* Added unit test specifically for AnnotateDeviceRegions

* Added unit tests for LowerDeviceKernelLaunch

* Minor cleanup, moved all kernel launch collection into one spot

Previously, the SplitHostDevice pass added the
`tir::attr::kKernelLaunchParams` attribute, and the
LowerDeviceKernelLaunch pass filled in the values for it.  This
cleanup makes the kernel launch params be the sole responsibility of
LowerDeviceKernelLaunch.

* Updated unit tests for LowerWarpMemory

* Updated unit tests for ThreadSync

* Updated unit test for inject ptx async copy

* [Bugfix] Avoid symbol conflicts in MakePackedAPI/MakeUnpackedAPI

PRs apache#14913 and
apache#14914 made analogous changes to
`MakePackedAPI` and `MakeUnpackedAPI` to handle subroutine calls.
Both PRs introduced the same symbol,
`tvm::tir::SubroutineCallRewriter`, a local utility to update internal
calls to a modified function.  While each PR passed CI individually,
and was therefore able to merge, having both changes caused a
duplicate symbol.

This commit updates `MakePackedAPI` and `MakeUnpackedAPI` to place
their local utilities into anonymous namespaces, avoiding the
conflict.

* Maintain "tir.is_global_func" attr in device-side entry point

* SplitHostDevice, update the host-side target to be the host

* [TIR] Update LowerDeviceKernelLaunch to avoid kIsHostFunc

Update to use the `tvm::tir::IsHostFunc` utility function, rather than
the `kIsHostFunc` attribute.  Per discussion on
apache#14020, the `kIsHostFunct` attribute
should only be used in `BindTarget`, and should not be re-introduced
in `SplitHostDevice`.

* Remove is_host_func from SplitHostDevice tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants