[Relax] Implement R.ensure_zero_offset and update memory planning for R.view #17145

vinx13 · 2024-07-09T23:03:18Z

Previously, R.view was legalized to an extern call to runtime.TVMArrayCreateView during LegalizeOps. This extern call can't be properly handled by StaticPlanBlockMemory, because the pass assumes that an extern func does not retain its input buffer. An extern func that returns a view of its input breaks the reference count of the buffer. This PR defers the legalization of R.view so that it can be explicitly handled by memory planning.
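The ref-count problem can be illustrated with a toy Python model of liveness-based planning. This is not TVM code: the binding format, the op names, and the `last_use_per_storage` helper are all hypothetical. The model shows that if the planner assumes a call never returns a view of its input, the input's storage looks dead right after the call, even though the view is still in use.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# (output_var, op_name, input_vars): a hypothetical, flattened binding list
Binding = Tuple[str, str, Sequence[str]]


def last_use_per_storage(
    bindings: List[Binding], alias_ops: FrozenSet[str] = frozenset()
) -> Dict[str, int]:
    """Map each storage token to the index of the last binding that touches it.

    Outputs of ops in `alias_ops` share the storage of their first input.
    Every other output is assumed to own a fresh buffer, which is exactly
    the assumption that fails for an extern call returning a view.
    """
    storage: Dict[str, str] = {}
    for out, op, args in bindings:
        if op in alias_ops and args:
            storage[out] = storage.get(args[0], args[0])
        else:
            storage[out] = out

    last_use: Dict[str, int] = {}
    for i, (out, _op, args) in enumerate(bindings):
        for var in (out, *args):
            last_use[storage.get(var, var)] = i
    return last_use


program = [
    ("x", "alloc", []),
    ("v", "view", ["x"]),        # v points into x's buffer
    ("w", "alloc", []),          # a planner could hand x's "dead" storage to w here
    ("out", "add", ["v", "w"]),
]

naive = last_use_per_storage(program)                        # x looks dead after index 1
aware = last_use_per_storage(program, frozenset({"view"}))   # x stays live until index 3
```

With alias tracking, the lifetime of `x`'s storage extends to the last use of `v`; that is the kind of information memory planning needs while the view is still visible as an operator.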

A new op, R.ensure_aligned (later renamed R.ensure_zero_offset), is added as discussed in #16955.

cc @tqchen @yongwww @Lunderberg

Lunderberg

Thank you for making this follow-up. I have a couple of straight

It looks like the removal of the legalization for R.memory.view is to avoid a phase-order issue, where StaticPlanBlockMemory must be able to identify operators that may alias. Is that understanding correct?

Rather than moving some of the legalization steps into LowerVMBuiltin, I propose that we instead add a legalization_level to each operator and to LegalizeOps. That way, we can distinguish between higher-abstraction operators (legalized before StaticPlanBlockMemory) and lower-abstraction operators (legalized after StaticPlanBlockMemory).

If not specified, an operator would have a legalization level of 10. The R.memory.view and R.memory.ensure_aligned operators would have a legalization level of 0.
LegalizeOps would default to a legalization level of 10. Any operator whose legalization level is less than the LegalizeOps level would be skipped.
An additional LegalizeOps pass would run at the end of the Relax lowering pipeline, with a legalization level of 0.
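The level-based filtering in this proposal can be sketched in a few lines of Python. The op names and the levels 10 and 0 come from the comment above; `OP_LEVELS` and this `legalize_ops` function are hypothetical stand-ins, not the LegalizeOps API, and this scheme is not the one the PR ultimately adopted.

```python
from typing import Dict, List, Tuple

DEFAULT_LEVEL = 10

# Hypothetical per-op levels from the proposal; unlisted ops use DEFAULT_LEVEL.
OP_LEVELS: Dict[str, int] = {
    "relax.memory.view": 0,
    "relax.memory.ensure_aligned": 0,
}


def legalize_ops(ops: List[str], level: int = DEFAULT_LEVEL) -> Tuple[List[str], List[str]]:
    """Split ops into (legalized, skipped) for one LegalizeOps run at `level`."""
    legalized: List[str] = []
    skipped: List[str] = []
    for op in ops:
        if OP_LEVELS.get(op, DEFAULT_LEVEL) < level:
            skipped.append(op)  # lower-abstraction op: left for a later run
        else:
            legalized.append(op)
    return legalized, skipped


ops = ["relax.add", "relax.memory.view", "relax.matmul"]
early, deferred = legalize_ops(ops)                # before StaticPlanBlockMemory
late, remaining = legalize_ops(deferred, level=0)  # end of the lowering pipeline
```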

python/tvm/relax/op/memory/view.py

src/relax/backend/vm/vm_builtin_lower.cc

Lunderberg · 2024-07-10T14:42:40Z

src/relax/op/memory/view.cc


-  return Call(runtime_view_func, {data, shape, dtype, relative_byte_offset});
+  return Call(call->op, {data, shape, dtype, relative_byte_offset});


This change means that R.memory.view is still present in the output of LegalizeOps, but a legalization function should replace the operator with a lowered form.

This only does the inference of void type args and leaves the lowering to a later pass.

I don't think we want to do the inference of the shape/dtype prior to lowering, because it could result in unexpected StructInfo inference later on.

Suppose we have R.memory.view(arg, shape=[16]). This returns a view into the first 16 elements of arg, without changing the dtype. If an IRModule pass updates the datatype of arg, then that new datatype should also propagate to the view. However, legalizing it to R.memory.view(arg, shape=[16], dtype="float16") would return a view into arg, interpreting the first 32 bytes as if they were "float16". Now, if an IRModule pass updates the datatype of arg, the view would still be "float16". To avoid this issue, the unknown arguments shouldn't be filled in until the lowering is about to occur.
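The difference between filling in the dtype eagerly and resolving it lazily can be modeled symbolically. In this hypothetical Python sketch (the `Tensor`, `View`, and `eager_fill` names are not TVM APIs), a `dtype` of `None` means "same as `arg`" and is only resolved when queried, mimicking a view whose unknown arguments are left unfilled until lowering.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ITEMSIZE = {"float16": 2, "float32": 4}


@dataclass
class Tensor:
    name: str
    dtype: str  # mutable: a later IRModule pass may rewrite it


@dataclass
class View:
    arg: Tensor
    shape: Tuple[int, ...]
    dtype: Optional[str] = None  # None means "same dtype as arg", resolved lazily

    def resolved_dtype(self) -> str:
        return self.dtype if self.dtype is not None else self.arg.dtype

    def nbytes(self) -> int:
        count = 1
        for dim in self.shape:
            count *= dim
        return count * ITEMSIZE[self.resolved_dtype()]


def eager_fill(view: View) -> View:
    """Freeze the unknown dtype immediately (the behavior argued against here)."""
    return View(view.arg, view.shape, view.resolved_dtype())


arg = Tensor("arg", "float16")
lazy = View(arg, (16,))
eager = eager_fill(View(arg, (16,)))

arg.dtype = "float32"  # a later pass changes the dtype of arg
```

After the dtype change, `lazy` covers 16 float32 elements (64 bytes), while `eager` still reinterprets the first 32 bytes as float16.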

What if we were to remove .set_attr<FLegalize> altogether, and only have .set_attr<FLowerBuiltin>? That way, we preserve the R.memory.view as-is until it is ready to be lowered. The LegalizeOps pass would then be a no-op for R.memory.view, and only the LowerRuntimeBuiltin pass would change it at all.
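The split between the two attributes can be modeled as two passes that consult a per-op attribute table. This is a hypothetical Python sketch, not TVM's operator registry: `OP_ATTRS`, `apply_attr`, and the bracketed op tags are invented for illustration, and the view's lowered target simply reuses the runtime.TVMArrayCreateView name from the PR description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Call:
    op: str
    args: Tuple[str, ...]


Lowering = Callable[[Call], Call]

# Hypothetical attribute table: op name -> {attribute name: lowering function}
OP_ATTRS: Dict[str, Dict[str, Lowering]] = {
    "relax.add": {"FLegalize": lambda c: Call("call_tir[add]", c.args)},
    # No FLegalize entry, so LegalizeOps leaves R.memory.view untouched.
    "relax.memory.view": {
        "FLowerBuiltin": lambda c: Call("extern[runtime.TVMArrayCreateView]", c.args)
    },
}


def apply_attr(calls: List[Call], attr: str) -> List[Call]:
    """Rewrite every call whose op registers `attr`; keep the rest as-is."""
    result = []
    for call in calls:
        lower = OP_ATTRS.get(call.op, {}).get(attr)
        result.append(lower(call) if lower is not None else call)
    return result


def legalize_ops(calls: List[Call]) -> List[Call]:
    return apply_attr(calls, "FLegalize")


def lower_runtime_builtin(calls: List[Call]) -> List[Call]:
    return apply_attr(calls, "FLowerBuiltin")


prog = [Call("relax.add", ("a", "b")), Call("relax.memory.view", ("a",))]
after_legalize = legalize_ops(prog)  # the view is still present here
after_lowering = lower_runtime_builtin(after_legalize)
```

Because the view survives LegalizeOps unchanged, any pass that runs in between (such as memory planning) still sees the original operator.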

src/relax/op/memory/view.cc

src/relax/transform/static_plan_block_memory.cc

src/runtime/relax_vm/builtin.cc

tests/python/relax/test_op_view.py

Lunderberg · 2024-07-10T15:09:42Z

Also, if you're interested, I have a partial implementation in this dev branch that includes the device-type validation and a TIR legalization. If you'd like to pull any of it over, you're welcome to it, as I've had it on the back-burner for far too long.

tqchen · 2024-07-10T16:29:37Z

Based on the current grouping, it seems that much of the runtime function dispatching happens in LowerBuiltin, while LegalizeOps primarily focuses on lowering to TIR-related functions.

I think such a distinction is still helpful, so it can be a factor when considering whether to move the view legalization into the VM builtin lowering.

Lunderberg · 2024-07-10T20:28:35Z

Based on the current grouping, it seems that much of the runtime function dispatching happens in LowerBuiltin, while LegalizeOps primarily focuses on lowering to TIR-related functions.

I don't think the style of implementation is a useful distinction to make. The important distinction is what functionality must still be observable outside of the operator, not the functional form of the legalized expression.

My understanding is that LegalizeOps is for anything that can be lowered independent of the context in which it appears, and VMBuiltinLower is for operators that require some non-local context (e.g. the VM context pointer) in order to be lowered.

tqchen · 2024-07-10T22:51:52Z

There are different ways to look at this. In this particular case, given that the view was lowered to a runtime function, it was primarily intended for the VM itself. One can also envision a future codegen approach that produces an inlined view function, which is not needed in the VM approach.

Introducing LegalizeOps with different levels can also raise extra issues: we need to run default scheduling for some of the ops, but for certain legalization levels we do not. In some sense, we are creating a different grouping here.

Perhaps one way to make this clearer is to rename LowerVMBuiltin to LowerRuntimeBuiltin, which clearly indicates that it is the pass in charge of lowering all implementations of runtime builtin functions.

Lunderberg · 2024-07-11T15:46:31Z

Perhaps one way to make this clearer is to rename LowerVMBuiltin to LowerRuntimeBuiltin, which clearly indicates that it is the pass in charge of lowering all implementations of runtime builtin functions.

I like this idea, but I don't think we should move the definition of the legalized form into the LowerRuntimeBuiltin. What if we were to instead add a new attribute, which has the same signature as FLegalize, but would be applied at the later point. This would allow LowerRuntimeBuiltin to replace anything that has the FLowerBuiltin attribute, and wouldn't require a distinction between different levels of FLegalize.

That would also allow FLowerBuiltin to run only after ToNonDataflow, and to be implemented in terms of impure functions. By contrast, since FLegalize may replace a call within a dataflow block, its implementation cannot use an impure call.
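The ordering constraint can be modeled as a simple check. In this hypothetical sketch (not the Relax well-formedness checker), the extern view call is assumed to be impure: it is a violation inside a dataflow block, but acceptable once ToNonDataflow has turned every block into an ordinary binding block. The names `IMPURE_OPS`, `Block`, and `impure_calls_in_dataflow` are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical: ops treated as impure (assumed for the extern view call).
IMPURE_OPS = {"extern[runtime.TVMArrayCreateView]"}


@dataclass
class Block:
    is_dataflow: bool
    ops: List[str] = field(default_factory=list)


def impure_calls_in_dataflow(blocks: List[Block]) -> List[str]:
    """Return impure ops that illegally appear inside dataflow blocks."""
    return [op for b in blocks if b.is_dataflow for op in b.ops if op in IMPURE_OPS]


def to_non_dataflow(blocks: List[Block]) -> List[Block]:
    """Model of ToNonDataflow: every block becomes an ordinary binding block."""
    return [Block(False, list(b.ops)) for b in blocks]


# An impure lowering applied while the dataflow block still exists:
lowered = [Block(True, ["call_tir[add]", "extern[runtime.TVMArrayCreateView]"])]
```

Running the same lowering after `to_non_dataflow` produces no violations, which is why a hook scheduled after ToNonDataflow is free to emit impure calls.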

One can also envision a future codegen approach that produces an inlined view function, which is not needed in the VM approach.

I like this idea, and have been toying around with some TIR implementations. The key limitation at the moment is the inability to construct and return a new NDArray if required. (Similar to the difficulties in returning a string that are blocking #16836 and #17103.)

tqchen · 2024-07-11T17:10:49Z

I like this idea, but I don't think we should move the definition of the legalized form into the LowerRuntimeBuiltin. What if we were to instead add a new attribute, which has the same signature as FLegalize, but would be applied at the later point. This would allow LowerRuntimeBuiltin to replace anything that has the FLowerBuiltin attribute, and wouldn't require a distinction between different levels of FLegalize.

I think having an FLowerBuiltin attribute is great. Let's go with that.

python/tvm/relax/transform/transform.py

src/relax/op/memory/view.cc

Lunderberg · 2024-07-16T15:39:20Z

src/relax/op/memory/view.cc



Lunderberg · 2024-07-16T15:42:08Z

src/relax/transform/static_plan_block_memory.cc

@@ -286,8 +286,13 @@ class TokenAllocator1D {
  std::vector<StorageToken> full_pool_;
 };

-/*! \brief Check if the input op is "relax.reshape". */
-bool IsReshape(const Expr& op) { return op.same_as(Op::Get("relax.reshape")); }
+/*! \brief Check if the input op is a memory op that may return the same buffer. */


Thank you for the updated docstring. Looking at it, we may want to add this as another operator attribute (e.g. .set_attr<Bool>("ReturnMayAliasArgument", Bool(true))), but that could be a follow-up PR instead.
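The suggested attribute-based check could look like the following Python model. Only the attribute name comes from the comment; the table and helper functions are hypothetical and do not reflect how TVM's operator registry is queried. The point is that a new aliasing op would only need to register the attribute, rather than being added to a hard-coded comparison like the old IsReshape.

```python
from typing import Dict, List

# Hypothetical op-attribute table mirroring the suggested
# .set_attr<Bool>("ReturnMayAliasArgument", Bool(true)) registration.
OP_ATTRS: Dict[str, Dict[str, bool]] = {
    "relax.reshape": {"ReturnMayAliasArgument": True},
    "relax.memory.view": {"ReturnMayAliasArgument": True},
    "relax.add": {},
}


def may_alias_argument(op_name: str) -> bool:
    """Query the attribute instead of comparing against hard-coded op names."""
    return OP_ATTRS.get(op_name, {}).get("ReturnMayAliasArgument", False)


def aliasing_ops(ops: List[str]) -> List[str]:
    """Keep only the ops whose result may share a buffer with an argument."""
    return [op for op in ops if may_alias_argument(op)]
```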

Lunderberg · 2024-07-16T15:52:02Z

include/tvm/runtime/device_api.h

@@ -240,6 +240,10 @@ class TVM_DLL DeviceAPI {
    return device_type != kDLCPU && device_type != kDLMicroDev;
  }

+  static bool SupportsPointerArithmetics(int device_type) {


Since we already have a vtable for DeviceAPI, this should be a virtual function instead of a static boolean. That would allow individual DeviceAPI implementations to independently mark that they support the pointer-arithmetic. (It would also allow checking for driver-dependent support, such as vulkan support for the optional VK_KHR_buffer_device_address feature.)

Since host-side pointer arithmetic is not the default behavior for DLTensor::data, the default implementation in DeviceAPI would return false, and it could be overridden in CPUDeviceAPI and CUDADeviceAPI to return true.

Also, a nitpick: this isn't whether the device supports pointer arithmetic, but whether pointer arithmetic on a device-owned void* DLTensor::data may be performed on the host. The TVM backends for both Vulkan and OpenCL support pointer arithmetic, but only within the generated kernels. Neither supports pointer arithmetic being performed on the host.
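The virtual-function suggestion can be modeled in Python with method overriding: the base class returns false, and only the CPU and CUDA subclasses override it. The class names mirror the C++ ones mentioned above and the method name follows the later `SupportsDevicePointerArithmetics` commit. The `TensorHandle` type and the `ensure_zero_offset` function are assumptions made for illustration, not the implementation in src/runtime/relax_vm/builtin.cc; in particular, the fallback for devices without host-side pointer arithmetic is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


class DeviceAPI:
    def supports_device_pointer_arithmetics(self) -> bool:
        # Default: the host must not offset a device-owned DLTensor::data.
        return False


class CPUDeviceAPI(DeviceAPI):
    def supports_device_pointer_arithmetics(self) -> bool:
        return True


class CUDADeviceAPI(DeviceAPI):
    def supports_device_pointer_arithmetics(self) -> bool:
        return True


class VulkanDeviceAPI(DeviceAPI):
    """Keeps the default: pointer arithmetic only inside generated kernels."""


@dataclass(frozen=True)
class TensorHandle:
    data: int  # opaque device address, modeled as an integer
    byte_offset: int


def ensure_zero_offset(t: TensorHandle, api: DeviceAPI) -> Optional[TensorHandle]:
    """Fold byte_offset into data when the host may do so; None means another strategy is needed."""
    if t.byte_offset == 0:
        return t
    if api.supports_device_pointer_arithmetics():
        return TensorHandle(t.data + t.byte_offset, 0)
    return None  # e.g. a copy into a fresh allocation; not modeled here
```

Keeping the query virtual lets a backend answer per device or per driver (for example, depending on an optional Vulkan extension) without changing the caller.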

src/runtime/relax_vm/builtin.cc

Lunderberg · 2024-07-16T15:59:54Z

tests/python/relax/test_op_view.py

-                R.dtype("float32"),
-                R.prim_value(0),
-            )
+            B = R.memory.view(A, shape=R.shape([64, 64]), dtype="float32", relative_byte_offset=0)
            return B

    After = tvm.relax.transform.LegalizeOps()(Before)


After replacing the .set_attr<FLegalize> for R.memory.view with .set_attr<FLowerBuiltin>, the changes to these unit tests can be reverted, and any use of LegalizeOps in these unit tests would be replaced by LowerRuntimeBuiltin.

Lunderberg · 2024-08-01T16:13:43Z

@vinx13 I took a look at the current CI failures, and it looks like it's pretty close to passing. If you'd like, applying the diff below should resolve the last 4 failing tests in CI.

pr_17145_diff.txt

python/tvm/relax/transform/__init__.py

tqchen · 2024-08-06T12:05:13Z

cc @Lunderberg let us merge this in

Lunderberg

Thank you for making the changes (and for pinging me on it). Looks good!

github-actions bot requested review from Lunderberg, tqchen and yongwww July 9, 2024 23:04

vinx13 force-pushed the feat/view-align branch 2 times, most recently from e2098bc to 094428d July 10, 2024 04:22

Lunderberg requested changes Jul 10, 2024

Lunderberg reviewed Jul 16, 2024

vinx13 changed the title from "[Relax] Implement R.ensure_aligned and update memory planning for R.view" to "[Relax] Implement R.ensure_zero_offset and update memory planning for R.view" Jul 17, 2024

Lunderberg mentioned this pull request Jul 26, 2024

[Bug] [Relax] InternalError: Check failed: (*it).second == var #17200

Closed

tqchen approved these changes Aug 2, 2024

vinx13 added 11 commits August 2, 2024 09:57

[Relax] Implement R.ensure_aligned and update memory planning for R.view

2f5d036

Rename to ensure_zero_offset and LowerRuntimeBuiltin

59fada7

lint

c742a07

Add warnings

c4e1f29

Apply format

d1a3243

Check kAllocAlignment

c931b45

update LegalizeView to no op

25cc462

Make SupportsDevicePointerArithmetics virutal

7ed6b38

fix

cd553b6

lint

0abe71d

apply eric's patch

a07aa4f

vinx13 force-pushed the feat/view-align branch from 2249b60 to a07aa4f August 2, 2024 16:59

Lunderberg reviewed Aug 5, 2024

python/tvm/relax/transform/__init__.py

fix

d5b1588

Lunderberg approved these changes Aug 6, 2024

Lunderberg merged commit 05e2bc3 into apache:main Aug 6, 2024
19 checks passed

ysh329 mentioned this pull request Oct 16, 2024

[Release] v0.18.0 Release Candidate Notes #17468

Closed


