[RFC] Generalize compute to tensor region #1485
cc @dmlc/tvm-team |
any update on this? @tqchen |
@ZihengJiang is going to send more updates on the proposal and the API behavior, we will start discussing from there. |
@tqchen @yzhliu @tmoreau89 @merrymercy @cowanmeg welcome to comment |
I like this proposal as it provides better control to the programmer over how tensorization is applied. |
Another thing that is tricky with the current tensorization is error reporting (when the intrinsic does not match the code we apply tensorization to). Do you know if this new approach will lead to clearer error reporting? |
The result's shape from tensor_op is not obvious and is inconsistent with compute. |
@b-y-e Because we need to distinguish the axes in the tensorized region from the other axes, the shape is passed as two separate lists: the outer axes and the tensorized-region axes. |
@tmoreau89 The only thing that needs to be checked in the new interface is the shape of the tensorized region. An error is reported if the shapes do not match; the body part is then replaced directly.
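For instance, a minimal sketch of that check, using the `tensor_op` form proposed in this thread (`intrin_vadd` is the intrinsic declaration from the examples below):

```python
# intrin_vadd(16) declares an intrinsic over 16-wide regions, so the
# tensorized-region shape [factor,] must satisfy factor == 16; otherwise
# an error is reported. On a match, the region body is replaced by the
# intrinsic directly.
intrin = intrin_vadd(16)
C = tvm.tensor_op([m/factor,], [factor,],
                  lambda i: [A[i, 0:factor], B[i, 0:factor]],
                  intrin, name='C')
```
|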
Thanks for all the discussions so far; this is a very important feature, and as @xqdan and @ZihengJiang suggested, there are certain use cases for it. Here is a summary of my takes:

Significance

So far, there is a two-level separation of "expressions": the scalar index expression as in the compute body, and the tensor relation dataflow as in the chains of operators defined by compute (topi). Ideally, we want to ask: is it possible to also use tensor data type expressions within the compute themselves? Conceptually this offers a sense of "recursion". That is why this proposal is quite significant: it takes a step in that direction.

Pros

It offers an alternative to the current tensorize:
Cons

There are certain drawbacks of abusing this notion vs tensorize:
|
My last post summarizes some of the considerations. Due to the significance of this proposal, we would like to invite the community to have a good discussion on the APIs. Specifically, we want to reach the following goals:
Current Proposal

```python
# tensor_op way
def tensor_op_vadd():
    A = tvm.placeholder((m/factor, factor), name="A", dtype=dtype)
    B = tvm.placeholder((m/factor, factor), name="B", dtype=dtype)
    intrin = intrin_vadd(16)
    C = tvm.tensor_op([m/factor,], [factor,],
                      lambda i: [A[i, 0:factor], B[i, 0:factor]],
                      intrin, name='C')
```

Mock Design 1

The tensor intrinsic declaration is the same as in @ZihengJiang's example.

```python
# tensor_op way, via compute
def tensor_op_vadd():
    A = tvm.placeholder((m/factor, factor), name="A", dtype=dtype)
    B = tvm.placeholder((m/factor, factor), name="B", dtype=dtype)
    C = tvm.compute((m/factor, factor),
                    lambda i: intrin_add(A[i, 0:factor], B[i, 0:factor]))
```

The key differences are:

- `compute` is reused directly instead of introducing a new `tensor_op`.
- The internal axis is implied via the dimension of the expression.
Mock Design 2

```python
# tensor_op way, via compute
def tensor_op_vadd():
    A = tvm.placeholder((m/factor, factor), name="A", dtype=dtype)
    B = tvm.placeholder((m/factor, factor), name="B", dtype=dtype)
    C = tvm.compute((m/factor, factor),
                    lambda i, j: intrin_add(A[i, 0:size_of(j)], B[i, 0:size_of(j)]))
```
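For reference, a sketch of the `intrin_vadd` declaration these examples assume; it follows the style of the tensorize tests in TVM at the time, and the packed-function name "vadd" is illustrative:

```python
def intrin_vadd(n):
    # Declare the computation the intrinsic stands for: an n-wide vector add.
    x = tvm.placeholder((n,), name='vadd_x')
    y = tvm.placeholder((n,), name='vadd_y')
    z = tvm.compute((n,), lambda i: x[i] + y[i], name='vadd_z')

    def intrin_func(ins, outs):
        # Emit a call into an external vector-add routine over the regions.
        xx, yy = ins
        zz = outs[0]
        return tvm.call_packed("vadd", xx, yy, zz)

    # offset_factor constrains buffer alignment for the matched regions.
    with tvm.build_config(offset_factor=16):
        return tvm.decl_tensor_intrin(z.op, intrin_func)
```
|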
@merrymercy @tmoreau89 @xqdan @cowanmeg @eqy @yzhliu @were @anijain2305 @dmlc/tvm-team please weigh in on the API proposal and technical questions. |
Regarding the concerns that @tqchen raised, would it make sense to keep tensorization as it is, and introduce this new form of tensorization as "encapsulation" or "inlining"? This can maintain our clean separation of compute/scheduling, while introducing a convenient way to inline high-performance intrinsics in schedules. |
Regarding the point "The internal axis is implied via the dimension of the expression": I doubt that the internal axis can be inferred from the expression directly. For example, consider a case where the result C is supposed to have 4 axes but only two indices are passed into compute.
|
We decide the number of internal axes by how many axes are not passed into compute. In your case, because C is supposed to have 4 axes but only two are passed into compute, there are two internal axes.
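A hypothetical sketch of this inference rule (the `intrin_gemm` intrinsic and the 16x16 region shapes are made up for illustration):

```python
# C is declared with 4 axes, but the lambda only binds (i, j); the
# remaining two 16-wide axes are inferred as internal (tensorized) axes
# produced by the intrinsic call over the input regions.
C = tvm.compute((n, m, 16, 16),
                lambda i, j: intrin_gemm(A[i, 0:16, 0:16], B[j, 0:16, 0:16]))
```
|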
@tqchen As for the implementation, what is the return value of …? Also, when we do …? |
I like the 1st one better. |
|
closed by #1476 |
Target:
Motivation

The current `tensorize` method uses `TensorIntrin` as a contract. While lowering, it compares the body of the intrinsic's declaration with the body of the original declaration; after verifying that the two are absolutely identical, the body of the original declaration is replaced by the intrinsic.

However, we found that it is not easy to guarantee/express this identity in some situations, especially ones involving the value of an index. The problem is related to the implementation of `tensorize`: normally, `tensorize` replaces the indices in the intrinsic with the indices in the original declaration, so that the lowered code can be matched perfectly. For example, to tensorize an addition intrinsic, we replace `k` in the intrinsic declaration with `i*16+j` and then compare whether the IR is identical. However, if we require the value of an index, things change: the verification will fail unless we express absolute indexing and relative indexing clearly.
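A sketch of the substitution being described, written against the TVM API of the time (`intrin_vadd` is assumed to be an addition intrinsic over 16-wide regions, as in the examples above):

```python
import tvm

m = 1024
A = tvm.placeholder((m,), name='A')
B = tvm.placeholder((m,), name='B')
C = tvm.compute((m,), lambda k: A[k] + B[k], name='C')

s = tvm.create_schedule(C.op)
# Split the flat axis k into an outer axis i and a 16-wide inner axis j,
# so the loop body addresses element k = i*16 + j.
i, j = s[C].split(C.op.axis[0], factor=16)
# tensorize substitutes the intrinsic's index k with i*16+j and then
# checks that the two bodies are structurally identical before replacing
# the body with the intrinsic call.
s[C].tensorize(j, intrin_vadd(16))
```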
Design

Instead of proposing a way to express absolute indexing, we would like to generalize `tvm.compute` to regions of tensors. That means we can view a tensor intrinsic as an operation between regions, just as `+` is an operation between elements; `A[i, 0:16]`, for example, denotes a 16-wide region of tensor A. Unlike the `extern` operation, we hope to keep the ability to schedule the part outside of the tensorized region. This means there are two kinds of axes: (1) axes in the tensorized region, which cannot be scheduled; (2) axes outside the tensorized region, which can be scheduled through `split`, `reorder`, etc.

With this generalization, we can use a tensor intrinsic in a declaration directly, so verification is no longer needed. This also provides better encapsulation: users do not need to know the details of the tensor intrinsic.
Interface
Demo:
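A minimal sketch of the proposed `tensor_op` interface, mirroring the vadd example quoted earlier in the thread (the full demo is in the linked PR):

```python
def tensor_op_vadd():
    A = tvm.placeholder((m/factor, factor), name="A", dtype=dtype)
    B = tvm.placeholder((m/factor, factor), name="B", dtype=dtype)
    intrin = intrin_vadd(16)
    # The first shape list gives the schedulable outer axes, the second
    # the tensorized-region axes; the lambda maps the outer indices to
    # the input regions consumed by the intrinsic.
    C = tvm.tensor_op([m/factor,], [factor,],
                      lambda i: [A[i, 0:factor], B[i, 0:factor]],
                      intrin, name='C')
```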
link to the PR: #1476
@tqchen @xqdan