[RFC][TVM] Extend TensorComputeOp to allow scalar inputs #2606
Comments
Hi @jdavies-huawei, I like the idea of extending TensorComputeOp with scalar inputs, but regarding the API, it would be better if we could use
@ZihengJiang What if we have multiple tensor inputs and multiple scalar inputs? Do you suggest that we should be able to interleave them and write something like
@ZihengJiang I made another pull request #3300 that makes the change you suggest. The scalar inputs can now be passed without using a named parameter. @derisavi yes, the scalar inputs can now be interleaved with the tensor inputs. It is just important that the scalar inputs are ordered correctly (PostDFS order) with respect to themselves. Similarly, the tensor inputs must be ordered correctly (PostDFS order) with respect to each other.
Motivation
TensorComputeOp allows a TensorIntrin to be called by the user directly, rather than relying on schedule.tensorize to match a pattern and perform a replacement. The original motivation for TensorComputeOp was to generalize TVM's compute to tensor regions, as described in issue #1485.
Currently, all arguments passed to the tensor intrinsic must be tensor regions. We believe this is too restrictive. For example, it is common across multiple architectures for hardware intrinsics to take scalar arguments.
In a regular compute, it is already possible to use arbitrary expressions over the iterator variables, e.g.:
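The original code snippet was not preserved here; the following is a hedged sketch of what such a compute might look like, using the TVM 0.x-era `tvm.compute` API (the tensor names are illustrative, not from the original post):

```python
import tvm

n = 16
A = tvm.placeholder((n,), name='A', dtype='int32')
# The lambda body may contain arbitrary scalar expressions over the
# iteration variable i, such as the expression i*i below.
B = tvm.compute((n,), lambda i: A[i] * (i * i), name='B')
```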
However, it isn't possible to do something similar with TensorComputeOp, for example:
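The original failing snippet was not preserved; a sketch of how it might look follows, assuming `tfunc` is a TensorIntrin declared with the usual `decl_tensor_intrin` pattern and `A` is a 2-D placeholder (illustrative names, not runnable):

```python
# Attempt to pass the scalar expression i*i alongside a tensor region.
C = tvm.compute((n,), lambda i: tfunc(A[i, 0:m], i * i), name='C')
# This fails: i*i is a scalar expression, but TensorComputeOp currently
# accepts only tensor regions as arguments to the intrinsic.
```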
In the above, passing i*i to the TensorIntrin tfunc fails, because i*i is a scalar expression, not a tensor region.
One current workaround is to store i*i in another tensor:
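A sketch of that workaround, under the same illustrative setup as above (names assumed, not from the original post):

```python
# Materialize the scalar expression i*i in an auxiliary tensor S,
# then pass a one-element region of S to the intrinsic.
S = tvm.compute((n,), lambda i: i * i, name='S')
C = tvm.compute((n,), lambda i: tfunc(A[i, 0:m], S[i:i+1]), name='C')
```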
However, this workaround introduces extra tensors that do not need to exist and will add overhead. Therefore, we propose to extend TensorComputeOp so scalar expressions can be passed to the TensorIntrin call.
Proposed Syntax
A list of scalar expressions is passed to the TensorIntrin call as a keyword argument 'scalar_inputs':
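The original snippet was stripped from this copy; under the syntax this RFC proposes, the call might look as follows (a sketch of the *proposed* API, which differs from what was ultimately merged in #3300):

```python
# Scalar expressions are collected in the 'scalar_inputs' keyword
# argument instead of being passed as tensor regions.
C = tvm.compute((n,), lambda i: tfunc(A[i, 0:m], scalar_inputs=[i * i]),
                name='C')
```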
When declaring the TensorIntrin, the expected scalar parameters must be listed in a keyword argument "scalar_params":
where the scalar parameters must be variables used in D's compute:
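The declaration snippet was not preserved; a sketch of the proposed `scalar_params` keyword, with illustrative names (`v`, `X`, `D`, `intrin_func` are assumptions, not from the original post):

```python
v = tvm.var('v', dtype='int32')  # the scalar parameter
X = tvm.placeholder((m,), name='X', dtype='int32')
# The scalar parameter v appears in the intrinsic's compute D.
D = tvm.compute((m,), lambda j: X[j] * v, name='D')
# Proposed: list the expected scalar parameters when declaring the intrinsic.
tfunc = tvm.decl_tensor_intrin(D.op, intrin_func, scalar_params=[v])
```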
Finally, the intrin_func lambda function passed to decl_tensor_intrin must take a third argument containing the list of scalar inputs ('sp' below). The scalar inputs can then be used in the emitted call:
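The original snippet was stripped; a sketch of such an `intrin_func`, in the style of TVM 0.x `ir_builder`-based intrinsic bodies (the extern name `hw_intrinsic` is a placeholder, not from the original post):

```python
def intrin_func(ins, outs, sp):
    ib = tvm.ir_builder.create()
    aa = ins[0]
    cc = outs[0]
    # sp holds the scalar inputs, in the order given by scalar_params;
    # they can be used directly in the emitted call.
    ib.emit(tvm.call_extern('int32', 'hw_intrinsic',
                            cc.access_ptr('w'),
                            aa.access_ptr('r'),
                            sp[0]))
    return ib.get()
```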
Example
The above example program produces the following output:
Implementation
The extension is implemented in the accompanying pull request.
Testing
The pull request includes one new unit test for this feature, but additional, stronger tests are still required. We would appreciate advice on how best to test this feature.