
[RFC][TVM] Extend TensorComputeOp to allow scalar inputs #2606

Closed
jdavies-huawei opened this issue Feb 15, 2019 · 4 comments

@jdavies-huawei (Contributor)

Motivation

TensorComputeOp allows a TensorIntrin to be called by the user directly, rather than relying on schedule.tensorize to match a pattern and perform a replacement. The original motivation for TensorComputeOp was to generalize TVM's compute to tensor regions, as described in issue #1485.

Currently, all arguments passed to the tensor intrinsic must be tensor regions. We believe this is too restrictive. For example, it is common across multiple architectures for hardware intrinsics to take scalar arguments.

In a regular compute, it is already possible to use arbitrary expressions over the iterator variables, e.g.:

n = 10
A = tvm.placeholder((n, n))
B = tvm.compute((n, n), lambda i, j : A[i, j] + i*i)

However, it isn't possible to do something similar with TensorComputeOp, for example:

tfunc = intrin_tfunc(n)
C1 = tvm.compute((n, n), lambda i : tfunc(A[i, 0:n], i*i))

In the above, passing i*i to the TensorIntrin tfunc fails, because i*i is a scalar expression, not a tensor region.

One current workaround is to store i*i in another tensor:

S = tvm.compute((n, ), lambda i : i*i)
C2 = tvm.compute((n, n), lambda i : tfunc(A[i, 0:n], S[i]))

However, this workaround introduces extra tensors that do not need to exist and will add overhead. Therefore, we propose to extend TensorComputeOp so scalar expressions can be passed to the TensorIntrin call.
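
To make the cost of the workaround concrete, here is a plain-Python sketch of the semantics (illustrative only, not TVM API; the stand-in tfunc_row assumes the intrinsic computes out[j] = row[j] + s, as in the full example later in this RFC). The workaround materializes i*i into an extra buffer S, while the proposal passes the value directly:

```python
# Plain-Python sketch of the two variants' semantics (illustrative, not TVM API).
n = 10

def tfunc_row(row, s):
    """Hypothetical stand-in for the tensor intrinsic: adds scalar s to one row."""
    return [x + s for x in row]

A = [[i + j for j in range(n)] for i in range(n)]

# Workaround: materialize i*i in an extra tensor S before each intrinsic call.
S = [i * i for i in range(n)]                      # extra n-element buffer
C2 = [tfunc_row(A[i], S[i]) for i in range(n)]

# Proposed: pass i*i directly as a scalar input; no intermediate buffer is needed.
C = [tfunc_row(A[i], i * i) for i in range(n)]

assert C == C2
```

Both variants produce the same result; the only difference is the extra store and load through S.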

Proposed Syntax

A list of scalar expressions is passed to the TensorIntrin call as a keyword argument 'scalar_inputs':

C = tvm.compute((n, n), lambda i: tfunc(A[i, 0:n], scalar_inputs=(i*i)))

When declaring the TensorIntrin, the expected scalar parameters must be listed in a keyword argument "scalar_params":

tfunc = tvm.decl_tensor_intrin(D.op, intrin_func, binds={a: Ab, c: Cb}, scalar_params=[s])

where the scalar parameters must be variables used in D's compute:

s = tvm.var("s")
a = tvm.placeholder((n, ))
D = tvm.compute((n,), lambda i: a[i] + s)

Finally, the intrin_func function passed to decl_tensor_intrin must take a third argument containing the list of scalar inputs ('sp' below). The scalar inputs can then be used in the emitted call:

# sp will be the list of scalar inputs passed to the TensorIntrin call
def intrin_func(ins, outs, sp):
  aa = ins[0]
  cc = outs[0]
  def _body():
    ib = tvm.ir_builder.create()
    ib.emit(tvm.call_extern("int32", "test_intrin",
                            cc.access_ptr("w"),
                            aa.access_ptr("r"),
                            sp[0]))
    return ib.get()
  return _body()

Example

import tvm

def intrin_test(n):
  s = tvm.var("s")
  a = tvm.placeholder((n,), name='a')
  d = tvm.compute((n,), lambda i: a[i] + s, name='d')

  def intrin_func(ins, outs, sp):
    aa = ins[0]
    cc = outs[0]
    def _body():
      ib = tvm.ir_builder.create()
      ib.emit(tvm.call_extern("int32", "test_intrin",
                              cc.access_ptr("w"),
                              aa.access_ptr("r"),
                              sp[0]))
      return ib.get()
    return _body()

  with tvm.build_config(offset_factor=1):
    return tvm.decl_tensor_intrin(d.op, intrin_func, scalar_params=[s])

if __name__ == '__main__':

    n = 10
    A = tvm.placeholder((n, n), name='A')
    tfunc = intrin_test(n)
    C = tvm.compute((n, n), lambda i: tfunc(A[i, 0:n], scalar_inputs=(i*i)), name='C')
    s = tvm.create_schedule(C.op)
    print(tvm.lower(s, [A, C], simple_mode=True))

The above example program produces the following output:

produce C {
  for (i, 0, 10) {
    test_intrin(tvm_address_of(C[(i*10)]), tvm_address_of(A[(i*10)]), (i*i))
  }
}

Implementation

The extension is implemented in the accompanying pull request.

Testing

The pull request includes one new unit test for this feature, but additional, stronger tests are still required. We would appreciate advice on how best to test this feature.

jdavies-huawei added a commit to jdavies-huawei/tvm that referenced this issue Feb 15, 2019
@ZihengJiang (Contributor)

Hi @jdavies-huawei, I like the idea of extending TensorComputeOp with scalar inputs, but regarding the API: it would be better if we could use tfunc(A[i, 0:n], i*i) directly instead of having another parameter called scalar_inputs.

@derisavi (Contributor)

@ZihengJiang What if we have multiple tensor inputs and multiple scalar inputs? Do you suggest that we should be able to interleave them and write something like tfunc(A[i, 0:n], i*i, i+1, B[i, 0:n])?

@jdavies-huawei jdavies-huawei changed the title Extend TensorComputeOp to allow scalar inputs [RFC][TVM] Extend TensorComputeOp to allow scalar inputs Feb 19, 2019
jdavies-huawei added a commit to jdavies-huawei/tvm that referenced this issue Apr 17, 2019
jdavies-huawei added a commit to jdavies-huawei/tvm that referenced this issue May 14, 2019
jdavies-huawei added a commit to jdavies-huawei/tvm that referenced this issue May 14, 2019
jdavies-huawei added a commit to jdavies-huawei/tvm that referenced this issue Jun 6, 2019
@jdavies-huawei (Contributor, Author)

@ZihengJiang I made another pull request #3300 that makes the change you suggest. The scalar inputs can now be passed without using a named parameter.

@derisavi yes, the scalar inputs can now be interleaved with the tensor inputs. It is just important that the scalar inputs are ordered correctly (PostDFS order) with respect to one another, and similarly the tensor inputs must be ordered correctly (PostDFS order) with respect to each other.
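
The ordering constraint above can be sketched in plain Python (illustrative only, not TVM internals; split_args and the tagged-tuple representation are hypothetical): interleaved positional arguments are separated by kind, and each kind keeps its own relative order.

```python
# Illustrative sketch (not TVM internals): interleaved positional arguments are
# split by kind, while the relative order within each kind is preserved.

def split_args(args, is_tensor_region):
    """Separate tensor regions from scalar expressions, keeping order within each kind."""
    tensors = [a for a in args if is_tensor_region(a)]
    scalars = [a for a in args if not is_tensor_region(a)]
    return tensors, scalars

# E.g. for tfunc(A[i, 0:n], i*i, i+1, B[i, 0:n]) the call site may interleave freely:
args = [("tensor", "A[i, 0:n]"), ("scalar", "i*i"),
        ("scalar", "i+1"), ("tensor", "B[i, 0:n]")]
tensors, scalars = split_args(args, lambda a: a[0] == "tensor")

assert [a[1] for a in tensors] == ["A[i, 0:n]", "B[i, 0:n]"]
assert [a[1] for a in scalars] == ["i*i", "i+1"]
```

So tfunc(A[i, 0:n], i*i, i+1, B[i, 0:n]) is fine, but swapping i*i and i+1 (or A and B) would change which declared parameter each input binds to.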

@tqchen (Member)

tqchen commented Jul 24, 2019

#3391

@tqchen tqchen closed this as completed Jul 24, 2019