
[RFC][TVM] Extend TensorComputeOp to allow scalar inputs #2606

Closed
jdavies-huawei opened this issue Feb 15, 2019 · 4 comments

@jdavies-huawei (Contributor)

Motivation

TensorComputeOp allows a TensorIntrin to be called by the user directly, rather than relying on schedule.tensorize to match a pattern and perform a replacement. The original motivation for TensorComputeOp was to generalize TVM's compute to tensor regions, as described in issue #1485.

Currently, all arguments passed to the tensor intrinsic must be tensor regions. We believe this is too restrictive. For example, it is common across multiple architectures for hardware intrinsics to take scalar arguments.

In a regular compute, it is already possible to use arbitrary expressions over the iterator variables, e.g.:

n = 10
A = tvm.placeholder((n, n))
B = tvm.compute((n, n), lambda i, j : A[i, j] + i*i)

However, it isn't possible to do something similar with TensorComputeOp, for example:

tfunc = intrin_tfunc(n)
C1 = tvm.compute((n, n), lambda i : tfunc(A[i, 0:n], i*i))

In the above, passing i*i to the TensorIntrin tfunc fails, because i*i is a scalar expression, not a tensor region.

One current workaround is to store i*i in another tensor:

S = tvm.compute((n, ), lambda i : i*i)
C2 = tvm.compute((n, n), lambda i : tfunc(A[i, 0:n], S[i]))

However, this workaround introduces extra tensors that do not need to exist and will add overhead. Therefore, we propose to extend TensorComputeOp so scalar expressions can be passed to the TensorIntrin call.
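
To make the cost of the workaround concrete, here is a plain-Python sketch of the semantics (illustrative only, not TVM API; the stand-in tfunc_row assumes the intrinsic computes out[j] = row[j] + s, as in the full example later in this RFC). The workaround materializes i*i into an extra buffer S, while the proposal passes the value directly:

```python
# Plain-Python sketch of the two variants' semantics (illustrative, not TVM API).
n = 10

def tfunc_row(row, s):
    """Hypothetical stand-in for the tensor intrinsic: adds scalar s to one row."""
    return [x + s for x in row]

A = [[i + j for j in range(n)] for i in range(n)]

# Workaround: materialize i*i in an extra tensor S before each intrinsic call.
S = [i * i for i in range(n)]                      # extra n-element buffer
C2 = [tfunc_row(A[i], S[i]) for i in range(n)]

# Proposed: pass i*i directly as a scalar input; no intermediate buffer is needed.
C = [tfunc_row(A[i], i * i) for i in range(n)]

assert C == C2
```

Both variants produce the same result; the only difference is the extra store and load through S.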

Proposed Syntax

A list of scalar expressions is passed to the TensorIntrin call as a keyword argument 'scalar_inputs':

C = tvm.compute((n, n), lambda i: tfunc(A[i, 0:n], scalar_inputs=(i*i)))

When declaring the TensorIntrin, the expected scalar parameters must be listed in a keyword argument "scalar_params":

tfunc = tvm.decl_tensor_intrin(D.op, intrin_func, binds={a: Ab, c: Cb}, scalar_params=[s])

where the scalar parameters must be variables used in D's compute:

s = tvm.var("s")
a = tvm.placeholder((n, ))
D = tvm.compute((n,), lambda i: a[i] + s)

Finally, the intrin_func function passed to decl_tensor_intrin must take a third argument containing the list of scalar inputs ('sp' below). The scalar inputs can then be used in the emitted call:

# sp will be the list of scalar inputs passed to the TensorIntrin call
def intrin_func(ins, outs, sp):
  aa = ins[0]
  cc = outs[0]
  def _body():
    ib = tvm.ir_builder.create()
    ib.emit(tvm.call_extern("int32", "test_intrin",
                            cc.access_ptr("w"),
                            aa.access_ptr("r"),
                            sp[0]))
    return ib.get()
  return _body()

Example

import tvm

def intrin_test(n):
  s = tvm.var("s")
  a = tvm.placeholder((n,), name='a')
  d = tvm.compute((n,), lambda i: a[i] + s, name='d')

  def intrin_func(ins, outs, sp):
    aa = ins[0]
    cc = outs[0]
    def _body():
      ib = tvm.ir_builder.create()
      ib.emit(tvm.call_extern("int32", "test_intrin",
                              cc.access_ptr("w"),
                              aa.access_ptr("r"),
                              sp[0]))
      return ib.get()
    return _body()

  with tvm.build_config(offset_factor=1):
    return tvm.decl_tensor_intrin(d.op, intrin_func, scalar_params=[s])

if __name__ == '__main__':

    n = 10
    A = tvm.placeholder((n, n), name='A')
    tfunc = intrin_test(n)
    C = tvm.compute((n, n), lambda i: tfunc(A[i, 0:n], scalar_inputs=(i*i)), name='C')
    s = tvm.create_schedule(C.op)
    print(tvm.lower(s, [A, C], simple_mode=True))

The above example program produces the following output:

produce C {
  for (i, 0, 10) {
    test_intrin(tvm_address_of(C[(i*10)]), tvm_address_of(A[(i*10)]), (i*i))
  }
}

Implementation

The extension is implemented in the accompanying pull request.

Testing

The pull request includes one new unit test for this feature, but additional, stronger tests are still required. We would appreciate advice on how best to test this feature.

jdavies-huawei added a commit to jdavies-huawei/tvm that referenced this issue Feb 15, 2019
@ZihengJiang (Contributor)

Hi @jdavies-huawei, I like the idea of extending TensorComputeOp with scalar inputs, but regarding the API: it would be better if we could use tfunc(A[i, 0:n], i*i) directly instead of having another parameter called scalar_inputs.

@derisavi (Contributor)

@ZihengJiang What if we have multiple tensor inputs and multiple scalar inputs? Do you suggest that we should be able to interleave them and write something like tfunc(A[i, 0:n], i*i, i+1, B[i, 0:n])?

@jdavies-huawei jdavies-huawei changed the title Extend TensorComputeOp to allow scalar inputs [RFC][TVM] Extend TensorComputeOp to allow scalar inputs Feb 19, 2019
jdavies-huawei added a commit to jdavies-huawei/tvm that referenced this issue Apr 17, 2019
jdavies-huawei added a commit to jdavies-huawei/tvm that referenced this issue May 14, 2019
jdavies-huawei added a commit to jdavies-huawei/tvm that referenced this issue May 14, 2019
jdavies-huawei added a commit to jdavies-huawei/tvm that referenced this issue Jun 6, 2019
@jdavies-huawei (Contributor, Author)

@ZihengJiang I made another pull request #3300 that makes the change you suggest. The scalar inputs can now be passed without using a named parameter.

@derisavi yes, the scalar inputs can now be interleaved with the tensor inputs. It is just important that the scalar inputs are ordered correctly (PostDFS order) with respect to one another, and similarly the tensor inputs must be ordered correctly (PostDFS order) with respect to each other.
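
The ordering constraint above can be sketched in plain Python (illustrative only, not TVM internals; split_args and the tagged-tuple representation are hypothetical): interleaved positional arguments are separated by kind, and each kind keeps its own relative order.

```python
# Illustrative sketch (not TVM internals): interleaved positional arguments are
# split by kind, while the relative order within each kind is preserved.

def split_args(args, is_tensor_region):
    """Separate tensor regions from scalar expressions, keeping order within each kind."""
    tensors = [a for a in args if is_tensor_region(a)]
    scalars = [a for a in args if not is_tensor_region(a)]
    return tensors, scalars

# E.g. for tfunc(A[i, 0:n], i*i, i+1, B[i, 0:n]) the call site may interleave freely:
args = [("tensor", "A[i, 0:n]"), ("scalar", "i*i"),
        ("scalar", "i+1"), ("tensor", "B[i, 0:n]")]
tensors, scalars = split_args(args, lambda a: a[0] == "tensor")

assert [a[1] for a in tensors] == ["A[i, 0:n]", "B[i, 0:n]"]
assert [a[1] for a in scalars] == ["i*i", "i+1"]
```

So tfunc(A[i, 0:n], i*i, i+1, B[i, 0:n]) is fine, but swapping i*i and i+1 (or A and B) would change which declared parameter each input binds to.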

@tqchen (Member)

tqchen commented Jul 24, 2019

#3391

@tqchen tqchen closed this as completed Jul 24, 2019