[CodeGen][CUDA] Fix bugs #5209

wpan11nv · 2020-04-01T21:02:00Z

Support vectorized casts
It is incorrect to extract elements from int8x4 with

0x000000ff & (x >> i * 8)

because the masked value has type int in C/C++ and is always
non-negative (0 to 255). If this expression is then sign-extended,
the original int8 sign bit is lost.
Using C-style casts instead preserves the sign bit.
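
To make the difference concrete, here is a minimal host-side C++ sketch of the two extraction styles. The helper names (`extract_masked`, `extract_cast`, `pack4`) are hypothetical and exist only for illustration. This is not the code TVM emits, and it assumes the usual two's-complement behavior of mainstream compilers.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not TVM's generated code.

// Masked extraction: the result is an int in [0, 255], so a negative
// int8 lane comes back as a positive value.
int extract_masked(int x, int i) {
  return 0x000000ff & (x >> i * 8);
}

// Cast extraction: the C-style cast keeps the low byte as a signed
// 8-bit value, and widening it back to int sign-extends it.
int extract_cast(int x, int i) {
  return (int8_t)(x >> i * 8);
}

// Pack four int8 lanes into one 32-bit int, lane 0 in the low byte.
int pack4(int8_t a, int8_t b, int8_t c, int8_t d) {
  uint32_t bits = (uint32_t)(uint8_t)a |
                  ((uint32_t)(uint8_t)b << 8) |
                  ((uint32_t)(uint8_t)c << 16) |
                  ((uint32_t)(uint8_t)d << 24);
  return (int)bits;
}
```

With lanes (-1, 2, -128, 127), the masked form reads lane 0 as 255 and lane 2 as 128, while the cast form reads them as -1 and -128. The non-negative lanes agree in both forms.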

Signed-off-by: Wei Pan weip@nvidia.com

tqchen · 2020-04-01T21:03:03Z

@vinx13 would be great if you can help to take a look

vinx13 · 2020-04-02T04:49:40Z

src/target/source/codegen_cuda.cc

+    return CodeGenC::VisitExpr_(op, os);
+
+  // We could emit make_float4 like calls, but the emitted code looks
+  // too compact to read. Emit this as vectorized unary ops.


Is there any difference in performance?

wpan11nv · 2020-04-02

No. Typical optimizations such as mem2reg promote the temporaries into registers, which makes the two forms equivalent.
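
As a rough illustration of the two emission styles being compared, the sketch below writes the same int-to-float vector cast in both ways in plain C++. `Int4` and `Float4` are stand-in structs (assumptions, so the sketch compiles without CUDA headers) for CUDA's built-in `int4` and `float4`. The function names are hypothetical, and the exact code TVM generates is not shown in this thread.

```cpp
// Stand-ins for CUDA's built-in int4/float4 so this compiles as plain C++.
struct Int4 { int x, y, z, w; };
struct Float4 { float x, y, z, w; };

// Compact form: one aggregate expression, in the spirit of a
// make_float4(...) call.
Float4 cast_compact(Int4 v) {
  return Float4{(float)v.x, (float)v.y, (float)v.z, (float)v.w};
}

// Element-wise form: declare a temporary, then cast one lane per statement.
Float4 cast_elementwise(Int4 v) {
  Float4 t;
  t.x = (float)v.x;
  t.y = (float)v.y;
  t.z = (float)v.z;
  t.w = (float)v.w;
  return t;
}
```

Both functions compute the same values. The element-wise form only introduces a named temporary, which is the kind of local that register promotion is expected to eliminate.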

tqchen assigned vinx13 Apr 1, 2020

tqchen added the status: need review label Apr 1, 2020

vinx13 reviewed Apr 2, 2020

vinx13 approved these changes Apr 3, 2020

vinx13 merged commit 316ce05 into apache:master Apr 3, 2020

vinx13 added status: accepted and removed status: need review labels Apr 3, 2020

wpan11nv deleted the fix_casts branch April 10, 2020 17:43

ZihengJiang mentioned this pull request Sep 25, 2020

TVM v0.7 Release Note Candidate #6486

Closed
