Fix CUDA int8x4 vectorize #3928

llehtahw · 2019-09-10T13:28:19Z

Fix CodeGenCUDA::PrintVecElemStore and CodeGenCUDA::PrintVecElemLoad
- Avoid generating invalid CUDA code like this:
```
int _x; // int as int8x4, from #1569
_x.x = x; // or x = _x.x
```
- Restore the test deleted in #1569 (Use int for int8x4 due to performance overhead of char4)
  - The test passed but could not benefit from vectorization
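
Since #1569 represents int8x4 as a plain `int`, member access such as `_x.x` cannot compile, so individual lanes have to be reached with shifts and masks instead. The following is a minimal sketch of that idea in plain C++, not the code the PR actually emits: `LoadLane` and `StoreLane` are hypothetical helper names, and the sketch uses `uint32_t` rather than `int` to keep the shifts well defined.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; these are not TVM functions.
// Four int8 lanes are assumed to be packed into one 32-bit word,
// with lane 0 in the least significant byte.

// Read lane i (0..3) from the packed word.
inline int8_t LoadLane(uint32_t packed, int i) {
  const uint32_t shift = static_cast<uint32_t>(i) * 8u;
  return static_cast<int8_t>((packed >> shift) & 0xffu);
}

// Return a copy of the packed word with lane i (0..3) replaced by value.
inline uint32_t StoreLane(uint32_t packed, int i, int8_t value) {
  const uint32_t shift = static_cast<uint32_t>(i) * 8u;
  const uint32_t mask = 0xffu << shift;
  // Clear the target byte, then OR in the new byte.
  const uint32_t byte = static_cast<uint32_t>(static_cast<uint8_t>(value));
  return (packed & ~mask) | (byte << shift);
}
```

For example, storing 1, -2, 3, -4 into lanes 0 through 3 of a zeroed word and reading them back with `LoadLane` returns the same four values.
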
Fix accumulation of shared/local memory usage with vector types
- Test updated
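
The description does not say what was wrong with the memory accounting. One plausible reading, and it is only an assumption, is that a vector-typed buffer must be counted as extent × lanes × bytes per lane, and that the sizes of all buffers must be summed. A minimal sketch of that arithmetic follows; `Alloc`, `AllocBytes`, and `TotalBytes` are hypothetical names, not TVM types or functions.

```cpp
#include <cstddef>

// Hypothetical description of one shared/local allocation; not a TVM type.
struct Alloc {
  std::size_t extent;  // number of (possibly vector) elements
  int bits;            // bits per scalar lane, e.g. 8 for int8
  int lanes;           // vector width, e.g. 4 for int8x4
};

// Bytes used by one allocation, counting every lane of a vector element.
inline std::size_t AllocBytes(const Alloc& a) {
  const std::size_t bytes_per_lane = static_cast<std::size_t>((a.bits + 7) / 8);
  return a.extent * static_cast<std::size_t>(a.lanes) * bytes_per_lane;
}

// Accumulate usage over all allocations instead of keeping only the last one.
inline std::size_t TotalBytes(const Alloc* allocs, std::size_t n) {
  std::size_t total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    total += AllocBytes(allocs[i]);
  }
  return total;
}
```

Under these assumptions, 256 elements of int8x4 occupy 1024 bytes, the same as 256 elements of int32.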

tqchen · 2019-09-12T19:28:50Z

@vinx13 please manage this PR

vinx13 · 2019-09-13T01:04:57Z

Thanks @llehtahw this is merged

* Fix int8x4 vectorize
* Fix gpu shared/local memory accumulate
* Add test_shared_memory for int8x4
* Adjust test format
* Fix cpplint

llehtahw added 5 commits September 10, 2019 19:15

Fix int8x4 vectorize

7dea310

Fix gpu shared/local memory accumulate

6e6ca45

Add test_shared_memory for int8x4

c42c165

Adjust test format

f440a45

Fix cpplint

f90921d

tqchen assigned vinx13 Sep 12, 2019

tqchen added the status: need review label Sep 12, 2019

vinx13 approved these changes Sep 13, 2019

vinx13 merged commit 195973c into apache:master Sep 13, 2019

vinx13 added status: accepted and removed status: need review labels Sep 13, 2019

llehtahw deleted the fix-int8x4-vectorize branch September 13, 2019 01:15

wweic pushed a commit to wweic/tvm that referenced this pull request Sep 16, 2019

Fix CUDA int8x4 vectorize (apache#3928)

a509d76

wweic pushed a commit to wweic/tvm that referenced this pull request Sep 16, 2019

Fix CUDA int8x4 vectorize (apache#3928)

b899e3b

wweic pushed a commit to neo-ai/tvm that referenced this pull request Sep 16, 2019

Fix CUDA int8x4 vectorize (apache#3928)

983ac9c

yzhliu mentioned this pull request Nov 11, 2019

[RELEASE][DRAFT] TVM v0.6 Release candidate #4259

Closed
