[CodeGen][CUDA] Fix bugs #5209
- Support vectorized casts
- It is incorrect to extract elements from int8x4 with 0x000000ff & (x >> i * 8) as this value is of type int in C/C++. If this expression is used for sign extensions, the sign bit will be wrong. Simply use C style casts instead and sign bits will just work.

Signed-off-by: Wei Pan <weip@nvidia.com>
@vinx13 would be great if you can help to take a look
return CodeGenC::VisitExpr_(op, os);

// We could emit make_float4 like calls, but the emitted code looks
// too compact to read. Emit this as vectorized unary ops.
is there any difference in performance?
No, typical optimizations like mem2reg will promote the temporaries into registers, making the two forms equivalent.
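For readers following the thread, here is a rough sketch of the two emission styles being compared. It is plain C++ rather than the actual generated CUDA: `Int4`, `Float4`, `make_float4_like`, `cast_compact`, and `cast_per_lane` are hypothetical stand-ins, not names from TVM or the CUDA headers. It only illustrates that a single constructor-like call and a lane-by-lane temporary compute the same value.

```cpp
// Hypothetical stand-ins for CUDA's built-in int4/float4 vector types,
// so the sketch compiles as ordinary C++.
struct Int4 { int x, y, z, w; };
struct Float4 { float x, y, z, w; };

// Stand-in for a make_float4-style constructor call.
inline Float4 make_float4_like(float x, float y, float z, float w) {
  return Float4{x, y, z, w};
}

// Compact style: one nested call that converts all four lanes at once.
inline Float4 cast_compact(Int4 v) {
  return make_float4_like(static_cast<float>(v.x), static_cast<float>(v.y),
                          static_cast<float>(v.z), static_cast<float>(v.w));
}

// Per-lane style: declare a temporary and assign each lane separately.
// The temporary is expected to end up in registers after optimization.
inline Float4 cast_per_lane(Int4 v) {
  Float4 t;
  t.x = static_cast<float>(v.x);
  t.y = static_cast<float>(v.y);
  t.z = static_cast<float>(v.z);
  t.w = static_cast<float>(v.w);
  return t;
}
```

Both functions return identical values for any input; the per-lane form is simply longer and easier to read in emitted code.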
Support vectorized casts
It is incorrect to extract elements from int8x4 with
0x000000ff & (x >> i * 8)
as this value is of type int in C/C++. If this expression
is used for sign extensions, the sign bit will be wrong.
Simply use C style casts instead and sign bits will just work.
Signed-off-by: Wei Pan <weip@nvidia.com>
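The sign problem described above can be reproduced in a few lines. This is a minimal sketch, not the code from this PR: `extract_masked` and `extract_cast` are hypothetical helper names, the packed value is held in a `uint32_t` to keep the shifts well defined, and lane 0 is assumed to be the low byte. The narrowing cast to `int8_t` wraps modulo 256 on mainstream compilers (and is guaranteed to from C++20 onward).

```cpp
#include <cstdint>

// Mask-and-shift extraction, as in the original expression.
// The result is an int in [0, 255], so negative lanes come out positive.
inline int extract_masked(std::uint32_t packed, int i) {
  return static_cast<int>(0x000000ffu & (packed >> (i * 8)));
}

// Cast-based extraction: narrow to a signed 8-bit value first, then widen.
// Widening int8_t to int sign-extends, so negative lanes stay negative.
inline int extract_cast(std::uint32_t packed, int i) {
  return static_cast<int>(static_cast<std::int8_t>(packed >> (i * 8)));
}

// Example packed value holding the lanes {-1, 2, -128, 127}:
// bytes 0xFF, 0x02, 0x80, 0x7F with lane 0 in the low byte.
constexpr std::uint32_t kPacked = 0x7F8002FFu;
```

With `kPacked`, the masked version yields 255 and 128 for lanes 0 and 2, while the cast version yields -1 and -128. The two agree only on the non-negative lanes.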
Thanks for contributing to TVM! Please refer to guideline https://tvm.apache.org/docs/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers by @ them in the pull request thread.