[CodeGen][CUDA] Fix bugs #5209

Merged on Apr 3, 2020 (1 commit)

wpan11nv commented Apr 1, 2020

  • Support vectorized casts.

  • It is incorrect to extract elements from int8x4 with

    0x000000ff & (x >> i * 8)

    because the result has type int in C/C++ and is never negative. If this
    expression is used for sign extension, the sign bit will be wrong.
    Use C-style casts instead, so that the sign bit is preserved.

Signed-off-by: Wei Pan <weip@nvidia.com>
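The sign problem described above can be reproduced in plain C++. The following is a minimal sketch, not the code emitted by the TVM CUDA codegen: the helper names `pack_int8x4`, `extract_masked`, and `extract_cast` are hypothetical, and the example assumes the usual two's-complement behaviour with an arithmetic right shift on signed integers.

```cpp
#include <cstdint>

// Hypothetical helper: pack four signed 8-bit lanes into one 32-bit word
// (lane 0 in the low byte), mimicking an int8x4 value stored in an int.
inline int32_t pack_int8x4(int8_t a, int8_t b, int8_t c, int8_t d) {
  uint32_t bits = static_cast<uint32_t>(static_cast<uint8_t>(a)) |
                  (static_cast<uint32_t>(static_cast<uint8_t>(b)) << 8) |
                  (static_cast<uint32_t>(static_cast<uint8_t>(c)) << 16) |
                  (static_cast<uint32_t>(static_cast<uint8_t>(d)) << 24);
  return static_cast<int32_t>(bits);
}

// Mask-based extraction: the result has type int and lies in [0, 255],
// so a lane that holds -2 comes back as 254.
inline int extract_masked(int32_t x, int i) {
  return 0x000000ff & (x >> i * 8);
}

// Cast-based extraction: narrowing to a signed 8-bit type first keeps the
// lane's sign, so the later widening to int sign-extends correctly.
inline int extract_cast(int32_t x, int i) {
  return (int)(signed char)(x >> i * 8);
}
```

With the lanes `1, -2, 3, -4` packed into one word, `extract_masked` returns 254 for lane 1 while `extract_cast` returns -2, which is the value a later sign extension needs.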

tqchen commented Apr 1, 2020

@vinx13, it would be great if you could take a look.

return CodeGenC::VisitExpr_(op, os);

// We could emit make_float4 like calls, but the emitted code looks
// too compact to read. Emit this as vectorized unary ops.
Member commented:

Is there any difference in performance?

Contributor (author) commented:

No. Typical optimizations such as mem2reg will promote the temporaries into registers, making the two forms equivalent.
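The two code shapes being compared can be sketched as follows. This is an illustration under stated assumptions, not the output of the TVM codegen: `int4_t`, `float4_t`, and `make_float4_like` are hypothetical stand-ins for CUDA's built-in `int4`, `float4`, and `make_float4`, written as plain C++ so the sketch compiles on a host compiler.

```cpp
// Hypothetical stand-ins for CUDA's built-in int4 and float4 vector types.
struct int4_t { int x, y, z, w; };
struct float4_t { float x, y, z, w; };

// Hypothetical stand-in for CUDA's make_float4.
inline float4_t make_float4_like(float x, float y, float z, float w) {
  return {x, y, z, w};
}

// Compact form: a single constructor-style call holding all four casts.
inline float4_t cast_compact(const int4_t& v) {
  return make_float4_like((float)v.x, (float)v.y, (float)v.z, (float)v.w);
}

// Element-wise form: a temporary vector filled one lane at a time.
// An optimizer is expected to keep the temporary in registers.
inline float4_t cast_elementwise(const int4_t& v) {
  float4_t t;
  t.x = (float)v.x;
  t.y = (float)v.y;
  t.z = (float)v.z;
  t.w = (float)v.w;
  return t;
}
```

Both functions compute the same lanes; the element-wise form is longer but keeps each cast on its own line, which is the readability trade-off the code comment under review refers to.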

vinx13 merged commit 316ce05 into apache:master on Apr 3, 2020
wpan11nv deleted the fix_casts branch on April 10, 2020 17:43
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Apr 16, 2020
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Apr 17, 2020