Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CodeGen][CUDA] Fix issues in cuda codegen #4876

Merged
merged 1 commit into from
Feb 16, 2020

Conversation

wpan11nv
Copy link
Contributor

  • Do not emit shared etc. as part of type for casting

  • Fix fp16 reduction kernels with compiler errors:

    "no operator "+" matches these operands, volatile half + volatile half

    This patch inserts casts to remove volatile type qualifier following
    volatile loads (fp16 only). CUDA fp16 library headers should add
    volatile member functions.

Signed-off-by: Wei Pan weip@nvidia.com

Thanks for contributing to TVM! Please refer to guideline https://docs.tvm.ai/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers by @ them in the pull request thread.

@wpan11nv
Copy link
Contributor Author

wpan11nv commented Feb 13, 2020

This patch should fix errors observed below (I did not verify as I found no complete reproducers there). My own test works fine with CUDA 10.2.

https://discuss.tvm.ai/t/error-fp16-cuda-compilation-error/4586

This issue has been also reported to NVIDIA CUDA team.

@tqchen
Copy link
Member

tqchen commented Feb 13, 2020

cc @vinx13 @ZihengJiang please help to take a look

@wpan11nv wpan11nv force-pushed the fp16_reduction_fixes branch 2 times, most recently from c31bccd to d3d8b0c Compare February 14, 2020 22:37
@vinx13
Copy link
Member

vinx13 commented Feb 15, 2020

@vinx13
Copy link
Member

vinx13 commented Feb 15, 2020

also cc @yzhliu @zxy844288792 @Hzfengsy

- Do not emit __shared__ etc. as part of type for casting

- Fix fp16 reduction kernels with compiler errors:

  "no operator "+" matches these operands, volatile half + volatile half

  This patch inserts casts to remove volatile type qualifier following
  volatile loads (fp16 only). CUDA fp16 library headers should add
  volatile member functions.

- Update have_fp16 to include compute 6.1 GPUs, which do support fp16,
  although their fp16 throughput is low. Updated tests.

Signed-off-by: Wei Pan <weip@nvidia.com>
@wpan11nv
Copy link
Contributor Author

Updated the patch as suggested. Thanks!

@vinx13 vinx13 merged commit d50ba72 into apache:master Feb 16, 2020
@vinx13
Copy link
Member

vinx13 commented Feb 16, 2020

Thanks @wpan11nv this is merged

@wpan11nv wpan11nv deleted the fp16_reduction_fixes branch February 17, 2020 05:38
alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 26, 2020
- Do not emit __shared__ etc. as part of type for casting

- Fix fp16 reduction kernels with compiler errors:

  "no operator "+" matches these operands, volatile half + volatile half

  This patch inserts casts to remove volatile type qualifier following
  volatile loads (fp16 only). CUDA fp16 library headers should add
  volatile member functions.

- Update have_fp16 to include compute 6.1 GPUs, which do support fp16,
  although their fp16 throughput is low. Updated tests.

Signed-off-by: Wei Pan <weip@nvidia.com>
alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 28, 2020
- Do not emit __shared__ etc. as part of type for casting

- Fix fp16 reduction kernels with compiler errors:

  "no operator "+" matches these operands, volatile half + volatile half

  This patch inserts casts to remove volatile type qualifier following
  volatile loads (fp16 only). CUDA fp16 library headers should add
  volatile member functions.

- Update have_fp16 to include compute 6.1 GPUs, which do support fp16,
  although their fp16 throughput is low. Updated tests.

Signed-off-by: Wei Pan <weip@nvidia.com>
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Mar 2, 2020
- Do not emit __shared__ etc. as part of type for casting

- Fix fp16 reduction kernels with compiler errors:

  "no operator "+" matches these operands, volatile half + volatile half

  This patch inserts casts to remove volatile type qualifier following
  volatile loads (fp16 only). CUDA fp16 library headers should add
  volatile member functions.

- Update have_fp16 to include compute 6.1 GPUs, which do support fp16,
  although their fp16 throughput is low. Updated tests.

Signed-off-by: Wei Pan <weip@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants