[SYCL][CUDA][libclc] Add bf16 builtins and optimize half builtins for fma, fmin, fmax and fmax #5724

Merged: 21 commits merged into intel:sycl on Mar 14, 2022

t4c1 commented Mar 3, 2022

For the functions fma, fmin, fmax and fmax, this PR adds bf16 builtins to libclc and optimizes the half builtins to use native half instructions when the device supports them.

This PR also contains some changes (everything in the clang folder) that have been merged into upstream LLVM since the last pulldown and are required to build it. They come from the following upstream reviews; something went wrong when these were merged, so only parts of them landed at first, and the changes in this PR are the remainder: https://reviews.llvm.org/D118977 https://reviews.llvm.org/D117887 https://reviews.llvm.org/D119157

Tests for the half changes are in intel/llvm-test-suite#880. Tests for the bf16 implementations will be added in future PRs, together with runtime support for them.

t4c1 and others added 17 commits February 18, 2022 08:29
Adds support for the following builtins:

- abs, neg:
  - .bf16
  - .bf16x2
- min, max:
  - {.ftz}{.NaN}{.xorsign.abs}.f16
  - {.ftz}{.NaN}{.xorsign.abs}.f16x2
  - {.NaN}{.xorsign.abs}.bf16
  - {.NaN}{.xorsign.abs}.bf16x2
  - {.ftz}{.NaN}{.xorsign.abs}.f32

Differential Revision: https://reviews.llvm.org/D117887
NOTE: follow-up commit with the missing clang-side changes.

This patch adds builtins/intrinsics for the following variants of FMA:

- f16, f16x2
  - rn
  - rn_ftz
  - rn_sat
  - rn_ftz_sat
  - rn_relu
  - rn_ftz_relu
- bf16, bf16x2
  - rn
  - rn_relu

ptxas (Cuda compilation tools, release 11.0, V11.0.194) is happy with the generated assembly.

Differential Revision: https://reviews.llvm.org/D118977
NOTE: this is a follow-up commit with the missing clang-side changes.

This patch adds builtins and intrinsics for the f16 and f16x2 variants of the ex2
instruction.

These two variants were added in PTX 7.0 and are supported on sm_75 and above.

Note that this isn't wired to the exp2 LLVM intrinsic, because the ex2
instruction is only available in its approximate (.approx) variant.

Running ptxas on the assembly generated by the test f16-ex2.ll works as
expected.

Differential Revision: https://reviews.llvm.org/D119157
t4c1 requested review from a team and bader as code owners on March 3, 2022 15:17
bader changed the title from "[SYCL][CUDA][libclc] add bf16 builtins and optimize half builtins for fma, fmin, fmax and fmax" to "[SYCL][CUDA][libclc] Add bf16 builtins and optimize half builtins for fma, fmin, fmax and fmax" on Mar 3, 2022
t4c1 and others added 2 commits March 9, 2022 11:56
Apply review suggestions.

Co-authored-by: Alexey Bader <alexey.bader@intel.com>
bader previously approved these changes Mar 9, 2022

bader left a comment:

libclc changes look good to me.

t4c1 commented Mar 9, 2022

I just removed the changes to native_exp2, as that is being implemented in a slightly different way in #5747.

smanna12 left a comment:

FE changes LGTM, as per comment #5724 (comment):

These changes (everything in the clang folder) are already merged upstream. I just added them to this PR because they are required to build it. They will be part of the next pulldown.

This PR also contains some changes (everything in clang folder) that have been merged in upstream llvm since last pulldown and are required for building it.

@t4c1, could you please add the upstream links?

t4c1 commented Mar 14, 2022

Done - updated the PR description.

bader merged commit 62651dd into intel:sycl on Mar 14, 2022