
[SYCL][NVPTX] Set default fdiv and sqrt for llvm.fpbuiltin #16714

Merged
merged 16 commits into intel:sycl on Jan 30, 2025

Conversation

MrSidims
Contributor

@MrSidims MrSidims commented Jan 21, 2025

AltMathLibrary lacks an implementation of the llvm.fpbuiltin intrinsics
for the NVPTX target. This patch adds a type-dependent mapping of
llvm.fpbuiltin.fdiv with max-error > 2.0 and llvm.fpbuiltin.sqrt with
max-error > 1.0 onto nvvm intrinsics:
fp32 scalar @llvm.fpbuiltin.fdiv -> @llvm.nvvm.div.approx.f
fp32 scalar @llvm.fpbuiltin.sqrt -> @llvm.nvvm.sqrt.approx.f

vector or non-fp32 scalar llvm.fpbuiltin.fdiv -> fdiv
vector or non-fp32 scalar llvm.fpbuiltin.sqrt -> llvm.sqrt

Additionally, it maps the max-error=0.5 fpbuiltin.fadd, fpbuiltin.fsub,
fpbuiltin.fmul, fpbuiltin.fdiv, fpbuiltin.frem, fpbuiltin.sqrt and
fpbuiltin.ldexp intrinsics onto LLVM's native math operations or the
standard C/C++ library intrinsics:
https://llvm.org/docs/LangRef.html#standard-c-c-library-intrinsics
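
As an illustration only, here is a minimal IR-level sketch of the approximate mapping described above; the intrinsic overload suffixes, the attribute spelling and the function names are assumptions made for the example, not lines taken from the patch or its tests:

; Hypothetical fp32 scalar calls with relaxed accuracy requirements; on NVPTX
; the selection pass is expected to lower them to the nvvm approx intrinsics
; listed above.
declare float @llvm.fpbuiltin.fdiv.f32(float, float)
declare float @llvm.fpbuiltin.sqrt.f32(float)

define float @relaxed(float %a, float %b) {
  ; max-error 2.5 (> 2.0): expected to become a call to @llvm.nvvm.div.approx.f
  %q = call float @llvm.fpbuiltin.fdiv.f32(float %a, float %b) #0
  ; max-error 1.5 (> 1.0): expected to become a call to @llvm.nvvm.sqrt.approx.f
  %r = call float @llvm.fpbuiltin.sqrt.f32(float %q) #1
  ret float %r
}

attributes #0 = { "fpbuiltin-max-error"="2.5" }
attributes #1 = { "fpbuiltin-max-error"="1.5" }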

TODO in future patches:

  • add preservation of debug info in FPBuiltinFnSelection;
  • move tests from CodeGen to Transform;
  • move the pass to the new pass manager.

Signed-off-by: Sidorov, Dmitry dmitry.sidorov@intel.com

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
…x-error

We are lacking an implementation of the llvm.fpbuiltin intrinsics for the
NVPTX target. This patch adds a type- and fast-math-dependent mapping of
llvm.fpbuiltin.fdiv and llvm.fpbuiltin.sqrt with 3.0 max-error onto nvvm
intrinsics:
fp32 scalar @llvm.fpbuiltin.fdiv -> @llvm.nvvm.div.approx.f
fp32 scalar @llvm.fpbuiltin.fdiv fast -> @llvm.nvvm.div.approx.ftz.f
fp32 scalar @llvm.fpbuiltin.sqrt -> @llvm.nvvm.sqrt.approx.f
fp32 scalar @llvm.fpbuiltin.sqrt fast -> @llvm.nvvm.sqrt.approx.ftz.f

vector or non-fp32 scalar llvm.fpbuiltin.fdiv -> fdiv
vector or non-fp32 scalar llvm.fpbuiltin.sqrt -> llvm.sqrt

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
@MrSidims MrSidims requested a review from a team as a code owner January 21, 2025 14:23
@MrSidims
Contributor Author

@zahiraam FYI, this PR should fix the c-math LIT failure on CUDA in #15836.
@npmiller @Naghasan please take a look at the PR or assign somebody familiar with CUDA/NVPTX, as I'm still quite new to this topic.

@@ -0,0 +1,81 @@
; RUN: opt -fpbuiltin-fn-selection -S < %s | FileCheck %s
Contributor Author

I put this test in CodeGen as Andy was adding his tests for fpbuiltin there (including those that test just the transformation without invoking llc).

Contributor

I would prefer those tests to be in Transform instead, but I'm not against adding a single new test to CodeGen, simply because they should either all be in the right location or none of them (i.e. the move should be done as a separate PR).

Contributor Author

Totally agree, let's keep the tests in one place for now. If necessary, we will move some of them to the Transform dir.

// Let's map them onto NVPTX intrinsics. If no appropriate intrinsic is known,
// skip to replaceWithAltMathFunction.
if (T.isNVPTX() && BuiltinCall.getRequiredAccuracy().value() == 3.0) {
if (replaceWithNVPTXCalls(BuiltinCall))
Contributor

Should we somehow encode the "skip"/"fall back to alt math functions" part into the name?

Contributor Author

Not sure, from my perspective the code here is not complex and speaks for itself :)

Contributor

My point was not about the complexity, but about the expectations.

Let's say you read this specific function, i.e. you start top-down to understand the high-level structure first. Just by looking at the name, you may expect that this specific step will introduce some NVPTX-specific intrinsics, but in fact it will do some other form of lowering, which could be surprising unless you also go through the function itself.

Contributor Author

Sure, renamed and clarified in the comment

@@ -106,6 +107,48 @@ static bool replaceWithLLVMIR(FPBuiltinIntrinsic &BuiltinCall) {
return true;
}

static bool replaceWithNVPTXCalls(FPBuiltinIntrinsic &BuiltinCall) {
Contributor

We only call this function if the requested precision is exactly 3.0, but the name is quite generic. I think it is worth adding a comment right before the function to better specify its intent and expected use case, in case it is extended in the future.

Contributor Author

Renamed the function and added the comment

Contributor

@AlexeySachkov AlexeySachkov left a comment

The code itself looks good to me, just a few naming/documentation comments.
I'm also unfamiliar with CUDA, so it would be great to hear feedback from @intel/llvm-reviewers-cuda

@zahiraam
Contributor

@zahiraam FYI, this PR should fix the c-math LIT failure on CUDA in #15836. @npmiller @Naghasan please take a look at the PR or assign somebody familiar with CUDA/NVPTX, as I'm still quite new to this topic.

This change makes sense to me. I will wait until this PR is merged, then rebase and test again, and hopefully the cmath_test.cpp failure will be fixed.

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Contributor

@frasercrmck frasercrmck left a comment

The PR description will need updating to reflect the FTZ changes

llvm/lib/Transforms/Scalar/FPBuiltinFnSelection.cpp (review thread outdated, resolved)
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
@MrSidims
Contributor Author

@jchlanda @frasercrmck @maksimsab hi, following a conversation with @jcranmer-intel I've significantly changed the patch since your approval; may I ask you to take another look?

What has changed:
Added lowering for the 0.5 (i.e. precise) fpbuiltin intrinsics, as AltMathLibrary doesn't handle them for NVPTX/CUDA. Lowering to the approx nvvm intrinsics is now dependent on the max-error value and follows https://docs.nvidia.com/cuda/parallel-thread-execution/#floating-point-instructions, so for example for fdiv it starts from 2.0 ULP. Note that the range between 0.5 and 2.0 ULP for fdiv is not covered by this patch.
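
As a rough sketch of the precise (max-error=0.5) lowering described here, assuming the usual overload suffixes and the fpbuiltin-max-error call-site attribute (the exact spelling in the patch's tests may differ):

; Hypothetical 0.5 ULP calls before running opt -fpbuiltin-fn-selection -S.
declare float @llvm.fpbuiltin.fadd.f32(float, float)
declare float @llvm.fpbuiltin.sqrt.f32(float)

define float @precise(float %a, float %b) {
  ; max-error=0.5 fadd is expected to become a plain IR fadd instruction
  %sum = call float @llvm.fpbuiltin.fadd.f32(float %a, float %b) #0
  ; max-error=0.5 sqrt is expected to become the standard @llvm.sqrt intrinsic
  %root = call float @llvm.fpbuiltin.sqrt.f32(float %sum) #0
  ret float %root
}

attributes #0 = { "fpbuiltin-max-error"="0.5" }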

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Contributor

@frasercrmck frasercrmck left a comment

Generally looks good, just some questions.

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Contributor

@jcranmer-intel jcranmer-intel left a comment

One last nit:
Can you add a test that uses a rather large fpbuiltin-max-error, say 10 or 100? Just to cover the non-equality cases for the various if statements.

Other than that, this looks good.
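
For reference, a hypothetical shape such a test case could take, reusing the RUN line of the existing test; the CHECK line and attribute spelling are assumptions, not the final test content:

; RUN: opt -fpbuiltin-fn-selection -S < %s | FileCheck %s

; A max-error far above the 2.0 threshold should still select the fp32 approx
; intrinsic, exercising the greater-than (non-equality) branch.
; CHECK: call float @llvm.nvvm.div.approx.f
define float @large_error_fdiv(float %a, float %b) {
  %r = call float @llvm.fpbuiltin.fdiv.f32(float %a, float %b) #0
  ret float %r
}

declare float @llvm.fpbuiltin.fdiv.f32(float, float)

attributes #0 = { "fpbuiltin-max-error"="100.0" }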

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
@MrSidims MrSidims requested a review from a team January 30, 2025 18:08
@MrSidims
Contributor Author

@intel/llvm-gatekeepers please help with merge

@aelovikov-intel aelovikov-intel merged commit 52238e1 into intel:sycl Jan 30, 2025
16 checks passed
9 participants