
[SYCL][NVPTX] Set default fdiv and sqrt for llvm.fpbuiltin #16714

Merged
merged 16 commits into intel:sycl on Jan 30, 2025

Conversation

MrSidims
Contributor

@MrSidims MrSidims commented Jan 21, 2025

AltMathLibrary lacks an implementation of the llvm.fpbuiltin intrinsics
for the NVPTX target. This patch adds a type-dependent mapping of
llvm.fpbuiltin.fdiv with max-error > 2.0 and llvm.fpbuiltin.sqrt with
max-error > 1.0 onto nvvm intrinsics:
fp32 scalar @llvm.fpbuiltin.fdiv -> @llvm.nvvm.div.approx.f
fp32 scalar @llvm.fpbuiltin.sqrt -> @llvm.nvvm.sqrt.approx.f

vector or non-fp32 scalar llvm.fpbuiltin.fdiv -> fdiv
vector or non-fp32 scalar llvm.fpbuiltin.sqrt -> llvm.sqrt

Additionally, it maps the max-error=0.5 fpbuiltin.fadd, fpbuiltin.fsub,
fpbuiltin.fmul, fpbuiltin.fdiv, fpbuiltin.frem, fpbuiltin.sqrt and
fpbuiltin.ldexp intrinsics onto LLVM's native math operations or the
standard C/C++ library intrinsics:
https://llvm.org/docs/LangRef.html#standard-c-c-library-intrinsics
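
As an illustration only, here is a minimal IR-level sketch of the approximate mapping described above; the intrinsic overload suffixes, the attribute spelling and the function names are assumptions made for the example, not lines taken from the patch or its tests:

; Hypothetical fp32 scalar calls with relaxed accuracy requirements; on NVPTX
; the selection pass is expected to lower them to the nvvm approx intrinsics
; listed above.
declare float @llvm.fpbuiltin.fdiv.f32(float, float)
declare float @llvm.fpbuiltin.sqrt.f32(float)

define float @relaxed(float %a, float %b) {
  ; max-error 2.5 (> 2.0): expected to become a call to @llvm.nvvm.div.approx.f
  %q = call float @llvm.fpbuiltin.fdiv.f32(float %a, float %b) #0
  ; max-error 1.5 (> 1.0): expected to become a call to @llvm.nvvm.sqrt.approx.f
  %r = call float @llvm.fpbuiltin.sqrt.f32(float %q) #1
  ret float %r
}

attributes #0 = { "fpbuiltin-max-error"="2.5" }
attributes #1 = { "fpbuiltin-max-error"="1.5" }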

TODO in future patches:

  • add preservation of debug info in FPBuiltinFnSelection;
  • move tests from CodeGen to Transform;
  • move the pass to the new pass manager.

Signed-off-by: Sidorov, Dmitry dmitry.sidorov@intel.com

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
…x-error

We are lacking an implementation of the llvm.fpbuiltin intrinsics for the
NVPTX target. This patch adds a type- and fast-math-dependent mapping of
llvm.fpbuiltin.fdiv and llvm.fpbuiltin.sqrt with 3.0 max-error onto nvvm
intrinsics:
fp32 scalar @llvm.fpbuiltin.fdiv -> @llvm.nvvm.div.approx.f
fp32 scalar @llvm.fpbuiltin.fdiv fast -> @llvm.nvvm.div.approx.ftz.f
fp32 scalar @llvm.fpbuiltin.sqrt -> @llvm.nvvm.sqrt.approx.f
fp32 scalar @llvm.fpbuiltin.sqrt fast -> @llvm.nvvm.sqrt.approx.ftz.f

vector or non-fp32 scalar llvm.fpbuiltin.fdiv -> fdiv
vector or non-fp32 scalar llvm.fpbuiltin.sqrt -> llvm.sqrt

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
@MrSidims MrSidims requested a review from a team as a code owner January 21, 2025 14:23
@MrSidims
Contributor Author

@zahiraam FYI, this PR should fix the c-math LIT failure on CUDA in #15836.
@npmiller @Naghasan please take a look at the PR or assign somebody familiar with CUDA/NVPTX, as I'm still quite new to this topic.

@@ -0,0 +1,81 @@
; RUN: opt -fpbuiltin-fn-selection -S < %s | FileCheck %s
Contributor Author

I put this test in CodeGen as Andy was adding his tests for fpbuiltin there (including those that test just the transformation without invoking llc).

Contributor

I would prefer those tests to be in Transform instead, but I'm not against adding a single new test to CodeGen, simply because they should either all be in the right location or none of them (i.e. the move should be done as a separate PR).

Contributor Author

Totally agree, let's keep the tests in one place for now. If necessary, we will move some of them to the Transform dir.

// Let's map them onto NVPTX intrinsics. If no appropriate intrinsic is known,
// skip to replaceWithAltMathFunction.
if (T.isNVPTX() && BuiltinCall.getRequiredAccuracy().value() == 3.0) {
if (replaceWithNVPTXCalls(BuiltinCall))
Contributor

Should we somehow encode the "skip"/"fall back to alt math functions" part into the name?

Contributor Author

Not sure, from my perspective the code here is not complex and speaks for itself :)

Contributor

My point was not about the complexity, but about the expectations.

Let's say you read this specific function, i.e. you start top-down to understand the high-level structure first. Just by looking at the name, you may expect that this specific step will introduce some NVPTX-specific intrinsics, but in fact it will do some other form of lowering, which could be surprising unless you also go through the function itself.

Contributor Author

Sure, renamed and clarified in the comment

@@ -106,6 +107,48 @@ static bool replaceWithLLVMIR(FPBuiltinIntrinsic &BuiltinCall) {
return true;
}

static bool replaceWithNVPTXCalls(FPBuiltinIntrinsic &BuiltinCall) {
Contributor

We only call this function if the requested precision is exactly 3.0, but the name is quite generic. I think it is worth adding a comment right before the function to better specify its intent and expected use case, in case it is extended in the future.

Contributor Author

Renamed the function and added the comment

Contributor

@AlexeySachkov AlexeySachkov left a comment

The code itself looks good to me, just a few naming/documentation comments.
I'm also unfamiliar with CUDA, so it would be great to hear feedback from @intel/llvm-reviewers-cuda

@zahiraam
Contributor

@zahiraam FYI, this PR should fix the c-math LIT failure on CUDA in #15836. @npmiller @Naghasan please take a look at the PR or assign somebody familiar with CUDA/NVPTX, as I'm still quite new to this topic.

This change makes sense to me. I will wait until this PR is merged, then rebase and test again, and hopefully the cmath_test.cpp failure will be fixed.

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Contributor

@frasercrmck frasercrmck left a comment

The PR description will need updating to reflect the FTZ changes

llvm/lib/Transforms/Scalar/FPBuiltinFnSelection.cpp (review thread outdated, resolved)
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
@MrSidims
Contributor Author

@jchlanda @frasercrmck @maksimsab hi, following a conversation with @jcranmer-intel I've significantly changed the patch since your approval; may I ask you to take another look?

What has changed:
Added lowering for the 0.5 (i.e. precise) fpbuiltin intrinsics, as AltMathLibrary doesn't handle them for NVPTX/CUDA. Lowering to the approx nvvm intrinsics is now dependent on the max-error value and follows https://docs.nvidia.com/cuda/parallel-thread-execution/#floating-point-instructions, so for example for fdiv it starts from 2.0 ULP. Note that the range between 0.5 and 2.0 ULP for fdiv is not covered by this patch.
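
As a rough sketch of the precise (max-error=0.5) lowering described here, assuming the usual overload suffixes and the fpbuiltin-max-error call-site attribute (the exact spelling in the patch's tests may differ):

; Hypothetical 0.5 ULP calls before running opt -fpbuiltin-fn-selection -S.
declare float @llvm.fpbuiltin.fadd.f32(float, float)
declare float @llvm.fpbuiltin.sqrt.f32(float)

define float @precise(float %a, float %b) {
  ; max-error=0.5 fadd is expected to become a plain IR fadd instruction
  %sum = call float @llvm.fpbuiltin.fadd.f32(float %a, float %b) #0
  ; max-error=0.5 sqrt is expected to become the standard @llvm.sqrt intrinsic
  %root = call float @llvm.fpbuiltin.sqrt.f32(float %sum) #0
  ret float %root
}

attributes #0 = { "fpbuiltin-max-error"="0.5" }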

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Contributor

@frasercrmck frasercrmck left a comment

Generally looks good, just some questions.

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Contributor

@jcranmer-intel jcranmer-intel left a comment

One last nit:
Can you add a test that uses a rather large fpbuiltin-max-error, say 10 or 100? Just to cover the non-equality cases for the various if statements.

Other than that, this looks good.
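
For reference, a hypothetical shape such a test case could take, reusing the RUN line of the existing test; the CHECK line and attribute spelling are assumptions, not the final test content:

; RUN: opt -fpbuiltin-fn-selection -S < %s | FileCheck %s

; A max-error far above the 2.0 threshold should still select the fp32 approx
; intrinsic, exercising the greater-than (non-equality) branch.
; CHECK: call float @llvm.nvvm.div.approx.f
define float @large_error_fdiv(float %a, float %b) {
  %r = call float @llvm.fpbuiltin.fdiv.f32(float %a, float %b) #0
  ret float %r
}

declare float @llvm.fpbuiltin.fdiv.f32(float, float)

attributes #0 = { "fpbuiltin-max-error"="100.0" }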

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
@MrSidims MrSidims requested a review from a team January 30, 2025 18:08
@MrSidims
Contributor Author

@intel/llvm-gatekeepers please help with merge

@aelovikov-intel aelovikov-intel merged commit 52238e1 into intel:sycl Jan 30, 2025
16 checks passed
9 participants