-
Notifications
You must be signed in to change notification settings - Fork 753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][NVPTX] Set default fdiv and sqrt for llvm.fpbuiltin #16714
Conversation
…x-error We are lacking implementation for llvm.fpbuiltin intrinsics for NVPTX target. This patch adds type-and fast-math- dependent mapping for llvm.fpbuiltin.fdiv and llvm.fpbuiltin.sqrt with 3.0 max-error on nvvm intrinsics: fp32 scalar @llvm.fpbuiltin.fdiv -> @llvm.nvvm.div.approx.f fp32 scalar @llvm.fpbuiltin.fdiv fast -> @llvm.nvvm.div.approx.ftz.f fp32 scalar @llvm.fpbuiltin.sqrt -> @llvm.nvvm.sqrt.approx.f fp32 scalar @llvm.fpbuiltin.sqrt fast -> @llvm.nvvm.sqrt.approx.ftz.f vector or non-fp32 scalar llvm.fpbuiltin.fdiv -> fdiv vector or non-fp32 scalar llvm.fpbuiltin.sqrt -> llvm.sqrt Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
@@ -0,0 +1,81 @@ | |||
; RUN: opt -fpbuiltin-fn-selection -S < %s | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put this test in CodeGen as Andy was adding his tests for fpbuiltin there (including those, which were testing just transformations without invoking llc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer those tests to be in Transform
instead, but I'm not against adding a single new test into CodeGen
simply because they should either all be in the right location, or none of them (i.e. the move should be done as a separate PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree, lets keep the tests in one place for now. If necessary we will move some of them to Transform
dir
// Lets map them on NVPTX intrinsics. If no appropriate intrinsics are known | ||
// - skip to replaceWithAltMathFunction. | ||
if (T.isNVPTX() && BuiltinCall.getRequiredAccuracy().value() == 3.0) { | ||
if (replaceWithNVPTXCalls(BuiltinCall)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we somehow encode "skip"/"fallback" to alt math functions part into the name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, from my perspective the code here is not complex and speaks for itself :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point was not in the complexity, but in the expectations.
Let's say you read the this specific function, i.e. you start top-down to understand high-level structure first. Just by looking at the name, you may expect that this specific step will introduce some NVPTX-specific intrinsics, but in fact, it will do some other form of lowering which could be surprising for you unless you also went through the function itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, renamed and clarified in the comment
@@ -106,6 +107,48 @@ static bool replaceWithLLVMIR(FPBuiltinIntrinsic &BuiltinCall) { | |||
return true; | |||
} | |||
|
|||
static bool replaceWithNVPTXCalls(FPBuiltinIntrinsic &BuiltinCall) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only call this function if requested precision is exactly 3.0
, but the name is quite generic. I think that it worth adding a comment right before the function to better specify its intent/expected use case of this function in case it will be extended in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed the function and added the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code itself looks good to me, just a few naming/documentation comments.
I'm also unfamiliar with CUDA, so it would be great to hear feedback from @intel/llvm-reviewers-cuda
This change makes sense to me. I will wait until this PR is merged in, then will rebase and test again and hopefully the |
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR description will need updating to reflect the FTZ changes
@jchlanda @frasercrmck @maksimsab hi, following conversation with @jcranmer-intel I've significantly changed the patch since your approval, may I ask you to take another look? What have changed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good, just some questions.
llvm/test/CodeGen/NVPTX/fp-builtin-intrinsics-nvvm-max-error-0.5.ll
Outdated
Show resolved
Hide resolved
llvm/test/CodeGen/NVPTX/fp-builtin-intrinsics-nvvm-max-error-0.5.ll
Outdated
Show resolved
Hide resolved
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last nit:
Can you add a test that uses a rather large fpbuiltin-max-error, say 10 or 100? Just to cover the non-equality cases for the various if statements.
Other than that, this looks good.
@intel/llvm-gatekeepers please help with merge |
AltMathLibrary is lacking implementation for llvm.fpbuiltin intrinsics
for NVPTX target. This patch adds type-dependent mapping for
llvm.fpbuiltin.fdiv with max-error > 2.0 and llvm.fpbuiltin.sqrt with
max-error > 1.0 on nvvm intrinsics:
fp32 scalar @llvm.fpbuiltin.fdiv -> @llvm.nvvm.div.approx.f
fp32 scalar @llvm.fpbuiltin.sqrt -> @llvm.nvvm.sqrt.approx.f
vector or non-fp32 scalar llvm.fpbuiltin.fdiv -> fdiv
vector or non-fp32 scalar llvm.fpbuiltin.sqrt -> llvm.sqrt
Additionally it maps max-error=0.5 fpbuiltin.fadd, fpbuiltin.fsub.
fpbuiltin.fmul, fpbuiltin.fdiv, fpbuiltin.frem, fpbuiltin.sqrt and
fpbuiltin.ldexp intrinsic functions of LLVM's math operations or
https://llvm.org/docs/LangRef.html#standard-c-c-library-intrinsics
TODO in future patches:
Signed-off-by: Sidorov, Dmitry dmitry.sidorov@intel.com