-
Notifications
You must be signed in to change notification settings - Fork 12.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Flang-new][OpenMP] Add bitcode files for AMD GPU OpenMP #96742
[Flang-new][OpenMP] Add bitcode files for AMD GPU OpenMP #96742
Conversation
@llvm/pr-subscribers-clang-driver @llvm/pr-subscribers-clang Author: Dominik Adamski (DominikAdamski) ChangesFlang-new needs to add fcuda-is-device flag is not used by Flang currently. In the future it will be needed for Flang equivalent function: Full diff: https://github.com/llvm/llvm-project/pull/96742.diff 3 Files Affected:
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index dd55838dcf384..612d5793232ce 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -8016,7 +8016,7 @@ def source_date_epoch : Separate<["-"], "source-date-epoch">,
// CUDA Options
//===----------------------------------------------------------------------===//
-let Visibility = [CC1Option] in {
+let Visibility = [CC1Option, FC1Option] in {
def fcuda_is_device : Flag<["-"], "fcuda-is-device">,
HelpText<"Generate code for CUDA device">,
@@ -8031,7 +8031,7 @@ def fno_cuda_host_device_constexpr : Flag<["-"], "fno-cuda-host-device-constexpr
HelpText<"Don't treat unattributed constexpr functions as __host__ __device__.">,
MarshallingInfoNegativeFlag<LangOpts<"CUDAHostDeviceConstexpr">>;
-} // let Visibility = [CC1Option]
+} // let Visibility = [CC1Option, FC1Option]
//===----------------------------------------------------------------------===//
// OpenMP Options
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 42b45dba2bd31..2679f284c5016 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -333,6 +333,9 @@ void Flang::AddAMDGPUTargetArgs(const ArgList &Args,
StringRef Val = A->getValue();
CmdArgs.push_back(Args.MakeArgString("-mcode-object-version=" + Val));
}
+
+ const ToolChain &TC = getToolChain();
+ TC.addClangTargetOptions(Args, CmdArgs, Action::OffloadKind::OFK_OpenMP);
}
void Flang::addTargetOptions(const ArgList &Args,
diff --git a/flang/test/Driver/omp-driver-offload.f90 b/flang/test/Driver/omp-driver-offload.f90
index 6fb4f4eeeeca1..b8afbe65961dc 100644
--- a/flang/test/Driver/omp-driver-offload.f90
+++ b/flang/test/Driver/omp-driver-offload.f90
@@ -227,3 +227,11 @@
! FORCE-USM-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm"
! FORCE-USM-OFFLOAD-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa"
! FORCE-USM-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm"
+
+! RUN: %flang -### -v --target=x86_64-unknown-linux-gnu -fopenmp \
+! RUN: --offload-arch=gfx900 \
+! RUN: --rocm-path=%S/Inputs/rocm %s 2>&1 \
+! RUN: | FileCheck --check-prefix=MLINK-BUILTIN-BITCODE %s
+! MLINK-BUILTIN-BITCODE: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa"
+! MLINK-BUILTIN-BITCODE-SAME: "-fcuda-is-device"
+! MLINK-BUILTIN-BITCODE-SAME: "-mlink-builtin-bitcode" {{.*Inputs.*rocm.*amdgcn.*bitcode.*}}oclc_isa_version_900.bc
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that it's called -fcuda-is-device
is historical cruft, but I guess it's easiest to just work with it. I also hate -mlink-builtin-bitcode
as a concept, but we're not quite ready to move away from its hacks unfortunately.
@@ -333,6 +333,9 @@ void Flang::AddAMDGPUTargetArgs(const ArgList &Args, | |||
StringRef Val = A->getValue(); | |||
CmdArgs.push_back(Args.MakeArgString("-mcode-object-version=" + Val)); | |||
} | |||
|
|||
const ToolChain &TC = getToolChain(); | |||
TC.addClangTargetOptions(Args, CmdArgs, Action::OffloadKind::OFK_OpenMP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be some kind of warning if these flags are used not with AMDGPU? I don't have a strong opinion here as it is only an fc1 flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to having better diagnostics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
thanks for the feedback. I would like to share my observations with you:
- Clang does not verify how we use these flags and it accepts them for non-GPU target.
- These flags can be reused by other vendors. For example clang adds
mlink-builtin-bitcode
option for OpenMP Nvidia GPU as well .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean that this change would also lead to adding these flags when building for Nvidia GPU with flang?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. My change does not imply any changes for Nvidia GPUs support.
Flang and Clang share the same LLVM backend which consumes generated LLVM IR. For AMD GPU we need to embed bitcode definitions of GPU math functions. AMD toolchain adds all required options to the compiler invocation for AMD GPU and IMO can be reused between Flang and Clang.
I don't know if Nvidia also want to reuse their toolchain between Clang and Flang to fully support OpenMP offloading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clang does not verify how we use these flags and it accepts them for non-GPU target.
It's OK to make Flang "stricter" if we believe that's the right thing to do ;-) (I think that generating useful error/warning messages like "don't mix these flags - that's not supporter" would be a good thing)
IMO can be reused between Flang and Clang
Are there any plans to extract that logic and share it somewhere?
I don't know if Nvidia also want to reuse their toolchain between Clang and Flang to fully support OpenMP offloading.
Who could be the right person to ask?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's OK to make Flang "stricter" if we believe that's the right thing to do ;-) (I think that generating useful error/warning messages like "don't mix these flags - that's not supporter" would be a good thing)
Shall I extend #94763 ? I don't use -fcuda-is-device
anymore. Now, I'm only adding -mlink-builtin-bitcode
flags to flang-new -fc1
command. The -mlink-builtin-bitcode
option was introduced by #94763
IMO can be reused between Flang and Clang
Are there any plans to extract that logic and share it somewhere?
Not yet (at least from my side). I can return to this topic if there is need to support Clang option by Flang for AMD GPU.
I don't know if Nvidia also want to reuse their toolchain between Clang and Flang to fully support OpenMP offloading.
Who could be the right person to ask?
I don't know. Open-source LLVM Flang meetings can be good place to ask this question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall I extend #94763 ?
Yes, please.
Who could be the right person to ask?
I don't know. Open-source LLVM Flang meetings can be good place to ask this question.
Did you ask? What feedback did you get?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you ask? What feedback did you get?
I asked question on flang-compiler slack (openmp/openacc channel). If I get no response, I will raise question on Flang technical community call on Monday.
Flang-new needs to add mlink-builtin-bitcode objects to properly support offload code generation for AMD GPU. fcuda-is-device flag is not used by Flang currently. In the future it will be needed for Flang equivalent function: AMDGPUTargetCodeGenInfo::getGlobalVarAddressSpace.
80d4675
to
5b487aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fcuda-is-device flag is not used by Flang currently. In the future it will be needed for Flang equivalent functions: AMDGPUTargetCodeGenInfo::getGlobalVarAddressSpace AMDGPUTargetInfo::getTargetDefines .
I don't follow - why would anything related to CUDA be relevant here?
@@ -333,6 +333,9 @@ void Flang::AddAMDGPUTargetArgs(const ArgList &Args, | |||
StringRef Val = A->getValue(); | |||
CmdArgs.push_back(Args.MakeArgString("-mcode-object-version=" + Val)); | |||
} | |||
|
|||
const ToolChain &TC = getToolChain(); | |||
TC.addClangTargetOptions(Args, CmdArgs, Action::OffloadKind::OFK_OpenMP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to having better diagnostics
Clang for AMDGPU supports OpenMP and HIP and it reuses the same code. For example |
Thanks! I'm still puzzled though:
Why would
That would be great - what's the plan here then? Simply to rely on the code in Clang? Also, note that that's |
Updated PR after: #96909 .
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is okay. I don't want to block the PR on diagnostics for an FC1 flag which aren't present in clang. LGTM and thanks for answering my questions.
Please also wait for @banach-space's approval.
This reverts commit 8252dbe.
Would it be possible for you to investigate that? It really shouldn't be required if we can't help it. |
@jhuber6 I'm working on that. |
+1 |
@@ -333,6 +333,9 @@ void Flang::AddAMDGPUTargetArgs(const ArgList &Args, | |||
StringRef Val = A->getValue(); | |||
CmdArgs.push_back(Args.MakeArgString("-mcode-object-version=" + Val)); | |||
} | |||
|
|||
const ToolChain &TC = getToolChain(); | |||
TC.addClangTargetOptions(Args, CmdArgs, Action::OffloadKind::OFK_OpenMP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall I extend #94763 ?
Yes, please.
Who could be the right person to ask?
I don't know. Open-source LLVM Flang meetings can be good place to ask this question.
Did you ask? What feedback did you get?
This reverts commit f2e362e. PR: llvm#99002 causes that -fcuda-is-device is not added for AMD GPU OpenMP.
Fixed in PR #99002 |
@banach-space I asked question on flang-slack, I mentioned the issue on the latest Flang technical meeting and I described potential solution here: https://discourse.llvm.org/t/offloading-on-nvptx64-target-with-flang-new-leads-to-undefined-reference-s/80237 . I got no feedback. Can I merge this PR? The issue with - |
Thanks for following-up! Overall this looks good to me, but please update the summary with some details/context, e.g.
Approving as is, no need to wait for me to take another look. |
What are we relying on |
@jhuber6 You are right. Flang-new for AMD GPU requires |
Flang-new needs to add `mlink-builtin-bitcode` objects to properly support offload code generation for AMD GPUs (for example, math functions). Both Flang-new and Clang rely on `mlink-builtin-bitcode` flags. These flags are added by the `AMDGPUOpenMPToolchain::addClangTargetOptions` function. Now, both compilers reuse the same function. Flang-new tests for AMDGPU were updated by adding the `-nogpulib` flag. This flag allows running AMDGPU tests on machines without the ROCm stack.
Flang-new needs to add
mlink-builtin-bitcode
objects to properly support offload code generation for AMD GPUs (for example, math functions).Both Flang-new and Clang rely on mlink-builtin-bitcode flags. These flags are added by the AMDGPUOpenMPToolchain::addClangTargetOptions function. Now, both compilers reuse the same function.
Flang-new tests for AMDGPU were updated by adding the
-nogpulib
flag. This flag allows running AMDGPU tests on machines without a ROCm stack.