Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Flang-new][OpenMP] Add bitcode files for AMD GPU OpenMP #96742

Merged
merged 10 commits into from
Jul 29, 2024
3 changes: 3 additions & 0 deletions clang/lib/Driver/ToolChains/Flang.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -342,6 +342,9 @@ void Flang::AddAMDGPUTargetArgs(const ArgList &Args,
StringRef Val = A->getValue();
CmdArgs.push_back(Args.MakeArgString("-mcode-object-version=" + Val));
}

const ToolChain &TC = getToolChain();
TC.addClangTargetOptions(Args, CmdArgs, Action::OffloadKind::OFK_OpenMP);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be some kind of warning if these flags are used not with AMDGPU? I don't have a strong opinion here as it is only an fc1 flag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to having better diagnostics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,
thanks for the feedback. I would like to share my observations with you:

  1. Clang does not verify how we use these flags and it accepts them for non-GPU target.
  2. These flags can be reused by other vendors. For example clang adds mlink-builtin-bitcode option for OpenMP Nvidia GPU as well .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean that this change would also lead to adding these flags when building for Nvidia GPU with flang?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. My change does not imply any changes for Nvidia GPUs support.

Flang and Clang share the same LLVM backend which consumes generated LLVM IR. For AMD GPU we need to embed bitcode definitions of GPU math functions. AMD toolchain adds all required options to the compiler invocation for AMD GPU and IMO can be reused between Flang and Clang.

I don't know if Nvidia also want to reuse their toolchain between Clang and Flang to fully support OpenMP offloading.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clang does not verify how we use these flags and it accepts them for non-GPU target.

It's OK to make Flang "stricter" if we believe that's the right thing to do ;-) (I think that generating useful error/warning messages like "don't mix these flags - that's not supporter" would be a good thing)

IMO can be reused between Flang and Clang

Are there any plans to extract that logic and share it somewhere?

I don't know if Nvidia also want to reuse their toolchain between Clang and Flang to fully support OpenMP offloading.

Who could be the right person to ask?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's OK to make Flang "stricter" if we believe that's the right thing to do ;-) (I think that generating useful error/warning messages like "don't mix these flags - that's not supporter" would be a good thing)

Shall I extend #94763 ? I don't use -fcuda-is-device anymore. Now, I'm only adding -mlink-builtin-bitcode flags to flang-new -fc1 command. The -mlink-builtin-bitcode option was introduced by #94763

IMO can be reused between Flang and Clang

Are there any plans to extract that logic and share it somewhere?

Not yet (at least from my side). I can return to this topic if there is need to support Clang option by Flang for AMD GPU.

I don't know if Nvidia also want to reuse their toolchain between Clang and Flang to fully support OpenMP offloading.

Who could be the right person to ask?

I don't know. Open-source LLVM Flang meetings can be good place to ask this question.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall I extend #94763 ?

Yes, please.

Who could be the right person to ask?

I don't know. Open-source LLVM Flang meetings can be good place to ask this question.

Did you ask? What feedback did you get?

Copy link
Contributor Author

@DominikAdamski DominikAdamski Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you ask? What feedback did you get?

I asked question on flang-compiler slack (openmp/openacc channel). If I get no response, I will raise question on Flang technical community call on Monday.

}

void Flang::addTargetOptions(const ArgList &Args,
Expand Down
39 changes: 23 additions & 16 deletions flang/test/Driver/omp-driver-offload.f90
Original file line number Diff line number Diff line change
Expand Up @@ -14,12 +14,12 @@
! Test regular -fopenmp with offload, and invocation filtering options
! RUN: %flang -S -### %s -o %t 2>&1 \
! RUN: -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 \
! RUN: --target=aarch64-unknown-linux-gnu \
! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\
! RUN: | FileCheck %s --check-prefix=OFFLOAD-HOST-AND-DEVICE

! RUN: %flang -S -### %s -o %t 2>&1 \
! RUN: -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --offload-host-device \
! RUN: --target=aarch64-unknown-linux-gnu \
! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\
! RUN: | FileCheck %s --check-prefix=OFFLOAD-HOST-AND-DEVICE

! OFFLOAD-HOST-AND-DEVICE: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
Expand All @@ -29,7 +29,7 @@

! RUN: %flang -S -### %s -o %t 2>&1 \
! RUN: -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --offload-host-only \
! RUN: --target=aarch64-unknown-linux-gnu \
! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\
! RUN: | FileCheck %s --check-prefix=OFFLOAD-HOST

! OFFLOAD-HOST: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
Expand All @@ -39,7 +39,7 @@

! RUN: %flang -S -### %s 2>&1 \
! RUN: -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --offload-device-only \
! RUN: --target=aarch64-unknown-linux-gnu \
! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\
! RUN: | FileCheck %s --check-prefix=OFFLOAD-DEVICE

! OFFLOAD-DEVICE: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
Expand All @@ -48,13 +48,13 @@
! OFFLOAD-DEVICE-NOT: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"

! Test regular -fopenmp with offload for basic fopenmp-is-target-device flag addition and correct fopenmp
! RUN: %flang -### -fopenmp --offload-arch=gfx90a -fopenmp-targets=amdgcn-amd-amdhsa %s 2>&1 | FileCheck --check-prefixes=CHECK-OPENMP-IS-TARGET-DEVICE %s
! RUN: %flang -### -fopenmp --offload-arch=gfx90a -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib %s 2>&1 | FileCheck --check-prefixes=CHECK-OPENMP-IS-TARGET-DEVICE %s
! CHECK-OPENMP-IS-TARGET-DEVICE: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90"

! Testing appropriate flags are gnerated and appropriately assigned by the driver when offloading
! RUN: %flang -S -### %s -o %t 2>&1 \
! RUN: -fopenmp --offload-arch=gfx90a \
! RUN: --target=aarch64-unknown-linux-gnu \
! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\
! RUN: | FileCheck %s --check-prefix=OPENMP-OFFLOAD-ARGS
! OPENMP-OFFLOAD-ARGS: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" {{.*}} "-fopenmp" {{.*}}.f90"
! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa"
Expand All @@ -70,19 +70,19 @@
! RUN: %flang -### %s -o %t 2>&1 \
! RUN: -fopenmp --offload-arch=gfx90a \
! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
! RUN: -fopenmp-assume-threads-oversubscription \
! RUN: -fopenmp-assume-threads-oversubscription -nogpulib \
! RUN: | FileCheck %s --check-prefixes=CHECK-THREADS-OVS
! RUN: %flang -### %s -o %t 2>&1 \
! RUN: -fopenmp --offload-arch=sm_70 \
! RUN: -fopenmp-targets=nvptx64-nvidia-cuda \
! RUN: -fopenmp-assume-threads-oversubscription \
! RUN: -fopenmp-assume-threads-oversubscription \
! RUN: | FileCheck %s --check-prefixes=CHECK-THREADS-OVS
! CHECK-THREADS-OVS: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-threads-oversubscription" {{.*}}.f90"

! RUN: %flang -### %s -o %t 2>&1 \
! RUN: -fopenmp --offload-arch=gfx90a \
! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
! RUN: -fopenmp-assume-teams-oversubscription \
! RUN: -fopenmp-assume-teams-oversubscription -nogpulib\
! RUN: | FileCheck %s --check-prefixes=CHECK-TEAMS-OVS
! RUN: %flang -### %s -o %t 2>&1 \
! RUN: -fopenmp --offload-arch=sm_70 \
Expand All @@ -94,7 +94,7 @@
! RUN: %flang -### %s -o %t 2>&1 \
! RUN: -fopenmp --offload-arch=gfx90a \
! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
! RUN: -fopenmp-assume-no-nested-parallelism \
! RUN: -fopenmp-assume-no-nested-parallelism -nogpulib\
! RUN: | FileCheck %s --check-prefixes=CHECK-NEST-PAR
! RUN: %flang -### %s -o %t 2>&1 \
! RUN: -fopenmp --offload-arch=sm_70 \
Expand All @@ -106,7 +106,7 @@
! RUN: %flang -### %s -o %t 2>&1 \
! RUN: -fopenmp --offload-arch=gfx90a \
! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
! RUN: -fopenmp-assume-no-thread-state \
! RUN: -fopenmp-assume-no-thread-state -nogpulib\
! RUN: | FileCheck %s --check-prefixes=CHECK-THREAD-STATE
! RUN: %flang -### %s -o %t 2>&1 \
! RUN: -fopenmp --offload-arch=sm_70 \
Expand All @@ -118,7 +118,7 @@
! RUN: %flang -### %s -o %t 2>&1 \
! RUN: -fopenmp --offload-arch=gfx90a \
! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
! RUN: -fopenmp-target-debug \
! RUN: -fopenmp-target-debug -nogpulib\
! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG
! RUN: %flang -### %s -o %t 2>&1 \
! RUN: -fopenmp --offload-arch=sm_70 \
Expand All @@ -130,7 +130,7 @@
! RUN: %flang -### %s -o %t 2>&1 \
! RUN: -fopenmp --offload-arch=gfx90a \
! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
! RUN: -fopenmp-target-debug \
! RUN: -fopenmp-target-debug -nogpulib\
! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG
! RUN: %flang -### %s -o %t 2>&1 \
! RUN: -fopenmp --offload-arch=sm_70 \
Expand All @@ -144,7 +144,7 @@
! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
! RUN: -fopenmp-target-debug -fopenmp-assume-threads-oversubscription \
! RUN: -fopenmp-assume-teams-oversubscription -fopenmp-assume-no-nested-parallelism \
! RUN: -fopenmp-assume-no-thread-state \
! RUN: -fopenmp-assume-no-thread-state -nogpulib\
! RUN: | FileCheck %s --check-prefixes=CHECK-RTL-ALL
! RUN: %flang -S -### %s -o %t 2>&1 \
! RUN: -fopenmp --offload-arch=sm_70 \
Expand All @@ -160,7 +160,7 @@
! RUN: %flang -### %s -o %t 2>&1 \
! RUN: -fopenmp --offload-arch=gfx90a \
! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
! RUN: -fopenmp-version=45 \
! RUN: -fopenmp-version=45 -nogpulib\
! RUN: | FileCheck %s --check-prefixes=CHECK-OPENMP-VERSION
! RUN: %flang -### %s -o %t 2>&1 \
! RUN: -fopenmp --offload-arch=sm_70 \
Expand Down Expand Up @@ -193,10 +193,17 @@
! Test -fopenmp-force-usm option with offload
! RUN: %flang -S -### %s -o %t 2>&1 \
! RUN: -fopenmp -fopenmp-force-usm --offload-arch=gfx90a \
! RUN: --target=aarch64-unknown-linux-gnu \
! RUN: --target=aarch64-unknown-linux-gnu -nogpulib\
! RUN: | FileCheck %s --check-prefix=FORCE-USM-OFFLOAD

! FORCE-USM-OFFLOAD: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
! FORCE-USM-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm"
! FORCE-USM-OFFLOAD-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa"
! FORCE-USM-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm"

! RUN: %flang -### -v --target=x86_64-unknown-linux-gnu -fopenmp \
! RUN: --offload-arch=gfx900 \
! RUN: --rocm-path=%S/Inputs/rocm %s 2>&1 \
! RUN: | FileCheck --check-prefix=MLINK-BUILTIN-BITCODE %s
! MLINK-BUILTIN-BITCODE: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa"
! MLINK-BUILTIN-BITCODE-SAME: "-mlink-builtin-bitcode" {{.*Inputs.*rocm.*amdgcn.*bitcode.*}}oclc_isa_version_900.bc
4 changes: 2 additions & 2 deletions flang/test/Driver/target-cpu-features.f90
Original file line number Diff line number Diff line change
Expand Up @@ -29,10 +29,10 @@
! RUN: %flang --target=riscv64-linux-gnu -c %s -### 2>&1 \
! RUN: | FileCheck %s -check-prefix=CHECK-RV64

! RUN: %flang --target=amdgcn-amd-amdhsa -mcpu=gfx908 -c %s -### 2>&1 \
! RUN: %flang --target=amdgcn-amd-amdhsa -mcpu=gfx908 -nogpulib -c %s -### 2>&1 \
! RUN: | FileCheck %s -check-prefix=CHECK-AMDGPU

! RUN: %flang --target=r600-unknown-unknown -mcpu=cayman -c %s -### 2>&1 \
! RUN: %flang --target=r600-unknown-unknown -mcpu=cayman -nogpulib -c %s -### 2>&1 \
! RUN: | FileCheck %s -check-prefix=CHECK-AMDGPU-R600

! CHECK-A57: "-fc1" "-triple" "aarch64-unknown-linux-gnu"
Expand Down
2 changes: 1 addition & 1 deletion flang/test/Driver/target-gpu-features.f90
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
! Test that -mcpu are used and that the -target-cpu and -target-features
! are also added to the fc1 command.

! RUN: %flang --target=amdgcn-amd-amdhsa -mcpu=gfx902 -c %s -### 2>&1 \
! RUN: %flang --target=amdgcn-amd-amdhsa -mcpu=gfx902 -nogpulib -c %s -### 2>&1 \
! RUN: | FileCheck %s -check-prefix=CHECK-AMDGCN

! CHECK-AMDGCN: "-fc1" "-triple" "amdgcn-amd-amdhsa"
Expand Down