Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SYCL passes depends on AMDGPU Remove Incompatible Functions unexpectedly #10816

Closed
jsji opened this issue Aug 14, 2023 · 6 comments · Fixed by #11247
Closed

SYCL passes depends on AMDGPU Remove Incompatible Functions unexpectedly #10816

jsji opened this issue Aug 14, 2023 · 6 comments · Fixed by #11247
Assignees
Labels
bug Something isn't working hip Issues related to execution on HIP backend.

Comments

@jsji
Copy link
Contributor

jsji commented Aug 14, 2023

Describe the bug

In Pulldown #10783 test, we met large number of failures on AMDGPU.
eg:
https://github.com/intel/llvm/actions/runs/5849224567/job/15857444183

Example failures are:

FAIL: SYCL :: Assert/assert_in_one_kernel_ndebug.cpp (19 of 1599)
******************** TEST 'SYCL :: Assert/assert_in_one_kernel_ndebug.cpp' FAILED ********************
Script:

: 'RUN: at line 1'; /__w/llvm/llvm/toolchain/bin//clang++ -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx1031 -fsycl -fsycl-targets=amdgcn-amd-amdhsa -DNDEBUG /__w/llvm/llvm/llvm/sycl/test-e2e/Assert/assert_in_one_kernel.cpp -o /__w/llvm/llvm/build-e2e/Assert/Output/assert_in_one_kernel_ndebug.cpp.tmp.out
: 'RUN: at line 2'; env ONEAPI_DEVICE_SELECTOR=ext_oneapi_hip:gpu /__w/llvm/llvm/build-e2e/Assert/Output/assert_in_one_kernel_ndebug.cpp.tmp.out | /__w/llvm/llvm/toolchain/bin/FileCheck /__w/llvm/llvm/llvm/sycl/test-e2e/Assert/assert_in_one_kernel_ndebug.cpp

Exit Code: 2

Command Output (stdout):

$ ":" "RUN: at line 1"
note: command had no output on stdout or stderr
$ "/__w/llvm/llvm/toolchain/bin//clang++" "-Xsycl-target-backend=amdgcn-amd-amdhsa" "--offload-arch=gfx1031" "-fsycl" "-fsycl-targets=amdgcn-amd-amdhsa" "-DNDEBUG" "/__w/llvm/llvm/llvm/sycl/test-e2e/Assert/assert_in_one_kernel.cpp" "-o" "/__w/llvm/llvm/build-e2e/Assert/Output/assert_in_one_kernel_ndebug.cpp.tmp.out"
note: command had no output on stdout or stderr
$ ":" "RUN: at line 2"
note: command had no output on stdout or stderr
$ "env" "ONEAPI_DEVICE_SELECTOR=ext_oneapi_hip:gpu" "/__w/llvm/llvm/build-e2e/Assert/Output/assert_in_one_kernel_ndebug.cpp.tmp.out"
note: command had no output on stdout or stderr
error: command failed with exit status: -11
$ "/__w/llvm/llvm/toolchain/bin/FileCheck" "/__w/llvm/llvm/llvm/sycl/test-e2e/Assert/assert_in_one_kernel_ndebug.cpp"
# command stderr:
FileCheck error: '' is empty.
FileCheck command line: /__w/llvm/llvm/toolchain/bin/FileCheck /__w/llvm/llvm/llvm/sycl/test-e2e/Assert/assert_in_one_kernel_ndebug.cpp

error: command failed with exit status: 2

To Reproduce

Merge 5b5bd81 into sycl branch or merge the pulldown PR.

Environment (please complete the following information):

  • OS: Linux
  • Target device and vendor: AMDGPU

Additional context

According the the pass manager, it is highly possible that the following two SYCL passes somehow depends on "AMDGPU Remove Incompatible Functions" to remove some bad functions.

; GCN-O1-NEXT: SYCL Local Accessor to Shared Memory
; GCN-O1-NEXT: SYCL Add Implicit Global Offset
; GCN-O1-NEXT: FunctionPass Manager
; GCN-O1-NEXT: FPBuiltin Function Selection
; GCN-O1-NEXT: AMDGPU Remove Incompatible Functions

@jsji jsji added the bug Something isn't working label Aug 14, 2023
@bader bader added the hip Issues related to execution on HIP backend. label Aug 14, 2023
@npmiller npmiller assigned jchlanda and unassigned npmiller Aug 15, 2023
@jchlanda
Copy link
Contributor

I have a branch that exhibits the behaviour at:
https://github.com/jchlanda/llvm/tree/jakub/pulldown_plus_5b5bd81b7170

I also reduced the test-case to an llc friendly IR:

llc -march=amdgcn -mcpu=gfx1030 reduced.ll
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8"
target triple = "amdgcn-amd-amdhsa"

define internal fastcc float @__ocml_tanh_f32() {
  ret float 0.000000e+00
}

None of SYCL passes are to be blamed, with both local accessor and global offset disabled the tests still crashes. Seems like SYCL requires the AMDGPURemoveIncompatibleFunctions pass to be run later on (was move to IRPasses, needs to be in CoedeGenPrepare stage).

Investigating why that is.

@jchlanda
Copy link
Contributor

jchlanda commented Aug 17, 2023

This looks like git gone wrong, see the difference in:

The SYCL version incorrectly removes AMDGPUAttributor pass here: jchlanda@449f5b9#diff-d1700f27f8ad877aead2ce4581ce77e67e96e0c30d3b551a222dca9fed629419L1077

@jchlanda
Copy link
Contributor

@jsji I've verified that reinstating the AMDGPUAttributor pass fixes the miscompilation. You should be able to resolve the pulldown issue by re-visiting the commit in question and correctly applying it.

Feel free to reopen this ticket/submit a new one should new failure surface.

Thank you.

@jchlanda
Copy link
Contributor

It was a cherry pick gone wrong on my side, re-opening to investigate further.

@jchlanda jchlanda reopened this Aug 17, 2023
@jchlanda
Copy link
Contributor

If we take as an example math tests and only focus on a single invocation of tanh for float4.

The call to tanh is handled by spirv's __spirv_ocl_tanh(float vector[4]), which is implemented in libclc as a sequence of scalar calls to __ocml_tanh_f32 (provided by AMD's device lib implementation).

Now, because the AMD's target machine pass pipeline has been reordered it runs AMDGPURemoveIncompatibleFunctions early on, and as __spirv_ocl_tanh(float vector[4]) requires (among others) target support of FeatureDot3Inst, which is not provided on the GPU in question, it is being earmarked for deletion and we end up with modules containing calls to deleted funcs:

%call3.i.i.i = tail call fastcc noundef <4 x float> null(<4 x float> noundef %agg.tmp.sroa.0.0.copyload.i)

As all the libclc functions are marked with always inline, it used to work well, as we would always replace the problematic vector function calls with exploded scalar versions of __ocml_..., this was handled for us by always inliner pass.

I will reach out to Matt asking for reasoning behind the incompatible functions pass reshuffle, in the meantime there is nothing we can do, apart for making sure that AlwaysInlinerLegacyPass is run prior to AMDGPURemoveIncompatibleFunctions.

@jchlanda
Copy link
Contributor

A link to the discussion on the original patch: https://reviews.llvm.org/D155987#4598269

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working hip Issues related to execution on HIP backend.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants