Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Creating relocatable object (-r) from rdc objects (-fgpu-rdc) fails with lld error undefined protected symbol referenced by __clang_gpu_used_external #88551

Open
pozulp opened this issue Apr 12, 2024 · 2 comments
Labels

Comments

@pozulp
Copy link
Member

pozulp commented Apr 12, 2024

Hi @yxsamliu, I'm using clang -r to lower bitcode in objects that I compiled -fgpu-rdc to AMDGPU device code before link-time so that I can link without -fgpu-rdc, as I described in #77018. I am hitting lld error undefined protected symbol due to an extern device variable. I reference the variable in host code, but not device code, so why does that stop me from lowering the bitcode? I found your commit which created the __clang_gpu_used_external symbol that appears in my error message, and I read your phabricator discussion, but I still don't understand why you have to "Emit a global array containing all external kernels or device variables used by host functions and mark it as used..." (see your CodeGenModule.cpp comment). Here's a tiny reproducer:

$ make
clang -O2 -fgpu-rdc --offload-arch=gfx942 -x hip -c -o alpha1.o alpha1.c
clang -O2 -fgpu-rdc --offload-arch=gfx942 -x hip -c -o alpha2.o alpha2.c
clang -r --hip-link -O2 -fgpu-rdc --offload-arch=gfx942 -o alpha_no_bitcode.o alpha1.o alpha2.o
lld: error: undefined protected symbol: globalfoo
>>> referenced by /var/tmp/pozulp1/alpha1-gfx942-637988.out.lto.o:(__clang_gpu_used_external)
clang: error: amdgcn-link command failed with exit code 1 (use -v to see invocation)
make: *** [makefile:12: alpha_no_bitcode.o] Error 1
$ cat alpha1.c
#include <hip/hip_runtime.h>
#if defined(__HIP_DEVICE_COMPILE__)
extern __device__ int globalfoo;
#else
extern int globalfoo;
#endif
__device__ int alpha_add_impl(int x);
__global__ void alpha_kernel(int *x) { *x = alpha_add_impl(*x); }
void alpha_add(int *x) {
    alpha_kernel<<<1,1>>>(x);
    globalfoo++;
}
$ cat alpha2.c
__device__ int alpha_add_impl(int x) { return x + 1; }
$ cat makefile
COMPILER=/path/to/clang
COMPILE_FLAGS=-O2 -fgpu-rdc --offload-arch=gfx942 -x hip
RELOCATABLE_FLAGS=-r --hip-link -no-hip-rt -O2 -fgpu-rdc --offload-arch=gfx942

all: alpha_no_bitcode.o

.c.o:
    $(COMPILER) $(COMPILE_FLAGS) -c -o $@ $<

alpha_no_bitcode.o: alpha1.o alpha2.o
    $(COMPILER) $(RELOCATABLE_FLAGS) -o $@ $^

clean:
    rm -f *.o
@github-actions github-actions bot added the lld label Apr 12, 2024
@kerbowa
Copy link
Member

kerbowa commented Apr 19, 2024

Just want to explain the issue as I understand it here and Sam can correct me if I'm wrong. The compiler builds host and device code in distinct stages but there is no communication between the stages. The reason for the issue you are seeing is that the device code toolchain needs information about what host functions are doing for the reasons I explained earlier today, but relies on what it sees during device compilation to generate this information, e.g. what kernels are launched ect.

Because of these restrictions it cannot understand that the preprocessor may change the input host code between the compilation stages. All it sees is that you are actually incrementing a device variable globalfoo from a host function. If globalfoo was actually unused i.e. you remove the increment, and it is only declared when the device compilation stage parses the file then you would have no linking issues.

I can't think of an easy fix without major changes to the way the toolchain works W.R.T. rdc but maybe Sam can expand on that?

Right now I believe you would need to make sure your code agrees on which device variables or functions are referenced with and without HIP_DEVICE_COMPILE.

@yxsamliu
Copy link
Collaborator

As @kerbowa has explained, HIP has separate compilations for host and device targets. If preprocessor expansion are different in host and device compilations, the compiler only sees the expanded code for each compilation.

In host compilation of alpha1.c, globalfoo is a normal host extern variable. However, such a host variable is not defined in any host code, which will result in undefined symbol when linking host objects. (For HIP/CUDA, a definition of device variable does not result in a host variable with the same name in host object file.)

In device compilation of alpha1.c, globalfoo is declared as an extern device variable, since it is used by host code, it is added to __clang_gpu_used_external to force generating a reference in device bitcode, otherwise the device bitcode is just empty. However, globalfoo is not defined in any device bitcode, therefore the device linker will emit an error about undefined symbol.

Another issue is that you cannot read/write an ordinary device variable in host function. If you want to read/write a device variable in host code, you need to use managed variable and define it consistently for host/device compilation, e.g.

$ cat alpha1.c
#include <hip/hip_runtime.h>
__managed__ int globalfoo;
__device__ int alpha_add_impl(int x);
__global__ void alpha_kernel(int *x) { *x = alpha_add_impl(*x); }
void alpha_add(int *x) {
    alpha_kernel<<<1,1>>>(x);
    globalfoo++;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants