You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[AMDGPU] Creating relocatable object (-r) from rdc objects (-fgpu-rdc) fails with lld error undefined protected symbol referenced by __clang_gpu_used_external
#88551
Open
pozulp opened this issue
Apr 12, 2024
· 2 comments
Hi @yxsamliu, I'm using clang -r to lower bitcode in objects that I compiled -fgpu-rdc to AMDGPU device code before link-time so that I can link without -fgpu-rdc, as I described in #77018. I am hitting lld error undefined protected symbol due to an extern device variable. I reference the variable in host code, but not device code, so why does that stop me from lowering the bitcode? I found your commit which created the __clang_gpu_used_external symbol that appears in my error message, and I read your phabricator discussion, but I still don't understand why you have to "Emit a global array containing all external kernels or device variables used by host functions and mark it as used..." (see your CodeGenModule.cpp comment). Here's a tiny reproducer:
Just want to explain the issue as I understand it here and Sam can correct me if I'm wrong. The compiler builds host and device code in distinct stages but there is no communication between the stages. The reason for the issue you are seeing is that the device code toolchain needs information about what host functions are doing for the reasons I explained earlier today, but relies on what it sees during device compilation to generate this information, e.g. what kernels are launched ect.
Because of these restrictions it cannot understand that the preprocessor may change the input host code between the compilation stages. All it sees is that you are actually incrementing a device variable globalfoo from a host function. If globalfoo was actually unused i.e. you remove the increment, and it is only declared when the device compilation stage parses the file then you would have no linking issues.
I can't think of an easy fix without major changes to the way the toolchain works W.R.T. rdc but maybe Sam can expand on that?
Right now I believe you would need to make sure your code agrees on which device variables or functions are referenced with and without HIP_DEVICE_COMPILE.
As @kerbowa has explained, HIP has separate compilations for host and device targets. If preprocessor expansion are different in host and device compilations, the compiler only sees the expanded code for each compilation.
In host compilation of alpha1.c, globalfoo is a normal host extern variable. However, such a host variable is not defined in any host code, which will result in undefined symbol when linking host objects. (For HIP/CUDA, a definition of device variable does not result in a host variable with the same name in host object file.)
In device compilation of alpha1.c, globalfoo is declared as an extern device variable, since it is used by host code, it is added to __clang_gpu_used_external to force generating a reference in device bitcode, otherwise the device bitcode is just empty. However, globalfoo is not defined in any device bitcode, therefore the device linker will emit an error about undefined symbol.
Another issue is that you cannot read/write an ordinary device variable in host function. If you want to read/write a device variable in host code, you need to use managed variable and define it consistently for host/device compilation, e.g.
Hi @yxsamliu, I'm using clang -r to lower bitcode in objects that I compiled -fgpu-rdc to AMDGPU device code before link-time so that I can link without -fgpu-rdc, as I described in #77018. I am hitting lld error undefined protected symbol due to an extern device variable. I reference the variable in host code, but not device code, so why does that stop me from lowering the bitcode? I found your commit which created the __clang_gpu_used_external symbol that appears in my error message, and I read your phabricator discussion, but I still don't understand why you have to "Emit a global array containing all external kernels or device variables used by host functions and mark it as used..." (see your CodeGenModule.cpp comment). Here's a tiny reproducer:
The text was updated successfully, but these errors were encountered: