[AMDGPU] Creating relocatable object (-r) from rdc objects (-fgpu-rdc) fails with lld error attempted static link of dynamic object in /opt/rocm-6.0.0/lib #77018
@llvm/issue-subscribers-clang-driver Author: Mike Pozulp (pozulp)
Hey @arsenm and @jdoerfert, how do I generate a relocatable object (-r) for the amdgpu target? I am linking a large code containing a few million lines of C++ with an optional library dependency containing about 300,000 lines of C++. The library requires relocatable device code (-fgpu-rdc) because it has many kernels which reference device functions defined in separate translation units. The large code does not. A driver for the library links in 30 minutes. The large code takes 2 minutes to link without the optional library and over 8 hours with the library (the lld process is still running after 8 hours). I don't want to use rdc to link the large code, but I have to because of the optional library: if even a single object needs rdc, then the link needs it too. Perhaps an intermediate step between compiling the library and linking the large code, in which I generate a relocatable object (-r) from the rdc-compiled library, would allow me to link the large code without rdc even when I'm using the optional library.
x86+LTO (good)
Consider using LTO to target x86, which works as expected. During compilation, clang -flto emits LLVM IR, which lld uses to perform link-time optimizations like cross-translation-unit inlining. Here is an example:
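A minimal sketch of such a setup; the file names, the add1 symbol, and the exact flags below are assumptions, not the original reproducer:

```sh
# add1 lives in its own TU, so only LTO can inline it into main
cat > add1.c <<'EOF'
int add1(int x) { return x + 1; }
EOF
cat > main.c <<'EOF'
extern int add1(int x);
int main(void) { return add1(41); }
EOF

# separate compilation: no cross-TU inlining
clang -O2 -c add1.c main.c
clang -O2 add1.o main.o -o sep

# LTO build: the objects contain LLVM IR, so -flto stays on the link line
clang -O2 -flto -c add1.c main.c
clang -O2 -flto -fuse-ld=lld add1.o main.o -o lto

# "early" LTO build: an intermediate -r link performs LTO and emits a
# machine-code relocatable, so the final link needs no -flto
clang -O2 -flto -fuse-ld=lld -r add1.o main.o -o merged.o
clang -O2 merged.o -o elto
```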
Building and then disassembling the executables shows that add1, which is referenced and defined in separate translation units, is inlined for the two LTO builds but not for the separate compilation build, as expected:
The difference between the two LTO builds is that one had -flto on the link line and the other didn't. The one which included an intermediate step between compiling and linking to create a relocatable object did not need -flto on the link line because I gave the linker object code, not LLVM IR.
amdgpu+rdc (bad)
Now consider my use case. I'm building with rocm 6.0.0, the latest rocm clang distribution installed on my system, and I am targeting the amd mi250x. I modified my x86+LTO code to use hip with rdc:
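A hedged reconstruction of the HIP variant (the same files ported to HIP; the flags and the rocm library path are assumptions):

```sh
# add1 becomes a __device__ function called from a kernel in main.c
clang -x hip --offload-arch=gfx90a -fgpu-rdc -O2 -c add1.c main.c

# full rdc link works
clang --hip-link -fgpu-rdc --offload-arch=gfx90a add1.o main.o -o rdc \
  -L/opt/rocm-6.0.0/lib -lamdhip64

# intermediate -r step, then a plain host link
clang --hip-link -fgpu-rdc --offload-arch=gfx90a -r add1.o main.o -o uber.o
clang uber.o -o erdc -L/opt/rocm-6.0.0/lib -lamdhip64
```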
The second-to-last line, which uses -r to make the relocatable object, fails with

ld.lld: error: attempted static link of dynamic object

and references shared libraries in /opt/rocm. Ignore the last 3 lines above, which are due to my attempt to link using the non-existent object file uber.o. |
@llvm/issue-subscribers-backend-amdgpu Author: Mike Pozulp (pozulp) |
The short answer is we don't really support binary linking of different object files right now. The main blocker is reporting something sensible for function resource utilization if we can't see a function body. Without that, any attempt to rely on object files is going down underdeveloped and untested paths. |
What do you mean when you say "can't see a function body"? I thought that because I compiled all of my object files with -fgpu-rdc, which emits LLVM IR into the objects, the linker can see all of the function definitions when I try to generate a relocatable object (-r). Every device function definition is in the LLVM IR in the objects. |
Oh, I see you're linking a single .o, not multiple .os together |
Yes, I'm linking a single .o in my reproducer. |
This seems to work with OpenMP just fine, so it's the driver that doesn't do it right:
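A sketch of the equivalent OpenMP commands, assuming a file add.c with an offload region (flags assumed):

```sh
clang add.c -fopenmp --offload-arch=gfx90a -c -o add.o
clang add.o -fopenmp --offload-arch=gfx90a -r -o out.o   # succeeds
```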
Looking at the driver steps: if I redo them manually, ignore the mcin/llvm-mc stuff and the output of that, and remove libamdhip64.so from that link command, I get an out.o. Tag: @jhuber6 |
@yxsamliu probably knows the most about expected HIP behavior. Somewhat curious if using |
Thanks Johannes!
@jhuber6, how do you build upstream hip? I built llvm to bisect an lld hang in #58639, but I have no experience building HIP or ROCm and don't know where to start. The OS on my system is called TOSS 4, which is based on RHEL 8. The system has both mi250x and mi300a nodes. I'm most interested in the mi300a. |
I'm not the best person to ask about building ROCm. Maybe @yxsamliu or @saiislam would know something. The work I do with HIP is limited to basic tests using the basic support in community LLVM. My only experience building ROCm was using the AUR packages provided for Arch Linux before they were merged into the system package manager. Using HIP generally requires a lot of the HIP libraries from ROCm, so it's difficult to do without a ROCm build or installation somewhere.
I was curious if
If you're talking about building HIP from LLVM, it should just require the |
The following works for me for ROCm 6.0:

```sh
PATH=/opt/rocm/llvm/bin:$PATH clang -O2 -fgpu-rdc --offload-arch=gfx90a -x hip -c add.c -o $dir/add.o # add.o contains llvm IR
```

hipcc links with some libraries, which may not work with -r |
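Presumably the -r step then invokes clang directly rather than hipcc, so no ROCm shared libraries enter the relocatable link; a guess at its shape:

```sh
# assumption: clang, not hipcc, performs the -r link, avoiding the
# "attempted static link of dynamic object" error from /opt/rocm libs
PATH=/opt/rocm/llvm/bin:$PATH clang -fgpu-rdc --hip-link --offload-arch=gfx90a \
  -r $dir/add.o -o uber.o
```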
So, I actually remember specifically handling this case with the new driver's binary format. Because
|
Hey Yaxun (Sam), that worked for me too! Not just for building my little |
So is there anything to do with this issue, or can it be closed? Should there be a driver usability improvement? |
We could let the clang driver assume -no-hip-rt when -r is specified. |
Hey @arsenm, thanks for asking. Linking the large code caused 3 problems that you or @yxsamliu might be able to solve. Last week, a colleague proposed an acronym to describe the feature that I'm trying to achieve. I mention this acronym because I use it below in my description of the 3 problems. The acronym is ERDC. The "E" stands for "early", which means that I am using an intermediate step between compiling and linking to generate a relocatable (-r) so that I do not need -fgpu-rdc in my LDFLAGS. I wrote a new example that demonstrates the 3 problems. I link a tiny driver that calls two tiny libraries. Here is a summary of the problems
As before, I use x86+lto as the "good" case and amdgpu+rdc as the "bad" case. Here is a summary of the difference that I observed between the two cases:
Here's a summary of my workarounds:
And here are my feelings about the workarounds:
x86+lto (good)
Building the lto executable works, but building the elto or partial_lto executable fails because of Problem 1):
My workaround replaces the archive with its contents when I generate the relocatable (-r),
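roughly like the following, assuming an archive libalpha.a built from alpha1.o and alpha2.o as in the 5-file example (names and flags are assumptions):

```sh
# Problem 1: the archive cannot be handed to the -r link directly,
# so extract its members and pass the objects instead
llvm-ar x libalpha.a                       # yields alpha1.o alpha2.o
clang -flto -fuse-ld=lld -r alpha1.o alpha2.o -o alpha.o
```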
Disassembly confirms success. Specifically,
amdgpu+rdc (bad)
I'm building with rocm 6.0.0 and targeting the amd mi250x, so I modified my x86+lto code to use hip with rdc:
Building the rdc executable works, but building the erdc or partial_erdc executable fails because of Problem 1):
My workaround replaces the archive with its contents when I generate the relocatable (-r), as I did in the x86+lto case, but now I encounter Problem 2):
My workaround replaces the relocatables with a single relocatable
but this is bad. I need a better workaround. Finally, building partial_erdc fails because of Problem 3)
My workaround uses objcopy to remove the CLANG_OFFLOAD_BUNDLE sections. This gets me past "Invalid encoding" but my partial_erdc build still fails with the __hip_fatbin duplicate symbol error from Problem 2)
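Something along these lines, where the section-name pattern is an assumption (llvm-objdump -h shows the real names):

```sh
# GNU objcopy accepts glob patterns for section names
objcopy --remove-section='__CLANG_OFFLOAD_BUNDLE__*' partial.o partial_stripped.o
```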
In summary, I need a workaround for Problem 2) __hip_fatbin duplicate symbol. What is __hip_fatbin? What is the .hip_fatbin section? (llvm-objdump -h shows that there is a section called .hip_fatbin) Is there a flag I can use when I generate the relocatable (-r) to leave __hip_fatbin undefined? Or is there a flag I can use while linking to tell the linker to ignore duplicate __hip_fatbin symbols? |
I'm still very curious if using |
Hey @jhuber6, I got
Run make to verify that this works.
Add
Running
|
Thanks, I've copied your reproducers. I think I'll take some time to look into this on that side. |
The new driver uses a linker-defined array to traverse the list of entries. The dummy symbol there is used to force the linker to define the section even when the user doesn't have any entries (e.g. kernels). The problem is that the symbols conflict. I can either make this symbol
I tried your basic
Thanks for the detailed report, it's really helpful. Also, the warning is because the link step doesn't need |
I made #79231, but it has me wondering what the exact semantics of this would be. The patch is a good idea regardless, though. Right now, it seems doing a
I think the main issue here is that given
I may need to shim in some logic to address this, depending on what we actually want to happen.
Option 1: Compiling with
Making the former work would require stripping the |
Okay @pozulp, I remembered the other reason why it wasn't working. The
You also do
This means that neither of the libraries will extract, because they're checked before
Here's the modified Makefile I made.
This will do what I expect to be "correct" default behavior once landed. However, from your initial report it seems that you're more interested in being able to "cut off" RDC libraries so they do not inflate link time when linked via a static library. I think this should be a separate flag, because it's somewhat different from
What I'll need to add is a separate flag to instruct the linker wrapper to perform the device-side linking and wrapping for the module and then delete the associated section so it won't be linked again. This should be doable once #79231 lands. Is that the behavior you'd expect @pozulp? I.e.

```
clang -x hip foo.c bar.c -c --offload-arch=gfx90a -fgpu-rdc
clang -r --offload-link foo.c bar.c -o merged.o
llvm-ar rcs libfoo.a merged.o // libfoo.a contains no GPU device code
clang -x hip main.c libfoo.a --offload-arch=gfx90a -fgpu-rdc // main.c will not link with GPU code from libfoo.a
```
|
Hey @jhuber6, yes! Sounds great. If you see @yxsamliu online this week please ask how to fix Problem 2) __hip_fatbin duplicate symbol. My reproducer is #77018 (comment). I'm hoping Sam's magic has not run out yet. Sam is the one who fixed the
|
Okay, I hacked something together with the new driver framework as a proof of concept. First, this requires reverting 0f8b529 locally to get the old behavior back. I'm using OpenMP just because it uses the new driver natively, always builds in RDC-mode, and works better with community LLVM.

```c
// foo.c
int bar(void) { return 2; }
#pragma omp declare target to(bar) device_type(nohost)
int foo(void) {
int x = 0;
#pragma omp target map(from : x)
{ x = bar(); }
return x;
}
```

```c
// main.c
extern int foo(void);
int bar(void) { return 1; }
#pragma omp declare target to(bar) device_type(nohost)
int main() {
int x = foo();
int y = 0;
#pragma omp target map(from : y)
{ y = bar(); }
return x + y;
}
```

Here's a hacky script I wrote to fully link it:

```bash
#!/bin/bash
set -v
set -e
# Get the `foo.o` object with embedded GPU code.
clang foo.c -fopenmp --offload-arch=native -fopenmp-offload-mandatory -c
# Rename the `.llvm.offloading` section. This is where the device code lives. We only need
# to do this because the linked GPU binary uses the same section name and they'll get clobbered
# when doing a relocatable link. This has a custom ELF type so the name is irrelevant for everything else
llvm-objcopy --rename-section .llvm.offloading=.llvm.offloading.rel foo.o
# This relocatable link will fully link the embedded GPU code in `foo.o` and then create a blob to
# register it with the OpenMP runtime, this blob will be merged into `merged.o`.
clang foo.o -lomptarget.devicertl -r -fopenmp --offload-arch=native -o merged.o
# The registration blob primarily uses runtime sections to iterate the kernels and globals.
# The linker provides `__[start|stop]_<secname>` symbols to traverse it. These will conflict
# with anything else we link so we need to rename it to something unique for this module.
# Also delete the old embedded code so nothing else will link with it.
llvm-objcopy \
--remove-section .llvm.offloading.rel \
--rename-section omp_offloading_entries=omp_offloading_entries_1 \
--redefine-sym __start_omp_offloading_entries=__start_omp_offloading_entries_1 \
--redefine-sym __stop_omp_offloading_entries=__stop_omp_offloading_entries_1 \
merged.o
# Handle the rest as normal.
llvm-ar rcs libfoo.a merged.o
clang main.c libfoo.a -fopenmp --offload-arch=native -fopenmp-offload-mandatory
./a.out || echo $?
```

It works in theory; implementing it would require a few hacks, however. HIP uses the exact same handling under |
It is possible to support -r of part of the objects and then link them all together. However, this needs some changes to the clang driver, and I doubt how useful this feature is. Why not use -shared to link each partition of the objects as a shared library, then link the main program with all the shared libraries? The current clang driver supports that for HIP. |
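A sketch of this -shared approach for one self-contained group, reusing names from the 5-file example (flags are assumptions):

```sh
# device-link the alpha group into one shared library
hipcc -fgpu-rdc --offload-arch=gfx90a -fPIC -x hip -c alpha1.c alpha2.c
hipcc -fgpu-rdc --offload-arch=gfx90a -shared alpha1.o alpha2.o -o libalpha.so
# the main link consumes libalpha.so like any host shared library and
# no longer needs -fgpu-rdc for alpha's device code
clang main.o -L. -lalpha -L/opt/rocm/lib -lamdhip64 -o main
```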
So, with #80066 applied I was able to do the following with two generic HIP files using my local installation of ROCm 5.7.1.

```cpp
// main.hip
#include <hip/hip_runtime.h>
__global__ void kernel2() {
printf("%s\n", __PRETTY_FUNCTION__);
}
extern void foo();
int main() {
foo();
hipLaunchKernelGGL(kernel2, dim3(1), dim3(1), 0, 0);
auto x = hipDeviceSynchronize();
}
```

```cpp
// foo.hip
#include <hip/hip_runtime.h>
__global__ void kernel1() {
printf("%s\n", __PRETTY_FUNCTION__);
}
void foo() {
hipLaunchKernelGGL(kernel1, dim3(1), dim3(1), 0, 0);
auto x = hipDeviceSynchronize();
}
```

I compiled both of them using the new driver:

```sh
$ clang -x hip foo.hip --offload-arch=native -c --offload-new-driver -fgpu-rdc
$ clang foo.o --offload-link -r -o merged.o
$ llvm-ar rcs libfoo.a merged.o
$ clang -x hip main.hip --offload-arch=native --offload-new-driver -fgpu-rdc -L. -lfoo
$ ./a.out
void kernel1()
void kernel2()
```

Which seems to be what you're asking for. If you do |
Thanks Joseph!
Hey @yxsamliu, can you show me how to do this? You could use the 5-file example (main.c alpha1.c alpha2.c beta1.c beta2.c) that I shared in my comment last week #77018 (comment). |
|
Hey @yxsamliu, thanks for clarifying. I asked a few LLNL colleagues from a few different teams about dynamic linking and all of them said that it will not work for us, but they are interested in a solution using static linking. You said that
This is great news! But it sounds like you need more information before you attempt it. I can talk to you and Brendon Cahoon said that he can too. Cahoon, myself, and others from LLNL, AMD, and HPE want to run LLNL applications on the MI300s in the El Capitan machine that LLNL will deploy this year, and I think that this new build strategy could help. |
The long-term goal is to move HIP compilation to the new offloading driver, which would make #80066 work in your case as expected. However, I don't know how long it would take for these changes to filter down into a ROCm release. I should probably take the time to work with other members of the HIP team to see what the current blockers are. As far as I'm aware, for HIP registration we create a constructor for each TU that registers the relevant globals using an external handle that the link step then resolves once the actual image has been created. You'd probably need some post-link step to rename that handle, as it's |
For this approach to work well, the object files should be partitioned into small groups in which the device code is self-contained, i.e., it does not call any device functions or use any device variables in other groups. Does your HIP application have this trait? Thanks. |
Hey @yxsamliu, yes. Consider a graph in which the nodes are TUs. If two nodes have an edge between them, it means that there is at least one reference to a device function or device variable defined in the other. The graph for my HIP application is disconnected, meaning that there are at least two nodes which are not connected by a path. I made a visual to help explain: I drew the graph for my tiny 5-file program that I shared in my comment last week #77018 (comment). It is a disconnected graph containing 3 maximal connected subgraphs. I also made a table of TU combinations labeled with green checkmarks if they are valid early rdc combinations and red xmarks if they are not. See below. Finally, for anyone who is wondering if early rdc is right for them, there are at least two cases that would not benefit from early rdc:
Hey @jhuber6, do you mean that --offload-new-driver will be the default some day? |
Yes, that is the goal. I need to take some time to see what's actually missing for HIP to use it by default. |
`-fgpu-rdc` mode allows device functions to call device functions in different TUs. However, currently all device objects have to be linked together, since only one fat binary is supported. This is time consuming for the AMDGPU backend since it only supports LTO. There are use cases where objects can be divided into groups in which device functions are self-contained but host functions are not. It is desirable to link/optimize/codegen the device code and generate a fatbin for each group, while partially linking the host code with `ld -r` or generating a static library by using the `--emit-static-lib` option of clang. This avoids linking all device code together and therefore decreases the linking time for `-fgpu-rdc`. Previously, clang emitted an external symbol `__hip_fatbin` for all objects for `-fgpu-rdc`. With this patch, clang emits a unique external symbol `__hip_fatbin_{cuid}` for the fat binary for each object. When a group of objects are linked together to generate a fatbin, the symbols are merged by alias and point to the same fat binary. Each group has its own fat binary, and one executable or shared library can have multiple fat binaries. Device linking is done for undefined fat binary symbols only, to avoid repeated linking. `__hip_gpubin_handle` is also uniquified and merged to avoid repeated registering. The symbol `__hip_cuid_{cuid}` is introduced to facilitate debugging and tooling. Fixes: #77018
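Under this scheme a grouped build might look roughly as follows; the driver invocations are assumptions, not taken from the patch:

```sh
# compile a self-contained group with rdc
clang -x hip -fgpu-rdc --offload-arch=gfx90a -c alpha1.c alpha2.c
# partially link the group: it gets its own fatbin, referenced through
# the per-group __hip_fatbin_{cuid} alias
clang -fgpu-rdc --hip-link --no-hip-rt -r alpha1.o alpha2.o -o alpha.o
# the final link device-links only still-undefined fatbin symbols
clang --hip-link main.o alpha.o -L/opt/rocm/lib -lamdhip64 -o main
```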
@llvm/issue-subscribers-clang-codegen Author: Mike Pozulp (pozulp) |
@pozulp Have you tried clang with -r? Does it work for you? Thanks. |