Extra undefined symbols when compiling with flang-new #5131

Closed
Crivella opened this issue Feb 14, 2025 · 10 comments · Fixed by #5138

Comments

@Crivella

I've tried to compile OpenBLAS 0.3.27 and 0.3.29 with LLVM 19.1.1 and 20.1.0-rc1 clang/flang-new.

Everything works fine, test suite included, but there seem to be some extra undefined symbols that are usually not present when compiling with GCC:

$ nm -D libopenblas.so | grep _Fort
                 U _FortranAAssign
                 U _FortranACharacterCompareScalar1
                 U _FortranAcpowi
                 U _FortranAExponent4_4
                 U _FortranAExponent8_4
                 U _FortranAzpowi
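A minimal Python sketch of the same check, useful for scripting it across several builds. The helper names `nm_dynamic` and `undefined_symbols` are hypothetical, and the sketch assumes GNU-style `nm` output, where undefined entries have no address column.

```python
import subprocess


def nm_dynamic(path: str) -> str:
    """Run `nm -D` on a shared library and return its text output."""
    return subprocess.run(["nm", "-D", path], capture_output=True,
                          text=True, check=True).stdout


def undefined_symbols(nm_output: str, prefix: str = "_Fortran") -> list[str]:
    """Return undefined ("U") symbols whose names start with `prefix`."""
    found = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined entries look like "U <name>" (no address column).
        if len(fields) == 2 and fields[0] == "U" and fields[1].startswith(prefix):
            found.append(fields[1])
    return found
```

For example, `undefined_symbols(nm_dynamic("libopenblas.so"))` would list the same `_Fortran*` names as the `grep` above.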

Specifically, this causes a problem when building numpy 1.26.4 on top of OpenBLAS: the library seems to be dlopened without the RTLD_LAZY flag, so the undefined symbol (_FortranACharacterCompareScalar1 in this case) triggers an error at load time.
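To reproduce the load-time failure outside numpy, here is a sketch that loads the library with immediate binding. It assumes Linux and a Python build where `os.RTLD_NOW` and `os.RTLD_GLOBAL` are available; `load_error` is a hypothetical helper. With RTLD_NOW the dynamic loader must resolve every undefined symbol before `dlopen` returns, so a missing `_FortranACharacterCompareScalar1` is reported immediately. RTLD_LAZY, by contrast, defers resolving function references until their first call.

```python
import ctypes
import os


def load_error(path: str):
    """Load `path` with immediate binding (RTLD_NOW).

    Returns None on success, otherwise the dynamic loader's error text,
    which for an unresolved reference names the missing symbol.
    """
    try:
        ctypes.CDLL(path, mode=os.RTLD_NOW | os.RTLD_GLOBAL)
    except OSError as exc:
        return str(exc)
    return None
```

On the affected build, `load_error("./libopenblas.so")` would be expected to return a message naming the unresolved symbol. On a fixed build it should return `None`.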

It also seems to be related to the problem reported in #4868, specifically the comment by @martin-frbg #4868 (comment).

Not sure if there are some flags for building with flang-new that I am missing, or if this is not the right venue to report this.

@martin-frbg
Collaborator

These appear to belong to flang's runtime library. I am not aware of any compiler flags to embed or suppress them, and OpenBLAS is certainly not trying to call any of those directly. Perhaps asking numpy to change their dlopen flags, or opening an issue ticket in llvm/llvm-project, could be useful.

@martin-frbg
Collaborator

I'm a bit confused: grep'ing for _FortranACharacterCompareScalar1 in the library directory of my "official tarball" installation of LLVM-19.1.7 turns up only a single hit in libFortranRuntime.a, as if this static library were the only linking option. Does yours have a libFortranRuntime.so as well, by any chance?

@martin-frbg martin-frbg added the Support and Bug in other software (Compiler, Virtual Machine, etc. bug affecting OpenBLAS) labels Feb 16, 2025
@martin-frbg
Collaborator

I also see only a static libFortranRuntime.a in the binary release tarball of LLVM-20.1-rc2, so my conclusion for now is that you probably built LLVM from source with (possibly default) build options that also create a libFortranRuntime.so, which subsequently becomes the linker's preferred pick when building any other software with this toolchain. (Building my own binaries of the compiler was certainly what I did at the time of my quoted comment.)
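A sketch of the check suggested here. When both `libFortranRuntime.so` and `libFortranRuntime.a` are in a directory on the search path, `-lFortranRuntime` normally resolves to the shared object unless static linking is requested, which is the mechanism suspected above. The function name, and the idea of passing it the LLVM `lib` directory, are assumptions for illustration.

```python
from pathlib import Path


def fortran_runtime_flavors(libdir: str) -> dict:
    """Report which libFortranRuntime variants exist in `libdir`."""
    d = Path(libdir)
    return {
        # Static archive, as shipped in the official release tarballs.
        "static": (d / "libFortranRuntime.a").exists(),
        # Any shared object, including versioned names like .so.20.1.
        "shared": any(d.glob("libFortranRuntime.so*")),
    }
```

If `"shared"` comes back `True` for the toolchain's `lib` directory, the linker is likely picking the shared runtime.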

@Crivella
Author

Thanks a lot for looking into this.

I am indeed compiling LLVM from scratch.
In both installations, 19.1.1 and 20.1.0-rc1, the lib folders contain only the static Fortran runtime libFortranRuntime.a, not the dynamic one.

I will try to investigate further to understand what is happening under the hood and why these symbols are left undefined.

@martin-frbg
Collaborator

Hmm, does your "home-built" libFortranRuntime.a contain any symbol records that start with _Fort and have a "T" on them, in particular any similar to the "missing" ones like _FortranAAssign? That should at least tell whether they are there but the library is not being linked, or whether your build of that library is somehow incomplete compared to what the LLVM release managers provide.
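This comparison can be scripted. Feed in the symbols reported as undefined in `libopenblas.so` together with the `nm` output of `libFortranRuntime.a`, and check which of them the archive defines. The helpers `defined_symbols` and `provided_by` are hypothetical. The sketch treats only `T` (text section) entries as definitions, which covers the runtime functions in question but ignores other defined symbol types such as `W`.

```python
def defined_symbols(nm_output: str) -> set[str]:
    """Collect symbols that have a text-section ("T") entry in `nm` output."""
    defined = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined entries look like "<address> T <name>"; archive member
        # headers and blank lines have fewer fields and are skipped.
        if len(fields) == 3 and fields[1] == "T":
            defined.add(fields[2])
    return defined


def provided_by(missing: list[str], archive_nm: str) -> dict[str, bool]:
    """Map each missing symbol to whether the archive defines it."""
    defined = defined_symbols(archive_nm)
    return {name: name in defined for name in missing}
```

If every entry maps to `True`, the runtime build is complete and the problem lies in how `libopenblas.so` is linked.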

@Crivella
Author

Yes, I've checked, and the symbols are there:

$ nm libFortranRuntime.a | grep _FortranAAssign
                 U _FortranAAssignTemporary
0000000000000000 T _FortranAAssign
0000000000000000 T _FortranAAssignExplicitLengthCharacter
0000000000000000 T _FortranAAssignPolymorphic
0000000000000000 T _FortranAAssignTemporary
                 U _FortranAAssignTemporary

I think that for some reason the runtime library is not being linked everywhere it should be, but I have yet to pinpoint why.
If I compile a very simple Fortran program

program hello
  ! This is a comment line; it is ignored by the compiler
  print *, 'Hello, World! Fortran'

end program hello

SUBROUTINE BAR(A)
  REAL A(2,3,*)
END

the runtime seems to be linked properly. I still have to check what could prevent it from being linked during the build process here.

$ flang -v hw.f90 > flang.log
flang version 20.1.0-rc1
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: ~/.local/easybuild/software/LLVM/20.1.0-rc1/bin
Configuration file: ~/.local/easybuild/software/LLVM/20.1.0-rc1/bin/flang.cfg
Selected GCC installation: ~/.local/easybuild/software/GCCcore/13.3.0/lib/gcc/x86_64-pc-linux-gnu/13.3.0
Candidate multilib: .;@m64
Selected multilib: .;@m64
 "~/.local/easybuild/software/LLVM/20.1.0-rc1/bin/flang" -fc1 -triple x86_64-unknown-linux-gnu -emit-obj -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -target-cpu x86-64 -resource-dir ~/.local/easybuild/software/LLVM/20.1.0-rc1/lib/clang/20 -mframe-pointer=all -o /tmp/hw-08a085.o -x f95-cpp-input hw.f90
 "~/.local/easybuild/software/LLVM/20.1.0-rc1/bin/ld.lld" -z relro --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -pie -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o a.out /lib/x86_64-linux-gnu/Scrt1.o /lib/x86_64-linux-gnu/crti.o ~/.local/easybuild/software/LLVM/20.1.0-rc1/lib/clang/20/lib/x86_64-unknown-linux-gnu/clang_rt.crtbegin.o -L~/.local/easybuild/software/LLVM/20.1.0-rc1/bin/../lib/x86_64-unknown-linux-gnu -L~/.local/easybuild/software/LLVM/20.1.0-rc1/lib/clang/20/lib/x86_64-unknown-linux-gnu -L~/.local/easybuild/software/GCCcore/13.3.0/lib/gcc/x86_64-pc-linux-gnu/13.3.0 -L~/.local/easybuild/software/GCCcore/13.3.0/lib/gcc/x86_64-pc-linux-gnu/13.3.0/../../../../lib64 -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib64 -L/lib -L/usr/lib -L~/.local/easybuild/software/LLVM/20.1.0-rc1/lib/x86_64-unknown-linux-gnu -L~/.local/easybuild/software/LLVM/20.1.0-rc1/lib -L~/.local/easybuild/software/zlib/1.3.1/lib -L~/.local/easybuild/software/ncurses/6.5/lib /tmp/hw-08a085.o -L~/.local/easybuild/software/LLVM/20.1.0-rc1/lib -lFortranRuntime -lFortranDecimal -lm ~/.local/easybuild/software/LLVM/20.1.0-rc1/lib/clang/20/lib/x86_64-unknown-linux-gnu/libclang_rt.builtins.a --as-needed -lunwind --no-as-needed -lc ~/.local/easybuild/software/LLVM/20.1.0-rc1/lib/clang/20/lib/x86_64-unknown-linux-gnu/libclang_rt.builtins.a --as-needed -lunwind --no-as-needed ~/.local/easybuild/software/LLVM/20.1.0-rc1/lib/clang/20/lib/x86_64-unknown-linux-gnu/clang_rt.crtend.o /lib/x86_64-linux-gnu/crtn.o
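Comparing this link line with the one used to build `libopenblas.so` is one way to narrow the problem down: here the flang driver itself appends `-lFortranRuntime -lFortranDecimal`. A minimal sketch, using a hypothetical `linked_libs` helper, that extracts the `-l` libraries from a captured command line:

```python
import shlex


def linked_libs(link_line: str) -> list[str]:
    """Return the library names passed as -l<name> on a linker command line."""
    libs = []
    # shlex.split honours the quoting around the linker path in driver output.
    for tok in shlex.split(link_line):
        if tok.startswith("-l") and len(tok) > 2:
            libs.append(tok[2:])
    return libs
```

Running `"FortranRuntime" in linked_libs(line)` on each captured link step shows which ones pull in the flang runtime.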

@martin-frbg
Collaborator

Interesting. Could it be something "simple" like whether the final link is initiated from flang or clang ? There is nothing in the build system of OpenBLAS that should expressly remove any reference to libFortranRuntime... btw are you building OpenBLAS with gmake, CMake, or - seeing that you're actually after Numpy - from numpy's Meson scripts ?

@martin-frbg
Collaborator

Reproduced as a build difference between gmake and CMake: the gmake build of the shared library did not embed the flang-new runtime, because a special build rule for "classic" flang was not applied to flang-new as well. This should be fixed by #5138 now. Note that the static library contains only OpenBLAS' own symbols in either case, so it always needs the additional runtime libraries to be linked.

@martin-frbg martin-frbg removed the Support and Bug in other software (Compiler, Virtual Machine, etc. bug affecting OpenBLAS) labels Feb 17, 2025
@Crivella
Author

Crivella commented Feb 18, 2025

Thanks,

I can confirm this was indeed the problem:

$ nm -D libopenblas.so | grep _FortranAAss
0000000000f8bdf0 T _FortranAAssign
0000000000f8bf40 T _FortranAAssignExplicitLengthCharacter
0000000000f8bf70 T _FortranAAssignPolymorphic
0000000000f8be20 T _FortranAAssignTemporary

$ nm -D libopenblas.so | grep " U " | grep _Fort
...

For the record, I am experimenting with a full build of LLVM to see how far the new flang-new can be pushed toward building a GCC-equivalent toolchain, up to scientific software.
The problem when building numpy on top of OpenBLAS was actually not caused by numpy,
but by FlexiBLAS dlopening OpenBLAS, which was set as its default backend.

Oddly, it was not caught by the build and tests of FlexiBLAS itself; I will investigate that.

Thank you very much for looking into this and for the quick fix.

@martin-frbg
Copy link
Collaborator

Thanks for confirming - I ran a couple of builds with different configurations last night until enlightenment came :)
Regarding LLVM, in the context of OpenBLAS I am only aware of issues with AVX512 on Windows, which are probably of limited relevance to anything like EESSI. On Linux, however, Clang seems more prone than gcc to running out of registers when compiling AVX512 code (at least at CMake's default -O3). It will be interesting to see how LLVM20 fares.
