Wrong dynamic library linking order when compiling with MKL FFT libraries #958

Closed
markus-meier74 opened this issue Mar 22, 2023 · 13 comments

@markus-meier74

Hi, I get the following linking error when compiling Relion with -DMKLFFT=ON:

FAILED: bin/relion_display
: && /usr/bin/mpicxx -march=native -mfpmath=sse -O3 -pipe -DNDEBUG -fopenmp -std=c++11 -Wl,-O1 -Wl,--as-needed -rdynamic src/apps/CMakeFiles/display.dir/display.cpp.o -o bin/relion_display -L/opt/intel/oneapi/mkl/2022.2.1/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core /opt/cuda/lib64/libcufft.so /opt/cuda/lib64/libcufft.so -lmpicxx -lmpi -ldl lib64/librelion_gui_lib.so /usr/lib64/fltk/libfltk_images.so /usr/lib64/fltk/libfltk_forms.so /usr/lib64/fltk/libfltk.so -lSM -lICE -lX11 -lXext -lm -ltiff lib64/librelion_lib.so lib64/librelion_gpu_util.so /opt/cuda/lib64/libcurand.so /opt/cuda/lib64/libcufft.so -ltiff lib64/librelion_jaz_gpu_util.so /opt/cuda/lib64/libcudart_static.a -ldl -Wl,-Bstatic -lrt -Wl,-Bdynamic -lpng -ljpeg && :
/usr/lib/gcc/x86_64-pc-linux-gnu/11/../../../../x86_64-pc-linux-gnu/bin/ld: lib64/librelion_lib.so: undefined reference to `fftw_init_threads'
/usr/lib/gcc/x86_64-pc-linux-gnu/11/../../../../x86_64-pc-linux-gnu/bin/ld: lib64/librelion_lib.so: undefined reference to `fftwf_plan_dft_r2c'
/usr/lib/gcc/x86_64-pc-linux-gnu/11/../../../../x86_64-pc-linux-gnu/bin/ld: lib64/librelion_lib.so: undefined reference to `fftw_plan_dft_c2r'
/usr/lib/gcc/x86_64-pc-linux-gnu/11/../../../../x86_64-pc-linux-gnu/bin/ld: lib64/librelion_gui_lib.so: undefined reference to `fftw_malloc'
/usr/lib/gcc/x86_64-pc-linux-gnu/11/../../../../x86_64-pc-linux-gnu/bin/ld: lib64/librelion_lib.so: undefined reference to `fftw_execute_dft_r2c'
/usr/lib/gcc/x86_64-pc-linux-gnu/11/../../../../x86_64-pc-linux-gnu/bin/ld: lib64/librelion_lib.so: undefined reference to `fftw_plan_dft_r2c'
/usr/lib/gcc/x86_64-pc-linux-gnu/11/../../../../x86_64-pc-linux-gnu/bin/ld: lib64/librelion_gui_lib.so: undefined reference to `fftw_free'
/usr/lib/gcc/x86_64-pc-linux-gnu/11/../../../../x86_64-pc-linux-gnu/bin/ld: lib64/librelion_lib.so: undefined reference to `fftwf_execute_dft_c2r'
/usr/lib/gcc/x86_64-pc-linux-gnu/11/../../../../x86_64-pc-linux-gnu/bin/ld: lib64/librelion_lib.so: undefined reference to `fftw_execute_dft_c2r'
/usr/lib/gcc/x86_64-pc-linux-gnu/11/../../../../x86_64-pc-linux-gnu/bin/ld: lib64/librelion_lib.so: undefined reference to `fftw_plan_with_nthreads'
/usr/lib/gcc/x86_64-pc-linux-gnu/11/../../../../x86_64-pc-linux-gnu/bin/ld: lib64/librelion_lib.so: undefined reference to `fftwf_cleanup'
/usr/lib/gcc/x86_64-pc-linux-gnu/11/../../../../x86_64-pc-linux-gnu/bin/ld: lib64/librelion_lib.so: undefined reference to `fftwf_execute_dft_r2c'
/usr/lib/gcc/x86_64-pc-linux-gnu/11/../../../../x86_64-pc-linux-gnu/bin/ld: lib64/librelion_lib.so: undefined reference to `fftwf_plan_dft'
/usr/lib/gcc/x86_64-pc-linux-gnu/11/../../../../x86_64-pc-linux-gnu/bin/ld: lib64/librelion_lib.so: undefined reference to `fftw_cleanup_threads'
/usr/lib/gcc/x86_64-pc-linux-gnu/11/../../../../x86_64-pc-linux-gnu/bin/ld: lib64/librelion_lib.so: undefined reference to `fftwf_plan_dft_c2r'
/usr/lib/gcc/x86_64-pc-linux-gnu/11/../../../../x86_64-pc-linux-gnu/bin/ld: lib64/librelion_lib.so: undefined reference to `fftw_destroy_plan'
/usr/lib/gcc/x86_64-pc-linux-gnu/11/../../../../x86_64-pc-linux-gnu/bin/ld: lib64/librelion_lib.so: undefined reference to `fftwf_destroy_plan'
collect2: error: ld returned 1 exit status

The cause is the link order: librelion_lib.so also depends on libmkl_intel_lp64.so, but -lmkl_intel_lp64 appears before lib64/librelion_lib.so on the command line.

Here is a patch to src/apps/CMakeLists.txt that puts the libraries into the right order and fixes the problem for me:
relion_MKLFFT.patch.txt
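
The ordering problem can be reproduced outside RELION. The following is a minimal sketch, not the attached patch: the project name and the `provider`, `consumer` and `demo` targets are hypothetical stand-ins for MKL, librelion_lib.so and relion_display, and the C sources are generated at configure time so the file is self-contained. It assumes a GNU-style linker and CMake 3.13 or newer (for `add_link_options`).

```cmake
cmake_minimum_required(VERSION 3.13)
project(link_order_demo C)

# Hypothetical stand-ins: provider ~ MKL, consumer ~ librelion_lib.so,
# demo ~ relion_display. Sources are generated so the sketch is self-contained.
file(WRITE "${CMAKE_BINARY_DIR}/provider.c"
     "int provide(void) { return 42; }\n")
file(WRITE "${CMAKE_BINARY_DIR}/consumer.c"
     "int provide(void);\nint consume(void) { return provide(); }\n")
file(WRITE "${CMAKE_BINARY_DIR}/main.c"
     "int consume(void);\nint main(void) { return consume() == 42 ? 0 : 1; }\n")

# Same flag as on the failing link line above (GNU ld syntax assumed).
add_link_options("LINKER:--as-needed")

add_library(provider SHARED "${CMAKE_BINARY_DIR}/provider.c")
add_library(consumer SHARED "${CMAKE_BINARY_DIR}/consumer.c")

# The library that calls provide() declares the dependency itself, so the
# executable's link line no longer has to list provider in a particular order.
target_link_libraries(consumer PRIVATE provider)

add_executable(demo "${CMAKE_BINARY_DIR}/main.c")
target_link_libraries(demo PRIVATE consumer)
```

Removing the `target_link_libraries(consumer PRIVATE provider)` line and linking `demo` against `provider consumer` in that order mirrors the failing command line: with --as-needed, the linker is expected to drop provider before it reaches the undefined reference in libconsumer.so.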

With best regards,
Markus

@biochem-fan
Member

biochem-fan commented Mar 24, 2023

Did you source mkl_vars.sh (or set_vars.sh) before running cmake with -DMKLFFT=ON?

The _vars script sets CPATH and LIBRARY_PATH to point to MKL, so FFTW-related variables should not be necessary.

Which compiler are you using?

@markus-meier74
Author

Sorry for the late reply. I am compiling inside an automated build environment (Gentoo ebuild). MKL is a system library and is correctly detected by cmake (not using any helper scripts such as mkl_vars.sh).

I am using gcc:
"""
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/11/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-11.3.1_p20230120-r1/work/gcc-11-20230120/configure --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/11 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/11/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/11 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/11/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/11/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/11/include/g++-v11 --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/11/python --enable-languages=c,c++,fortran --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --disable-libunwind-exceptions --enable-checking=release --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 11.3.1_p20230120-r1 p7' --with-gcc-major-version-only --disable-esp --enable-libstdcxx-time --disable-libstdcxx-pch --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-multilib --with-multilib-list=m32,m64 --disable-fixed-point --enable-targets=all --enable-libgomp --disable-libssp --disable-libada --disable-cet --disable-systemtap --disable-valgrind-annotations --disable-vtable-verify --disable-libvtv --with-zstd --enable-lto --without-isl --enable-default-pie --enable-default-ssp --with-build-config=bootstrap-lto
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.3.1 20230120 (Gentoo 11.3.1_p20230120-r1 p7)
"""
I have attached my CMakeCache.txt:
CMakeCache.txt

With best regards,
Markus

@biochem-fan
Member

> MKL is a system library

Where is it located?

@markus-meier74
Author

markus-meier74 commented Apr 4, 2023

/opt/intel/oneapi/mkl/2022.2.1

The following environment variables are set system-wide (/etc/env.d/70intel-mkl):
MKLROOT="/opt/intel/oneapi/mkl/2022.2.1"
PATH="/opt/intel/oneapi/mkl/2022.2.1/bin/intel64"
ROOTPATH="/opt/intel/oneapi/mkl/2022.2.1/bin/intel64"
LDPATH="/opt/intel/oneapi/mkl/2022.2.1/lib/intel64"

@biochem-fan
Member

I don't think these variables fully mimic what set_vars.sh does.
Don't you have the script in /opt/intel/oneapi/bin or somewhere?

@markus-meier74
Author

There is /opt/intel/oneapi/mkl/2022.2.1/env/vars.sh (attached)
vars.sh.txt

The simple patch I provided with my bug report solved all issues for me and I don't have to rely on any additional scripts. I thought I would share it with the community.

@do-jason
Collaborator

do-jason commented Apr 5, 2023

FFTW_LIBRARIES is already linked when MKLFFT is on, as shown below.

if(NOT MKLFFT)
    target_link_libraries(${_target} ${LIB} ${EXTRA_LIBS} ${MPI_LIBRARIES} ${CMAKE_DL_LIBS})
else()
    target_link_libraries(${_target} ${LIB} ${FFTW_LIBRARIES} ${EXTRA_LIBS} ${MPI_LIBRARIES} ${CMAKE_DL_LIBS})
endif(NOT MKLFFT)

Could you (@markus-meier74) provide the information below so that we can reproduce your problem?

  • The compiler and libraries used, with their versions.
  • Any environment variables you added yourself.
  • The cmake command line used and its output.

@markus-meier74
Author

@do-jason: That's correct, but the statement declaring that the RELION libraries ("relion_lib") also depend on ${FFTW_LIBRARIES} is skipped, because it sits inside an if(NOT MKLFFT) ... endif(NOT MKLFFT) block:

if(NOT MKLFFT)
    target_link_libraries(relion_lib ${FFTW_LIBRARIES})
    if(BUILD_OWN_FFTW)
        add_dependencies(relion_lib own_fftw_lib)
    endif()
    if(BUILD_OWN_FFTWF)
        add_dependencies(relion_lib own_fftwf_lib)
    endif()
endif(NOT MKLFFT)

Just move "target_link_libraries(relion_lib ${FFTW_LIBRARIES})" outside this block and everything works as it should.

I will provide the requested info as soon as I can gather it.

@biochem-fan
Member

When we link to MKLFFT, our assumption is that the MKL include/linking paths set by the vars script are enough and that FFTW_LIBRARIES is NOT necessary. This is why the line is within if(NOT MKLFFT).
Of course you can choose not to use the script, but that is not our intended way and I don't consider it a RELION bug.

@markus-meier74
Author

@biochem-fan: Not correct, as your colleague do-jason just pointed out, your CMakeLists.txt does use FFTW_LIBRARIES for linking with MKL, on line 345:

if(NOT MKLFFT)
    target_link_libraries(${_target} ${LIB} ${EXTRA_LIBS} ${MPI_LIBRARIES} ${CMAKE_DL_LIBS})
else()
    target_link_libraries(${_target} ${LIB} ${FFTW_LIBRARIES} ${EXTRA_LIBS} ${MPI_LIBRARIES} ${CMAKE_DL_LIBS})
endif(NOT MKLFFT)

So it is a Relion bug in my opinion. The vars script is unlikely to set the correct order of the libraries as it works outside cmake, and this linking error may randomly occur.

@markus-meier74
Author

I did some more investigating. I get the same linking error even if I source /opt/intel/oneapi/mkl/2022.2.1/env/vars.sh before building. The bug is triggered by the "--as-needed" linker flag which is appended by default in the Gentoo ebuild environment.
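
One way to recreate these conditions without a Gentoo environment is an initial-cache script passed to cmake with `-C`. This is only a sketch: the file name is hypothetical, the flag values are copied from the failing link line in the first comment, and it is not the attached build script. `MKLFFT` and `BUILD_SHARED_LIBS` are the options named in this thread.

```cmake
# as_needed_repro.cmake (hypothetical file name)
# Usage (assumed layout): cmake -C as_needed_repro.cmake <path-to-relion-source>

# Options named in this thread.
set(MKLFFT ON CACHE BOOL "Use the MKL FFTW interface")
set(BUILD_SHARED_LIBS ON CACHE BOOL "Build the RELION libraries as shared objects")

# Linker flags seen on the failing link line; Gentoo appends them by default.
set(CMAKE_EXE_LINKER_FLAGS "-Wl,-O1 -Wl,--as-needed"
    CACHE STRING "Linker flags for executables")
set(CMAKE_SHARED_LINKER_FLAGS "-Wl,-O1 -Wl,--as-needed"
    CACHE STRING "Linker flags for shared libraries")
```

Because the values are preloaded into the cache, no change to the RELION CMakeLists.txt is needed to test with or without --as-needed.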

Here are the steps to reproduce the error on your system (bash script) and all the information you were asking for:
build_relion.bash.txt

cmake configuration log:
cmake_configure.log

Ninja build log:
ninja_build.log
build.ninja.txt

CMakeCache:
CMakeCache.txt

Build environment:
build_environment.txt

System information:
system_info.txt

Libraries with version info:
relion_dependency_graph.txt

gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/11/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-11.3.1_p20230120-r1/work/gcc-11-20230120/configure --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/11 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/11/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/11 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/11/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/11/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/11/include/g++-v11 --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/11/python --enable-languages=c,c++,fortran --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --disable-libunwind-exceptions --enable-checking=release --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 11.3.1_p20230120-r1 p7' --with-gcc-major-version-only --disable-esp --enable-libstdcxx-time --disable-libstdcxx-pch --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-multilib --with-multilib-list=m32,m64 --disable-fixed-point --enable-targets=all --enable-libgomp --disable-libssp --disable-libada --disable-cet --disable-systemtap --disable-valgrind-annotations --disable-vtable-verify --disable-libvtv --with-zstd --enable-lto --without-isl --enable-default-pie --enable-default-ssp --with-build-config=bootstrap-lto
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.3.1 20230120 (Gentoo 11.3.1_p20230120-r1 p7)

@do-jason
Collaborator

do-jason commented Apr 6, 2023

@biochem-fan @markus-meier74 I have tested it and found that the problem occurs when the shared library build is on (-DBUILD_SHARED_LIBS=ON) and MKL is enabled with a non-Intel compiler. FFTW_LIBRARIES is set by cmake when our own FFTW is needed or when MKL is enabled without the Intel compiler. I have also tested the given patch with the Intel compiler and MKL, and it worked fine.
The change below always links relion_lib with FFTW_LIBRARIES, which is an empty string in an Intel compiler/MKL environment and contains the MKL or FFTW libraries in the other cases.

target_link_libraries(relion_lib ${FFTW_LIBRARIES})
if(NOT MKLFFT)
    if(BUILD_OWN_FFTW)
        add_dependencies(relion_lib own_fftw_lib)
    endif()
    if(BUILD_OWN_FFTWF)
        add_dependencies(relion_lib own_fftwf_lib)
    endif()
endif(NOT MKLFFT)
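
To check what this change does on a given system, a small diagnostic can be placed after the `target_link_libraries` call. This is a hypothetical debugging aid, not part of the RELION sources; it only assumes the `relion_lib` target and the `FFTW_LIBRARIES` variable shown above.

```cmake
# Hypothetical debugging aid; place after target_link_libraries(relion_lib ...).
message(STATUS "FFTW_LIBRARIES = '${FFTW_LIBRARIES}'")

if(TARGET relion_lib)
    # LINK_LIBRARIES holds what has been passed to target_link_libraries so far.
    get_target_property(_relion_link_libs relion_lib LINK_LIBRARIES)
    message(STATUS "relion_lib links against: ${_relion_link_libs}")
endif()
```

With a non-Intel compiler and -DMKLFFT=ON, the MKL libraries should show up in both messages; an empty FFTW_LIBRARIES is the expected result with the Intel compiler, as described above.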

biochem-fan added a commit that referenced this issue Apr 6, 2023
linker flag (e.g. in Gentoo), as reported in issue #958.

Thanks to @markus-meier74 and @do-jason for the report and
investigation.
@biochem-fan
Member

biochem-fan commented Apr 6, 2023

@markus-meier74, @do-jason

Thank you very much for the investigation. I understand now.
I made your suggested change to the ver4.0 branch.
