When compiled with clang++ and linked with libomp, stedc_solve intermittently fails if OMP_NUM_THREADS > 1. I originally suspected an accidental double linkage with libgomp through the Fortran linker, but inspecting the compiler and linker command lines showed no such issue.
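The double-linkage hypothesis can also be checked at run time, rather than from the link lines, by listing the OpenMP runtimes actually mapped into the process. The sketch below is a hypothetical helper, not part of SLATE or its tester. It assumes Linux with glibc, where dl_iterate_phdr from <link.h> walks the loaded shared objects, and it matches library paths by substring, which is only a heuristic.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // dl_iterate_phdr is a GNU extension in glibc
#endif
#include <link.h>  // dl_iterate_phdr, dl_phdr_info

#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// Callback for dl_iterate_phdr: records every loaded shared object whose
// path contains the name of a known OpenMP runtime library.
static int collect_openmp_runtime(dl_phdr_info* info, std::size_t /*size*/, void* data) {
    auto* found = static_cast<std::vector<std::string>*>(data);
    const std::string name = (info->dlpi_name != nullptr) ? info->dlpi_name : "";
    // libgomp: GNU runtime; libomp: LLVM runtime; libiomp: Intel runtime.
    for (const char* tag : {"libgomp", "libomp", "libiomp"}) {
        if (name.find(tag) != std::string::npos) {
            found->push_back(name);
            break;
        }
    }
    return 0;  // returning 0 tells dl_iterate_phdr to keep iterating
}

// Hypothetical helper: returns the paths of all OpenMP runtimes mapped into
// the current process. More than one entry suggests a mixed libgomp/libomp link.
std::vector<std::string> loaded_openmp_runtimes() {
    std::vector<std::string> found;
    dl_iterate_phdr(collect_openmp_runtime, &found);
    return found;
}
```

Calling loaded_openmp_runtimes() from a small program linked against libslate.so should report libomp.so.5 alone if the link is clean; seeing libgomp.so.1 next to it would reopen the double-linkage theory.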
Steps To Reproduce
1. Build SLATE with clang/libomp.
2. Run the tests with OMP_NUM_THREADS > 1:
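Because the failure is intermittent, a single run with OMP_NUM_THREADS > 1 may pass. The following minimal sketch repeats a command and counts failing runs. count_failing_runs is a hypothetical helper, and it assumes a POSIX system (setenv, plus std::system running the command through /bin/sh).

```cpp
#include <stdlib.h>  // setenv (POSIX)

#include <cassert>
#include <cstdlib>  // std::system
#include <string>

// Hypothetical helper: runs `command` through the shell `runs` times with
// OMP_NUM_THREADS set to `threads`, and returns how many runs exited with a
// non-zero status (an abort from a failed assertion also counts as a failure).
int count_failing_runs(const std::string& command, int threads, int runs) {
    assert(threads > 0 && runs > 0);
    // setenv changes this process's environment; the child shell inherits it.
    setenv("OMP_NUM_THREADS", std::to_string(threads).c_str(), /*overwrite=*/1);
    int failures = 0;
    for (int i = 0; i < runs; ++i) {
        if (std::system(command.c_str()) != 0) {
            ++failures;
        }
    }
    return failures;
}
```

For example, count_failing_runs("./tester --origin s --target t --ref n --nb 64,100 --dim 100:500:100 stedc", 2, 20), run from the directory containing the tester, can be compared with the same call using 1 thread. How many repetitions are needed to trigger the failure is not known; 20 is an arbitrary choice.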
$ OMP_NUM_THREADS=1 python3 run_tests.py --syev
<...>
--------------------------------------------------------------------------------
All routines passed
$ OMP_NUM_THREADS=2 python3 run_tests.py --syev
<...>
./tester --origin s --target t --ref n --nb 64,100 --type s,d,c,z --lookahead 1 --dim 100:500:100 --jobz v --method-eig qr,dc heev
% SLATE version 2023.08.25, id 57ea922b
% input: ./tester --origin s --target t --ref n --nb 64,100 --type s,d,c,z --lookahead 1 --dim 100:500:100 --jobz v --method-eig qr,dc heev
% 2023-08-27 21:37:32, 1 MPI ranks, CPU-only MPI, 2 OpenMP threads per MPI rank
type  origin  target  eig  A  jobz  uplo     n   nb  ib  p  q  la  pt  value err  back err  Z orth.   time (s)  ref time (s)  status
s     scalpk  task    qr   1  vec   lower  100   64  32  1  1   1   1  NA         2.74e-08  1.44e-07  0.0125    NA            pass
s     scalpk  task    qr   1  vec   lower  100  100  32  1  1   1   1  NA         1.46e-08  1.42e-07  0.00620   NA            pass
s     scalpk  task    qr   1  vec   lower  200   64  32  1  1   1   1  NA         2.37e-08  1.50e-07  0.0452    NA            pass
s     scalpk  task    qr   1  vec   lower  200  100  32  1  1   1   1  NA         1.08e-08  1.40e-07  0.0385    NA            pass
s     scalpk  task    qr   1  vec   lower  300   64  32  1  1   1   1  NA         3.22e-08  1.42e-07  0.114     NA            pass
s     scalpk  task    qr   1  vec   lower  300  100  32  1  1   1   1  NA         1.35e-08  1.37e-07  0.113     NA            pass
s     scalpk  task    qr   1  vec   lower  400   64  32  1  1   1   1  NA         9.17e-09  1.28e-07  0.237     NA            pass
s     scalpk  task    qr   1  vec   lower  400  100  32  1  1   1   1  NA         2.78e-08  1.26e-07  0.232     NA            pass
s     scalpk  task    qr   1  vec   lower  500   64  32  1  1   1   1  NA         1.55e-08  1.24e-07  0.421     NA            pass
s     scalpk  task    qr   1  vec   lower  500  100  32  1  1   1   1  NA         1.61e-08  1.35e-07  0.431     NA            pass
tester: /application/slate/src/stedc_solve.cc:120: void slate::stedc_solve(std::vector<real_t> &, std::vector<real_t> &, Matrix<real_t> &, Matrix<real_t> &, Matrix<real_t> &, const slate::Options &) [real_t = float]: Assertion `Qii.mb() == ib' failed.
[76d71bce518f:00035] *** Process received signal ***
[76d71bce518f:00035] Signal: Aborted (6)
[76d71bce518f:00035] Signal code: (-6)
[76d71bce518f:00035] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f650ad5c520]
[76d71bce518f:00035] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f650adb0a7c]
[76d71bce518f:00035] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f650ad5c476]
[76d71bce518f:00035] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f650ad427f3]
[76d71bce518f:00035] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x2871b)[0x7f650ad4271b]
[76d71bce518f:00035] [ 5] /lib/x86_64-linux-gnu/libc.so.6(+0x39e96)[0x7f650ad53e96]
[76d71bce518f:00035] [ 6] /application/build_slate/libslate.so(+0xc3c79e)[0x7f650d3b979e]
[76d71bce518f:00035] [ 7] /lib/x86_64-linux-gnu/libomp.so.5(+0x6156c)[0x7f650bdda56c]
[76d71bce518f:00035] [ 8] /lib/x86_64-linux-gnu/libomp.so.5(+0x653b2)[0x7f650bdde3b2]
[76d71bce518f:00035] [ 9] /lib/x86_64-linux-gnu/libomp.so.5(+0x72f90)[0x7f650bdebf90]
[76d71bce518f:00035] [10] /lib/x86_64-linux-gnu/libomp.so.5(+0x6e5ea)[0x7f650bde75ea]
[76d71bce518f:00035] [11] /lib/x86_64-linux-gnu/libomp.so.5(+0x7257e)[0x7f650bdeb57e]
[76d71bce518f:00035] [12] /lib/x86_64-linux-gnu/libomp.so.5(+0x44d3d)[0x7f650bdbdd3d]
[76d71bce518f:00035] [13] /lib/x86_64-linux-gnu/libomp.so.5(+0xa29f4)[0x7f650be1b9f4]
[76d71bce518f:00035] [14] /lib/x86_64-linux-gnu/libc.so.6(+0x94b43)[0x7f650adaeb43]
[76d71bce518f:00035] [15] /lib/x86_64-linux-gnu/libc.so.6(clone+0x44)[0x7f650ae3fbb4]
[76d71bce518f:00035] *** End of error message ***
FAILED: heev, exit code -6
<...>
./tester --origin s --target t --ref n --nb 64,100 --dim 100:500:100 stedc
tester: /application/slate/src/stedc_solve.cc:120: void slate::stedc_solve(std::vector<real_t> &, std::vector<real_t> &, Matrix<real_t> &, Matrix<real_t> &, Matrix<real_t> &, const slate::Options &) [real_t = double]: Assertion `Qii.mb() == ib' failed.
[76d71bce518f:00095] *** Process received signal ***
[76d71bce518f:00095] Signal: Aborted (6)
[76d71bce518f:00095] Signal code: (-6)
[76d71bce518f:00095] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f0d0b6ce520]
[76d71bce518f:00095] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f0d0b722a7c]
[76d71bce518f:00095] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f0d0b6ce476]
[76d71bce518f:00095] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f0d0b6b47f3]
[76d71bce518f:00095] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x2871b)[0x7f0d0b6b471b]
[76d71bce518f:00095] [ 5] /lib/x86_64-linux-gnu/libc.so.6(+0x39e96)[0x7f0d0b6c5e96]
[76d71bce518f:00095] [ 6] /application/build_slate/libslate.so(+0xc3cc9e)[0x7f0d0dd2bc9e]
[76d71bce518f:00095] [ 7] /lib/x86_64-linux-gnu/libomp.so.5(+0x6156c)[0x7f0d0c74c56c]
[76d71bce518f:00095] [ 8] /lib/x86_64-linux-gnu/libomp.so.5(+0x653b2)[0x7f0d0c7503b2]
[76d71bce518f:00095] [ 9] /lib/x86_64-linux-gnu/libomp.so.5(+0x72f90)[0x7f0d0c75df90]
[76d71bce518f:00095] [10] /lib/x86_64-linux-gnu/libomp.so.5(+0x6e5ea)[0x7f0d0c7595ea]
[76d71bce518f:00095] [11] /lib/x86_64-linux-gnu/libomp.so.5(+0x7257e)[0x7f0d0c75d57e]
[76d71bce518f:00095] [12] /lib/x86_64-linux-gnu/libomp.so.5(+0x44d3d)[0x7f0d0c72fd3d]
[76d71bce518f:00095] [13] /lib/x86_64-linux-gnu/libomp.so.5(+0xa29f4)[0x7f0d0c78d9f4]
[76d71bce518f:00095] [14] /lib/x86_64-linux-gnu/libc.so.6(+0x94b43)[0x7f0d0b720b43]
[76d71bce518f:00095] [15] /lib/x86_64-linux-gnu/libc.so.6(clone+0x44)[0x7f0d0b7b1bb4]
[76d71bce518f:00095] *** End of error message ***
FAILED: stedc, exit code -6
Environment
I've also attached a Dockerfile to reproduce the build environment.
# Dockerfile
FROM ubuntu:22.04
RUN apt update && \
apt install -y locales && \
locale-gen "en_US.UTF-8" && \
update-locale LANG=en_US.UTF-8
ENV LANGUAGE=en_US:en
ENV LANG=en_US.UTF-8
ENV LC_ALL=en_US.UTF-8
WORKDIR /application
# Base Environment
RUN apt -y update && apt -y install make wget curl \
lsb-release coreutils sudo bash-completion \
apt-transport-https software-properties-common \
ca-certificates gnupg linux-tools-common time pciutils \
build-essential wget curl \
git make ninja-build \
gdb valgrind \
libeigen3-dev \
libblas-dev liblapack-dev liblapacke-dev \
libunwind-dev libtbb-dev libomp-dev \
libopenmpi-dev openmpi-bin libscalapack-openmpi-dev
# CMake + Clang
RUN apt -y install cmake cmake-curses-gui
RUN apt -y install clang-12 libomp-12-dev
# Clone SLATE
RUN git clone --recurse-submodules https://github.com/icl-utk-edu/slate.git
RUN git -C slate checkout 57ea922b4a10876ba990a41648590ef36019acdd
# Build BLASPP
RUN cmake -S slate/blaspp -B build_blaspp -DCMAKE_C_COMPILER=clang-12 -DCMAKE_CXX_COMPILER=clang++-12
RUN cmake --build build_blaspp --target blaspp -j2
# Build LAPACKPP
RUN cmake -S slate/lapackpp -B build_lapackpp -DCMAKE_C_COMPILER=clang-12 -DCMAKE_CXX_COMPILER=clang++-12 -Dblaspp_DIR=$PWD/build_blaspp
RUN cmake --build build_lapackpp --target lapackpp -j2
# Build SLATE
RUN cmake -S slate -B build_slate -DCMAKE_CXX_COMPILER=clang++-12 -Dblaspp_DIR=$PWD/build_blaspp -Dlapackpp_DIR=$PWD/build_lapackpp -DBUILD_TESTING=ON -DSCALAPACK_LIBRARIES="/usr/lib/x86_64-linux-gnu/libscalapack-openmpi.so"
RUN cmake --build build_slate --target all -j2 --verbose
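The report does not state the exact clang version that the Dockerfile installs. As a hedged sketch, compiling the following hypothetical helper with the same compiler and OpenMP flag as the build (for example clang++-12 -fopenmp) reports the compiler version string and the _OPENMP value that the compiler sees. It relies only on the predefined macros __clang__, __clang_version__, __GNUC__, __VERSION__, and _OPENMP.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: describes the compiler and OpenMP setting this
// translation unit was built with.
std::string toolchain_description() {
    std::string desc;
#if defined(__clang__)  // test clang first: clang also defines __GNUC__
    desc += "clang " __clang_version__;
#elif defined(__GNUC__)
    desc += "gcc " __VERSION__;
#else
    desc += "unknown compiler";
#endif
#if defined(_OPENMP)
    // _OPENMP holds the supported OpenMP spec date, e.g. 201811.
    desc += ", OpenMP " + std::to_string(_OPENMP);
#else
    desc += ", OpenMP disabled";
#endif
    return desc;
}
```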
SLATE version / commit ID (e.g., git log --oneline -n 1): 57ea922
CUDA (nvcc --version): N/A
MPI distribution (mpicxx -v): Open MPI