Describe the bug
I'm working on a Fedora package for dbcsr. I'm getting test failures with mpich on s390x.
To Reproduce
/usr/bin/ctest --test-dir redhat-linux-build-mpich --output-on-failure --force-new-ctest-process -j3
Internal ctest changing into directory: /builddir/build/BUILD/dbcsr-2.6.0/redhat-linux-build-mpich
Test project /builddir/build/BUILD/dbcsr-2.6.0/redhat-linux-build-mpich
Start 1: dbcsr_perf:inputs/test_H2O.perf
Start 2: dbcsr_perf:inputs/test_rect1_dense.perf
Start 3: dbcsr_perf:inputs/test_rect1_sparse.perf
1/19 Test #3: dbcsr_perf:inputs/test_rect1_sparse.perf ..............***Failed 2.10 sec
DBCSR| CPU Multiplication driver BLAS (D)
DBCSR| Multrec recursion limit 512 (D)
DBCSR| Multiplication stack size 1000 (D)
DBCSR| Maximum elements for images UNLIMITED (D)
DBCSR| Multiplicative factor virtual images 1 (D)
DBCSR| Use multiplication densification T (D)
DBCSR| Multiplication size stacks 3 (D)
DBCSR| Use memory pool for CPU allocation F (D)
DBCSR| Number of 3D layers SINGLE (D)
DBCSR| Use MPI memory allocation F (D)
DBCSR| Use RMA algorithm F (U)
DBCSR| Use Communication thread T (D)
DBCSR| Communication thread load 100 (D)
DBCSR| MPI: My process id 0
DBCSR| MPI: Number of processes 2
DBCSR| OMP: Current number of threads 2
DBCSR| OMP: Max number of threads 2
DBCSR| Split modifier for TAS multiplication algorithm 1.0E+00 (D)
numthreads 2
numnodes 2
matrix_sizes 5000 1000 1000
sparsities 0.90000000000000002 0.90000000000000002 0.90000000000000002
trans NN
symmetries NNN
type 3
alpha_in 1.0000000000000000 0.0000000000000000
beta_in 1.0000000000000000 0.0000000000000000
limits 1 5000 1 1000 1 1000
retain_sparsity F
nrep 10
bs_m 1 5
bs_n 1 5
bs_k 1 5
*******************************************************************************
* MPI error 5843983 in mpi_barrier @ mp_sync : Other MPI error, error stack:
* internal_Barrier(84).......................:
* MPI_Barrier(comm=0x84000001) failed
* MPID_Barrier(167)..........................:
* MPIDI_Barrier_allcomm_composition_json(132):
* MPIDI_POSIX_mpi_bcast(219).................:
* MPIDI_POSIX_mpi_bcast_release_gather(132)..:
* MPIDI_POSIX_mpi_release_gather_release(218): message sizes do not
*   match across processes in the collective routine: Received 0 but expected 1
* dbcsr_mpiwrap.F:1186
*******************************************************************************
===== Routine Calling Stack =====
4 mp_sync
3 perf_multiply
2 dbcsr_perf_multiply_low
1 dbcsr_performance_driver
Abort(1) on node 1 (rank 1 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
STOP 1
I don't see test failures with openmpi. One difference is that mpich is being built with -DUSE_MPI_F08=ON.
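As a quick way to confirm whether the F08 bindings are the trigger, here is a minimal sketch (the redhat-linux-build-mpich-nof08 directory name is hypothetical, and the omitted flags are assumed to be the same as in the mpich configure listed under Environment below): reconfigure the mpich build with -DUSE_MPI_F08=OFF and re-run only the failing test.

/usr/bin/cmake -S . -B redhat-linux-build-mpich-nof08 -DUSE_MPI_F08=OFF -DCMAKE_PREFIX_PATH:PATH=/usr/lib64/mpich   # plus the remaining flags from the mpich configure under Environment below
/usr/bin/cmake --build redhat-linux-build-mpich-nof08 -j3
/usr/bin/ctest --test-dir redhat-linux-build-mpich-nof08 --output-on-failure -R 'dbcsr_perf:inputs/test_rect1_sparse.perf'

If the barrier error disappears with the F08 bindings off, that would narrow the problem to the mpi_f08 interface with mpich 4.1.2 on s390x rather than the test inputs themselves.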
Environment:
Operating system & version
Fedora Rawhide
Compiler vendor & version
gcc 13.2.1
Build environment (make or cmake)
cmake
Configuration of DBCSR (either the cmake flags or the Makefile.inc):
/usr/bin/cmake -S . -B redhat-linux-build-mpich -DCMAKE_C_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_CXX_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_Fortran_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_INSTALL_DO_STRIP:BOOL=OFF -DCMAKE_INSTALL_PREFIX:PATH=/usr -DINCLUDE_INSTALL_DIR:PATH=/usr/include -DLIB_INSTALL_DIR:PATH=/usr/lib64 -DSYSCONF_INSTALL_DIR:PATH=/etc -DSHARE_INSTALL_PREFIX:PATH=/usr/share -DLIB_SUFFIX=64 -DBUILD_SHARED_LIBS:BOOL=ON -DCMAKE_INSTALL_Fortran_MODULES=/usr/lib64/gfortran/modules/mpich -DUSE_MPI_F08=ON -DCMAKE_PREFIX_PATH:PATH=/usr/lib64/mpich -DCMAKE_INSTALL_PREFIX:PATH=/usr/lib64/mpich -DCMAKE_INSTALL_LIBDIR:PATH=lib
-- The C compiler identification is GNU 13.2.1
MPI implementation and version
mpich 4.1.2
If CUDA is being used: CUDA version and GPU architecture
No CUDA
BLAS/LAPACK implementation and version
flexiblas 3.3.1 -> openblas 0.3.21
I've realized that we are not testing with MPI_F08 in our CI. However, we did run a test here #661 (comment) and it worked; the only difference was GCC 13.1. I will add the test to the CI. In the meantime, I see some actions here:
Could you build with F08 and OpenMPI? (See the configure sketch after these questions.)
Any chance you can use GCC 13.1 and mpich with F08 in DBCSR?
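For the first question, a hedged sketch of what that build could look like on Fedora (the /usr/lib64/openmpi prefix, the build directory name, and the trimmed flag list are assumptions, not taken from this report):

/usr/bin/cmake -S . -B redhat-linux-build-openmpi-f08 -DUSE_MPI_F08=ON -DCMAKE_PREFIX_PATH:PATH=/usr/lib64/openmpi   # plus the remaining flags from the mpich configure above
/usr/bin/cmake --build redhat-linux-build-openmpi-f08 -j3
/usr/bin/ctest --test-dir redhat-linux-build-openmpi-f08 --output-on-failure

If that combination passes, the failure would appear specific to mpich's F08 bindings rather than to -DUSE_MPI_F08=ON in general.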