Tests segfaulting on Ookami #684
The problems with threading in MVAPICH2 are probably a red herring, with

```diff
diff --git a/src/environment.jl b/src/environment.jl
index a9d7e40..c889ae3 100644
--- a/src/environment.jl
+++ b/src/environment.jl
@@ -78,7 +78,7 @@ it after calling [`MPI.Finalize`](@ref).
 $(_doc_external("MPI_Init"))
 $(_doc_external("MPI_Init_thread"))
 """
-function Init(;threadlevel=:serialized, finalize_atexit=true, errors_return=true)
+function Init(;threadlevel=:single, finalize_atexit=true, errors_return=true)
     if threadlevel isa Symbol
         threadlevel = ThreadLevel(threadlevel)
     end
```

to force single-thread initialisation (is there a better way to do that?) In the tests the call is always
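For context, the `threadlevel` keyword presumably maps to the `required` argument of `MPI_Init_thread` (the docstring above references it), so `:single` amounts to requesting `MPI_THREAD_SINGLE` rather than `MPI_THREAD_SERIALIZED`. A minimal C sketch of that call, purely illustrative:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    /* Request single-threaded initialisation; the library reports the level
       it actually provides, which may be higher than the one requested. */
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_SINGLE, &provided);

    if (provided != MPI_THREAD_SINGLE)
        printf("provided thread level: %d\n", provided);

    MPI_Finalize();
    return 0;
}
```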
For the record, MPICH_jll and OpenMPI_jll work fine on Ookami: all tests pass. I'm trying to make a small reproducer for the failing tests with system libraries.
It's sufficient to do

```julia
using MPI

MPI.Init(;threadlevel=:single)

comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

A = Array{Char}([rank + 1])
C = MPI.Allgather(A, comm)
```

to reproduce the segfault in
Am I doing anything obviously wrong in

```c
#include <mpi.h>
#include <stdlib.h>

int main(void)
{
    MPI_Init(NULL, NULL);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    char A = rank + 97;
    char *C = (char *)malloc(sizeof(char) * size);
    MPI_Allgather(&A, 1, MPI_CHAR, C, size, MPI_CHAR, MPI_COMM_WORLD);
    free(C);
    MPI_Finalize();
    return 0;
}
```

? With this code, which should be pretty much a C equivalent of the Julia code above, I get

```
[mosgiordano@fj003 temp-env]$ module purge
[mosgiordano@fj003 temp-env]$ module load slurm gcc/11.1.0 mvapich2/gcc11/2.3.6
[mosgiordano@fj003 temp-env]$ mpicc -o repro repro.c && srun -n 6 ./repro
[fj003:mpi_rank_1][error_sighandler] Caught error: Segmentation fault (signal 11)
[fj003:mpi_rank_2][error_sighandler] Caught error: Segmentation fault (signal 11)
[fj003:mpi_rank_3][error_sighandler] Caught error: Segmentation fault (signal 11)
[fj003:mpi_rank_4][error_sighandler] Caught error: Segmentation fault (signal 11)
[fj003:mpi_rank_5][error_sighandler] Caught error: Segmentation fault (signal 11)
[fj003:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
srun: error: fj003: tasks 0,2-5: Segmentation fault (core dumped)
srun: error: fj003: task 1: Segmentation fault (core dumped)
```

Edit: nevermind, I got
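One genuine problem in the C snippet above, and presumably what the edit refers to: `recvcount` in `MPI_Allgather` is the number of elements received from each process, not the total, so it should be `1` here rather than `size`. A corrected version of the reproducer, assuming that reading:

```c
#include <mpi.h>
#include <stdlib.h>

int main(void)
{
    MPI_Init(NULL, NULL);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    char A = rank + 97;
    char *C = (char *)malloc(sizeof(char) * size);
    /* recvcount is the per-process count: each rank contributes one char. */
    MPI_Allgather(&A, 1, MPI_CHAR, C, 1, MPI_CHAR, MPI_COMM_WORLD);
    free(C);
    MPI_Finalize();
    return 0;
}
```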
@giordano If the problem is related to
I think you got it, this code works:

```julia
using MPI

MPI.Init(;threadlevel=:single)

comm = MPI.COMM_WORLD
size = MPI.Comm_size(comm)
rank = MPI.Comm_rank(comm)

A = Array{Char}([rank + 1])
C = Char.(zeros(Int32, size))
MPI.Allgather!(MPI.Buffer(A, 1, MPI.Datatype(Int32)),
               MPI.UBuffer(C, 1, nothing, MPI.Datatype(Int32)), comm)

@show rank, C
```

```
$ srun -n 2 julia --project repro.jl
(rank, C) = (0, ['\x01', '\x02'])
(rank, C) = (1, ['\x01', '\x02'])
```

Where is the datatype converted to the MPI data in the ccall? Note that the use of Line 5 in a179cf8
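To answer the question at face value (a hedged reading, not something confirmed in this thread): by wrapping the buffers with `MPI.Datatype(Int32)`, the call above presumably reaches `MPI_Allgather` with a builtin datatype instead of the duplicated datatype MPI.jl creates for `Char`. In plain C, the working path would look roughly like this:

```c
#include <mpi.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    MPI_Init(NULL, NULL);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Builtin datatype (MPI_UINT32_T): no derived/duplicated type involved. */
    uint32_t send = (uint32_t)rank + 1;
    uint32_t *recv = malloc(size * sizeof *recv);
    MPI_Allgather(&send, 1, MPI_UINT32_T, recv, 1, MPI_UINT32_T, MPI_COMM_WORLD);

    if (rank == 0)
        for (int i = 0; i < size; i++)
            printf("recv[%d] = %u\n", i, (unsigned)recv[i]);

    free(recv);
    MPI_Finalize();
    return 0;
}
```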
This doesn't look good, right:

```julia
julia> MPI.get_name(MPI.Datatype(Char))
""
```

? I get this also with
For the record, other broken tests with MVAPICH2 include:
Most OpenMPI failing tests seem to be related to
The issue looks to be something with custom datatypes. Would be good to see what is going on
For Open MPI:
It looks like the Mellanox HCOLL library doesn't like custom MPI Datatypes. I think we've had issues with it before. My suggestions:
The following should be a reproducer of your example:

```c
#include <mpi.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    int n, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &n);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    uint32_t sendarr[1];
    sendarr[0] = rank;

    uint32_t *recvbuf;
    recvbuf = (uint32_t *)malloc(n*sizeof(uint32_t));

    MPI_Datatype dup_type;
    MPI_Type_dup(MPI_UINT32_T, &dup_type);

    MPI_Allgather(sendarr, 1, dup_type, recvbuf, 1, dup_type, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < n; i++) {
            printf("recvbuf[%i] = %"PRIu32"\n", i, recvbuf[i]);
        }
    }

    MPI_Finalize();
    return 0;
}
```
Yes, that's probably it:

```
[mosgiordano@fj003 openmpi]$ mpiexec -n 1 ./test
recvbuf[0] = 0
[mosgiordano@fj003 openmpi]$ mpiexec -n 2 ./test
[fj003:1748529:0:1748529] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x2000002021)
[fj003:1748528:0:1748528] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x2000002021)
==== backtrace (tid:1748528) ====
=================================
==== backtrace (tid:1748529) ====
=================================
[fj003:1748529] *** Process received signal ***
[fj003:1748529] Signal: Segmentation fault (11)
[fj003:1748529] Signal code:  (-6)
[fj003:1748529] Failing at address: 0xa267ae8001aae31
[fj003:1748529] [ 0] linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0x4000000707a0]
[fj003:1748529] [ 1] /opt/mellanox/hcoll/lib/libhcoll.so.1(hcoll_create_mpi_type+0x8c4)[0x400003e70694]
[fj003:1748529] [ 2] /lustre/software/openmpi/gcc12.1.0/4.1.4/lib/openmpi/mca_coll_hcoll.so(+0x70a0)[0x400003d570a0]
[fj003:1748529] [ 3] /lustre/software/openmpi/gcc12.1.0/4.1.4/lib/openmpi/mca_coll_hcoll.so(mca_coll_hcoll_allgather+0x8c)[0x400003d5767c]
[fj003:1748529] [ 4] /lustre/software/openmpi/gcc12.1.0/4.1.4/lib/libmpi.so.40(MPI_Allgather+0x100)[0x4000000e6410]
[fj003:1748529] [ 5] ./test[0x400b78]
[fj003:1748529] [ 6] /lib64/libc.so.6(__libc_start_main+0xe4)[0x400000260de4]
[fj003:1748529] [ 7] ./test[0x400a0c]
[fj003:1748529] *** End of error message ***
[fj003:1748528] *** Process received signal ***
[fj003:1748528] Signal: Segmentation fault (11)
[fj003:1748528] Signal code:  (-6)
[fj003:1748528] Failing at address: 0xa267ae8001aae30
[fj003:1748528] [ 0] linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0x4000000707a0]
[fj003:1748528] [ 1] /opt/mellanox/hcoll/lib/libhcoll.so.1(hcoll_create_mpi_type+0x8c4)[0x400003e70694]
[fj003:1748528] [ 2] /lustre/software/openmpi/gcc12.1.0/4.1.4/lib/openmpi/mca_coll_hcoll.so(+0x70a0)[0x400003d570a0]
[fj003:1748528] [ 3] /lustre/software/openmpi/gcc12.1.0/4.1.4/lib/openmpi/mca_coll_hcoll.so(mca_coll_hcoll_allgather+0x8c)[0x400003d5767c]
[fj003:1748528] [ 4] /lustre/software/openmpi/gcc12.1.0/4.1.4/lib/libmpi.so.40(MPI_Allgather+0x100)[0x4000000e6410]
[fj003:1748528] [ 5] ./test[0x400b78]
[fj003:1748528] [ 6] /lib64/libc.so.6(__libc_start_main+0xe4)[0x400000260de4]
[fj003:1748528] [ 7] ./test[0x400a0c]
[fj003:1748528] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 1748529 on node fj003 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[mosgiordano@fj003 openmpi]$ OMPI_MCA_coll_hcoll_enable="0" mpirun -np 2 ./test
recvbuf[0] = 0
recvbuf[1] = 1
[mosgiordano@fj003 openmpi]$ OMPI_MCA_coll_hcoll_enable="0" mpirun -np 4 ./test
recvbuf[0] = 0
recvbuf[1] = 1
recvbuf[2] = 2
recvbuf[3] = 3
[mosgiordano@fj003 openmpi]$ OMPI_MCA_coll_hcoll_enable="0" mpirun -np 8 ./test
recvbuf[0] = 0
recvbuf[1] = 1
recvbuf[2] = 2
recvbuf[3] = 3
recvbuf[4] = 4
recvbuf[5] = 5
recvbuf[6] = 6
recvbuf[7] = 7
```

Exporting `test_spawn` error

New entry for the known OpenMPI issues section? Was it reported upstream already?
No: can you open an issue? They will probably want to know the version of HCOLL you're using. @vchuravy is there someone at Nvidia we should contact?
Yes, but where, OpenMPI or HCOLL? Couldn't find where to report bugs in HCOLL.
I would open it on Open MPI, and let them or @vchuravy contact the appropriate Mellanox/Nvidia folks
For the last error with OpenMPI, in
We should probably have a generic mechanism for skipping tests
From @giordano's experiments in #693 (comment) it
Ok, this is weird. Now on

even though a standalone program like #684 (comment) doesn't anymore 😕 However, as mentioned in #693 (comment), this program

```julia
using MPI
MPI.Init(;threadlevel=:single)
MPI.Datatype(Char)
MPI.Finalize()
```

also segfaults:

```
$ srun -n 2 julia --project test.jl

[1113525] signal (11.1): Segmentation fault
in expression starting at /lustre/home/mosgiordano/tmp/mvapich/test.jl:4
[1113526] signal (11.1): Segmentation fault
in expression starting at /lustre/home/mosgiordano/tmp/mvapich/test.jl:4
MPIR_Call_attr_delete at /lustre/software/mvapich2/gcc11/2.3.6/lib/libmpi.so (unknown line)
MPIR_Attr_delete_list at /lustre/software/mvapich2/gcc11/2.3.6/lib/libmpi.so (unknown line)
Allocations: 2975 (Pool: 2963; Big: 12); GC: 0
MPIR_Call_attr_delete at /lustre/software/mvapich2/gcc11/2.3.6/lib/libmpi.so (unknown line)
MPIR_Attr_delete_list at /lustre/software/mvapich2/gcc11/2.3.6/lib/libmpi.so (unknown line)
Allocations: 2975 (Pool: 2963; Big: 12); GC: 0
srun: error: fj003: tasks 0-1: Segmentation fault (core dumped)
```

But this C program

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    MPI_Datatype dup_type;
    MPI_Type_dup(MPI_UINT32_T, &dup_type);

    MPI_Finalize();
    return 0;
}
```

runs without problems. Does

Edit: yes,
If I inline

```julia
using MPI
MPI.Init(; threadlevel=:single)
T = Char
# inline MPI.Datatype(T)
get!(MPI.created_datatypes, T) do
    datatype = MPI.Datatype(MPI.API.MPI_DATATYPE_NULL[])
    @assert MPI.Initialized()
    MPI.Types.duplicate!(datatype, MPI.Datatype(UInt32))
end
MPI.Finalize()
```

This is getting weirder and weirder.... 😢

Edit: ignore this message, it's wrong, see below.
Scratch my previous message, I had inlined a bit too much, removing the actually offending lines. This is a better reproducer for the segfault:

```julia
using MPI
MPI.Init(; threadlevel=:single)
datatype = MPI.Datatype(MPI.API.MPI_DATATYPE_NULL[])
MPI.API.MPI_Type_dup(MPI.Datatype(UInt32), datatype)
MPI.API.MPI_Type_commit(datatype)
MPI.API.MPI_Type_set_attr(datatype, MPI.JULIA_TYPE_PTR_ATTR[], pointer_from_objref(Char))
MPI.Finalize()
```

This is in principle relatively easy to translate to C, except I don't know what
I noticed The segfault occurred in
Those are null pointers also in the headers of MVAPICH2 (see also #688 (comment)):

```
$ echo '#include <mpi.h>' | mpicc -dM -E - | grep -E 'MPI_TYPE_NULL_(COPY|DELETE)_FN'
#define MPI_TYPE_NULL_DELETE_FN ((MPI_Type_delete_attr_function*)0)
#define MPI_TYPE_NULL_COPY_FN ((MPI_Type_copy_attr_function*)0)
```

But yes, maybe we aren't registering the callbacks correctly and those null pointers end up being called? Using a debugger here is a bit complicated.
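For comparison, this is roughly what registering a real (non-null) delete callback looks like at the C level; it is only a sketch showing when such a callback fires, not a claim about how MPI.jl registers its keyval:

```c
#include <mpi.h>
#include <stdio.h>

/* Delete callback: invoked when the attribute is removed, e.g. when the
   datatype carrying it is freed. */
static int delete_fn(MPI_Datatype datatype, int keyval,
                     void *attribute_val, void *extra_state)
{
    printf("delete callback fired for keyval %d\n", keyval);
    return MPI_SUCCESS;
}

int main(void)
{
    MPI_Init(NULL, NULL);

    int keyval;
    /* Register a real delete callback instead of MPI_TYPE_NULL_DELETE_FN. */
    MPI_Type_create_keyval(MPI_TYPE_NULL_COPY_FN, delete_fn, &keyval, NULL);

    MPI_Datatype dup_type;
    MPI_Type_dup(MPI_UINT32_T, &dup_type);
    MPI_Type_set_attr(dup_type, keyval, NULL);

    MPI_Type_free(&dup_type);      /* triggers delete_fn */
    MPI_Type_free_keyval(&keyval);

    MPI_Finalize();
    return 0;
}
```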
@giordano I think I figured it out: the issue is that MVAPICH doesn't like it when objects are not cleaned up before `MPI.Finalize`:

```julia
using MPI
MPI.Init(; threadlevel=:single)
datatype = MPI.Datatype(MPI.API.MPI_DATATYPE_NULL[])
MPI.API.MPI_Type_dup(MPI.Datatype(UInt32), datatype)
MPI.API.MPI_Type_commit(datatype)
MPI.API.MPI_Type_set_attr(datatype, MPI.JULIA_TYPE_PTR_ATTR[], pointer_from_objref(Char))
# calling either of these will prevent a segfault:
# MPI.API.MPI_Type_delete_attr(datatype, MPI.JULIA_TYPE_PTR_ATTR[])
# MPI.free(datatype)
MPI.Finalize()
```
The easiest fix for now is probably to
This should be a C reproducer:

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    MPI_Datatype dup_type;
    MPI_Type_dup(MPI_UINT32_T, &dup_type);
    MPI_Type_commit(&dup_type);

    int keyval;
    MPI_Type_create_keyval(MPI_TYPE_NULL_COPY_FN,
                           MPI_TYPE_NULL_DELETE_FN,
                           &keyval, NULL);
    MPI_Type_set_attr(dup_type, keyval, NULL);

    MPI_Finalize();
    return 0;
}
```
That code does segfault, but the error message is different: it doesn't mention

```
$ srun -n 2 ./test
[fj023:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[fj023:mpi_rank_1][error_sighandler] Caught error: Segmentation fault (signal 11)
srun: error: fj023: tasks 0-1: Segmentation fault (core dumped)
```

With #696, #684 (comment) still segfaults, always the same stacktrace 😢
#696 won't fix #684 (comment) (since we don't track that type), but it should fix #684 (comment)
@giordano did you want to send a bug report to mvapich? It looks like you have to do it through the mailing list https://mvapich.cse.ohio-state.edu/help/
I emailed the list. Closing this for now, re-open if more issues arise.
For the record, segmentation faults are gone when using MVAPICH after applying the patch

```diff
--- a/src/mpi/attr/attrutil.c
+++ b/src/mpi/attr/attrutil.c
@@ -266,6 +266,7 @@
        corresponding keyval */
     /* Still to do: capture any error returns but continue to
        process attributes */
+    if (p->keyval) {
     mpi_errno = MPIR_Call_attr_delete( handle, p );

     /* We must also remove the keyval reference.  If the keyval
@@ -282,6 +283,7 @@
             MPIU_Handle_obj_free( &MPID_Keyval_mem, p->keyval );
         }
     }
+    }
     MPIU_Handle_obj_free( &MPID_Attr_mem, p );
```

suggested on the MVAPICH mailing list.
With OpenMPI
With MVAPICH2
In MVAPICH2 there may be some threading issues because loading the package issues the warning
Information about MPI on Ookami.