Potential segfault in MPIR_Attr_delete_list
#6364
Comments
The reproducer:

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char** argv) {
        // Initialize the MPI environment
        MPI_Init(NULL, NULL);

        MPI_Datatype dup_type;
        MPI_Type_dup(MPI_UINT32_T, &dup_type);
        MPI_Type_commit(&dup_type);

        int keyval;
        MPI_Type_create_keyval(MPI_TYPE_NULL_COPY_FN,
                               MPI_TYPE_NULL_DELETE_FN,
                               &keyval, NULL);
        MPI_Type_set_attr(dup_type, keyval, NULL);

        MPI_Finalize();
        return 0;
    }

FWIW, this does not segfault for me with MPICH 4.1rc2. I also configured with AddressSanitizer enabled and did not receive any warnings.
For what it's worth, I couldn't reproduce with MVAPICH on x86_64 either. The only system where I have bumped into the segfault so far is Ookami.
Got it. I'll give this a try on one of our A64FX nodes and see how it goes. Any special compiler or configuration options?
Nothing too special; this is the configuration used for MVAPICH:
Compiled with GCC 12.1
After looking at the code some, I have a hard time seeing how
I think MVAPICH's patch is due to glitches (or bugs) somewhere else that somehow left invalid entries in the attribute list. We should try to locate the source of the issue (if it is confirmed that we have the same issue) and fix it there. The unnecessary defensive code, while it does not hurt execution, adds confusion for code maintenance.
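To make the trade-off above concrete, here is a minimal, self-contained sketch of a guard of this kind in an attribute-list teardown. It is not MPICH's or MVAPICH's code: `keyval_t`, `attr_t`, `attr_push`, `attr_delete_list` and `count_delete` are hypothetical names, and modelling an "invalid entry" as a NULL `keyval` pointer is an assumption made only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT MPICH's real types. */
typedef struct keyval {
    int (*delete_fn)(void *value);  /* NULL means "no delete callback" */
} keyval_t;

typedef struct attr {
    keyval_t *keyval;               /* assumption: NULL models an invalid entry */
    void *value;
    struct attr *next;
} attr_t;

/* Demo callback that only counts how often it was invoked. */
static int deleted_count = 0;
static int count_delete(void *value)
{
    (void)value;
    deleted_count++;
    return 0;
}

/* Prepend a new attribute entry to the list. */
static void attr_push(attr_t **head, keyval_t *kv, void *value)
{
    attr_t *node = malloc(sizeof *node);
    assert(node != NULL);
    node->keyval = kv;
    node->value = value;
    node->next = *head;
    *head = node;
}

/* Walk the list, run delete callbacks, and free every node.
 * Returns the number of invalid entries that were skipped, so the
 * caller can report them instead of hiding them silently. */
static int attr_delete_list(attr_t **head)
{
    int skipped = 0;
    attr_t *p = *head;
    while (p != NULL) {
        attr_t *next = p->next;     /* save the link before freeing p */
        if (p->keyval == NULL) {
            skipped++;              /* the defensive guard */
        } else if (p->keyval->delete_fn != NULL) {
            p->keyval->delete_fn(p->value);
        }
        free(p);
        p = next;
    }
    *head = NULL;
    return skipped;
}
```

Returning the skipped count rather than silently continuing keeps the guard from masking the upstream bug: a non-zero result in a debug build would point at whatever left the invalid entry in the list, which is the part that still needs to be located and fixed.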
Closing this issue as non-reproducible. Please re-open if it can be reproduced with a recent MPICH.
In JuliaParallel/MPI.jl#684 we experienced some segmentation faults using MVAPICH 2 on Ookami. This was reported, with a reproducer, on the MVAPICH mailing list and a patch is available to fix the segmentation fault, which I can confirm is working for me.
I don't have an MPICH build so I couldn't check whether I could reproduce the same segmentation fault with MPICH, but the code of
MPIR_Attr_delete_list
mpich/src/mpi/attr/attrutil.c
Lines 219 to 278 in 2bc4258
p->keyval
, so you may want to apply a similar patch.
CC: @simonbyrne.