Segmentation fault when opening a file #3695
Thanks for the bug report, I will look into this.
Thanks
Edgar
On 6/12/2017 5:54 PM, Wei-keng Liao wrote:
I am encountering a Segmentation fault when using a communicator
created from MPI_Cart_create in 3D, using OpenMPI version 2.1.0 and
Intel C compiler 17.0.0 on a Linux Ubuntu machine.
The gdb trace points to a possible cause at line 914 of file
io_ompio_file_open.c
```c
int coords_tmp[2] = { 0 };
```
The size of coords_tmp is too small for 3D coordinate communicators
while ompio_fh->f_comm->c_topo->mtc.cart->ndims is 3.
Below is the gdb trace and the test program.
```
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f2e18f87b4c in mca_io_ompio_cart_based_grouping (ompio_fh=0x0)
    at io_ompio_file_open.c:968
968         if ((coords_tmp[1]/ompio_fh->f_init_procs_per_group) ==
(gdb) where
#0  0x00007f2e18f87b4c in mca_io_ompio_cart_based_grouping (ompio_fh=0x0)
    at io_ompio_file_open.c:968
#1  0x00007f2e18f85da8 in ompio_io_ompio_file_open (comm=0x2051850,
    filename=0x20584b0 "testfile", amode=9, info=0x601540 <ompi_mpi_info_null>,
    ompio_fh=0x20588d0, use_sharedfp=1 '\001') at io_ompio_file_open.c:204
#2  0x00007f2e18f8585b in mca_io_ompio_file_open (comm=0x2051850,
    filename=0x20584b0 "testfile", amode=9, info=0x601540 <ompi_mpi_info_null>,
    fh=0x20584d0) at io_ompio_file_open.c:62
#3  0x00007f2e26a9fd88 in mca_io_base_file_select (file=0x20584d0,
    preferred=0x0) at base/io_base_file_select.c:457
#4  0x00007f2e2696a40e in ompi_file_open (comm=0x2051850,
    filename=0x400f54 "testfile", amode=9, info=0x601540 <ompi_mpi_info_null>,
    fh=0x7ffcffb0abc0) at file/file.c:132
#5  0x00007f2e26a54ffe in PMPI_File_open (comm=0x2051850,
    filename=0x400f54 "testfile", amode=9, info=0x601540 <ompi_mpi_info_null>,
    fh=0x7ffcffb0abc0) at pfile_open.c:92
#6  0x0000000000400a58 in main (argc=1, argv=0x7ffcffb0acd8) at cart_bug.c:18
```
```c
#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int nprocs, dims[3] = {1, 1, 0}, periods[3] = {0, 0, 0};
    MPI_Comm comm_cart;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* dims becomes {1, 1, nprocs}: a 3-D Cartesian topology */
    MPI_Dims_create(nprocs, 3, dims);
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 0, &comm_cart);

    /* segfaults inside ompio with Open MPI 2.1.0 */
    MPI_File_open(comm_cart, "testfile", MPI_MODE_CREATE | MPI_MODE_RDWR,
                  MPI_INFO_NULL, &fh);

    /* files and communicators should be released before MPI_Finalize */
    MPI_File_close(&fh);
    MPI_Comm_free(&comm_cart);
    MPI_Finalize();
    return 0;
}
```
According to gdb, the issue is that
Without having looked into this, I suspect that Wei-keng is correct in his analysis. All of our test cases for the cart grouping are based on 2-D Cartesian topologies. I need to add a 3-D test case (or prevent ompio from entering this code section if the current code cannot easily be extended to 3-D and higher-dimensional topologies).
From my digging: line 966 calls mca_topo_base_cart_coords() with cart_topo.ndims equal to 3, but coords_tmp[] has only two elements, so mca_topo_base_cart_coords() assigns a value to coords_tmp[2]. That write is out of bounds and may corrupt neighboring stack memory, which could explain why ompio_fh becomes NULL.
In file ompi/mca/topo/base/topo_base_cart_coords.c, line 56 is accessing coords_tmp[2].
I have a fix pending for this issue, and I will file PRs for the 2.1.x and 3.0.x branches for the first part of it. The longer story: the Cartesian-grouping-based algorithm was unfortunately left out of the rewrite of the aggregator selection algorithms two years (or so) ago. It is called at the wrong place (file_open instead of file_set_view), and it cannot handle any Cartesian topology other than 2-D. The fix consists of two parts:
The cart_based_grouping aggregator strategy was not correctly updated during the last major rewrite of the aggregator selection algorithm. It is also not supposed to be called from file_open (but from file_set_view). This fixes an issue reported on the mailing list by @wkliao, issue open-mpi#3695.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
This issue has been fixed.
…(both HDF5 and PnetCDF), there is a bug in OpenMPI version 2.1 (used in at least Ubuntu 17.10), reported here: open-mpi/ompi#3695; let's try another version of OpenMPI.