richards_driver crashing for parallel run #93

Closed
bishtgautam opened this issue Oct 7, 2020 · 5 comments · Fixed by #94
Labels: bug (Something isn't working), MPFAO

Comments

@bishtgautam
Member

bishtgautam commented Oct 7, 2020

@jeff-cohere reported the model failure here

$ mpirun -np 2 richards_driver -dim 3 -Nx 100 -Ny 100 -Nz 10 -tdy_timers -final_time 30
No protocol specified
Beginning Richards Driver simulation.
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR:
[0]PETSC ERROR: DetermineCellsAboveAndBelow: No. of cells above (=2) and below (=1) of the vertex_id 56780 are not same. Such a mesh is unsupported.

[0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.12.4-1083-g1a6d72e33c  GIT Date: 2020-03-26 13:14:23 -0500
[0]PETSC ERROR: richards_driver on a debug named crunchy by jeff Wed Oct  7 09:16:40 2020
[0]PETSC ERROR: Configure options --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --CFLAGS="-g -O0" --CXXFLAGS="-g -O0" --FFLAGS="-g -O0 -Wno-unused-function" --with-clanguage=c --with-debugging=1 --with-shared-libraries=0 --download-hdf5 --download-metis --download-parmetis --download-exodusii --download-netcdf --download-pnetcdf --download-zlib --download-fblaslapack
[0]PETSC ERROR: #1 DetermineCellsAboveAndBelow() line 1783 in /home/jeff/projects/pnnl/TDycore/src/mesh/tdycoremesh.c
[0]PETSC ERROR: #2 FindCellsAboveAndBelowAVertex() line 2523 in /home/jeff/projects/pnnl/TDycore/src/mesh/tdycoremesh.c
[0]PETSC ERROR: #3 FindCellsAboveAndBelowVertices() line 2615 in /home/jeff/projects/pnnl/TDycore/src/mesh/tdycoremesh.c
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: likely location of problem given in stack below
[0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[0]PETSC ERROR:       INSTEAD the line number of the start of the function
[0]PETSC ERROR:       is given.
[0]PETSC ERROR: [0] ComputeTransmissibilityMatrix_ForNonCornerVertex line 425 /home/jeff/projects/pnnl/TDycore/src/mpfao/3D/tdympfao3D_core.c
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.12.4-1083-g1a6d72e33c  GIT Date: 2020-03-26 13:14:23 -0500
[0]PETSC ERROR: richards_driver on a debug named crunchy by jeff Wed Oct  7 09:16:40 2020
[0]PETSC ERROR: Configure options --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --CFLAGS="-g -O0" --CXXFLAGS="-g -O0" --FFLAGS="-g -O0 -Wno-unused-function" --with-clanguage=c --with-debugging=1 --with-shared-libraries=0 --download-hdf5 --download-metis --download-parmetis --download-exodusii --download-netcdf --download-pnetcdf --download-zlib --download-fblaslapack
[0]PETSC ERROR: #4 User provided function() line 0 in  unknown file
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 50152059.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[1]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
[1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[1]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[1]PETSC ERROR: likely location of problem given in stack below
[1]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
[1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[1]PETSC ERROR:       INSTEAD the line number of the start of the function
[1]PETSC ERROR:       is given.
[1]PETSC ERROR: [1] MatAssemblyEnd_SeqAIJ line 1049 /home/jeff/projects/pnnl/petsc/src/mat/impls/aij/seq/aij.c
[1]PETSC ERROR: [1] MatAssemblyEnd line 5335 /home/jeff/projects/pnnl/petsc/src/mat/interface/matrix.c
[1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[1]PETSC ERROR: Signal received
[1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[1]PETSC ERROR: Petsc Development GIT revision: v3.12.4-1083-g1a6d72e33c  GIT Date: 2020-03-26 13:14:23 -0500
[1]PETSC ERROR: richards_driver on a debug named crunchy by jeff Wed Oct  7 09:16:40 2020
[1]PETSC ERROR: Configure options --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --CFLAGS="-g -O0" --CXXFLAGS="-g -O0" --FFLAGS="-g -O0 -Wno-unused-function" --with-clanguage=c --with-debugging=1 --with-shared-libraries=0 --download-hdf5 --download-metis --download-parmetis --download-exodusii --download-netcdf --download-pnetcdf --download-zlib --download-fblaslapack
[1]PETSC ERROR: #1 User provided function() line 0 in  unknown file
[crunchy:30100] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[crunchy:30100] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

@bishtgautam bishtgautam self-assigned this Oct 7, 2020
@bishtgautam bishtgautam added MPFAO bug Something isn't working labels Oct 7, 2020
@bishtgautam
Member Author

  • mpiexec-openmpi-gcc8 -n 2 ./richards_driver -dim 3 -Nx 100 -Ny 100 -Nz 10 -tdy_water_density exponential -final_time 1.e0: Crashed
  • mpiexec-openmpi-gcc8 -n 4 ./richards_driver -dim 3 -Nx 100 -Ny 100 -Nz 10 -tdy_water_density exponential -final_time 1.e0: Worked fine

@bishtgautam
Member Author

bishtgautam commented Oct 8, 2020

The error can be reproduced on a smaller domain via:

nx=2;ny=5;nz=10; mpiexec -n 2 \
./richards_driver -dim 3 -Nx $nx -Ny $ny -Nz $nz -tdy_water_density exponential -final_time 1.e0

@bishtgautam
Member Author

The error occurs because DMPlex is using a star stencil, but we need a box stencil.

@knepley How can I tell DMPlex to use a box stencil instead of a star stencil? Here is the code that I'm using to set 1 DOF at cell centers:

  ierr = PetscSectionCreate(comm, &sec); CHKERRQ(ierr);
  ierr = PetscSectionSetNumFields(sec, 1); CHKERRQ(ierr);
  ierr = PetscSectionSetFieldName(sec, 0, "LiquidPressure"); CHKERRQ(ierr);
  ierr = PetscSectionSetFieldComponents(sec, 0, 1); CHKERRQ(ierr);

  ierr = DMPlexGetHeightStratum(dm,0,&pStart,&pEnd); CHKERRQ(ierr);
  ierr = PetscSectionSetChart(sec,pStart,pEnd); CHKERRQ(ierr);
  for(p=pStart; p<pEnd; p++) {
    ierr = PetscSectionSetFieldDof(sec,p,0,1); CHKERRQ(ierr);
    ierr = PetscSectionSetDof(sec,p,1); CHKERRQ(ierr);
  }
  ierr = PetscSectionSetUp(sec); CHKERRQ(ierr);
  ierr = DMSetSection(dm,sec); CHKERRQ(ierr);
  ierr = PetscSectionViewFromOptions(sec, NULL, "-layout_view"); CHKERRQ(ierr);
  ierr = PetscSectionDestroy(&sec); CHKERRQ(ierr);
  ierr = DMSetBasicAdjacency(dm,PETSC_TRUE,PETSC_TRUE); CHKERRQ(ierr);
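
For readers unfamiliar with the terms: on a structured hexahedral grid, a star stencil couples a cell only to the cells that share a face with it, while a box stencil also includes cells that share only an edge or a vertex. The sketch below is a standalone illustration in plain C, not PETSc or TDycore code; the function name `count_stencil_neighbors` and its parameters are made up for this example.

```c
#include <stdlib.h>

/* Illustrative helper (hypothetical, not part of PETSc or TDycore).
 * Counts the neighbors of cell (i,j,k) in an nx x ny x nz structured grid.
 *   box == 0: star stencil, only cells sharing a face are counted.
 *   box != 0: box stencil, cells sharing a face, edge, or vertex are counted.
 */
int count_stencil_neighbors(int nx, int ny, int nz,
                            int i, int j, int k, int box)
{
  int count = 0;
  for (int dk = -1; dk <= 1; dk++) {
    for (int dj = -1; dj <= 1; dj++) {
      for (int di = -1; di <= 1; di++) {
        int shifted_axes = abs(di) + abs(dj) + abs(dk);
        if (shifted_axes == 0) continue;         /* the cell itself */
        if (!box && shifted_axes != 1) continue; /* star: face neighbors only */
        int ii = i + di, jj = j + dj, kk = k + dk;
        if (ii < 0 || ii >= nx || jj < 0 || jj >= ny || kk < 0 || kk >= nz)
          continue;                              /* outside the grid */
        count++;
      }
    }
  }
  return count;
}
```

For an interior cell this gives 6 neighbors with the star stencil and 26 with the box stencil. A vertex-based scheme such as MPFA-O needs the larger set, because every cell touching a vertex contributes to the flux around that vertex.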

@knepley

knepley commented Oct 8, 2020 via email

@bishtgautam
Member Author

Thanks @knepley. You were correct that the existing stencil was a box stencil. The error occurred because the code wasn't skipping non-local vertices.
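
A minimal sketch of that kind of check, assuming each rank keeps a per-vertex flag for ownership. The `VertexInfo` struct and `find_unbalanced_local_vertex` function are hypothetical and written only for illustration; they are not the TDycore data structures or the actual change in #94. The idea is that a vertex not owned by the rank may see only some of its surrounding cells, so an above/below mismatch there (for example 2 above and 1 below) says nothing about the mesh itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-vertex record, for illustration only. */
typedef struct {
  bool is_local;        /* true if this rank owns the vertex */
  int  num_cells_above; /* cells attached above the vertex, as seen on this rank */
  int  num_cells_below; /* cells attached below the vertex, as seen on this rank */
} VertexInfo;

/* Returns the index of the first local vertex whose above/below cell counts
 * differ, or -1 if every local vertex is balanced. Non-local vertices are
 * skipped because their cell lists may be incomplete on this rank.
 */
int find_unbalanced_local_vertex(const VertexInfo *vertices, size_t n)
{
  for (size_t v = 0; v < n; v++) {
    if (!vertices[v].is_local) continue; /* skip non-local vertices */
    if (vertices[v].num_cells_above != vertices[v].num_cells_below)
      return (int)v;
  }
  return -1;
}
```

With this guard, a mismatch on a non-local vertex no longer triggers the "Such a mesh is unsupported" error, while a genuine mismatch on a locally owned vertex is still reported.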
