Description
I've recently run into a problem with periodic geometry when running a RANS case on 16 or more cores (256+ MPI tasks). While initializing the Jacobian structure for the turbulence model, I hit one of two errors, depending on the core count.
The first results in the following error message:
Fatal error in MPI_Sendrecv: Message truncated, error stack:
MPI_Sendrecv(249).................: MPI_Sendrecv(sbuf=0x2ee74f0, scount=10, MPI_DOUBLE, dest=19, stag=0, rbuf=0x2ee68e0, rcount=385, MPI_MPIDI_CH3U_Receive_data_found(144): Message from rank 25 and tag 0 truncated; 3200 bytes received but buffer size is 3080
aborting job
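For what it's worth, the mismatch that message describes can be reproduced in isolation: one rank ships 3200 bytes (400 doubles) while its partner has only posted room for 3080 bytes (385 doubles). Here's a minimal standalone sketch (plain MPI with made-up counts, not SU2 code) that aborts with the same "Message truncated" error when run on two ranks:

```cpp
// Standalone reproduction of an MPI "Message truncated" abort (not SU2 code).
// Each rank exchanges data with MPI_Sendrecv, but the receive buffer is sized
// for 385 doubles (3080 bytes) while the partner sends 400 doubles (3200 bytes),
// mirroring the byte counts in the error message above.
// Build/run (assumed): mpicxx truncate_demo.cpp -o truncate_demo && mpirun -np 2 ./truncate_demo
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  const int nSend = 400;  // what each rank actually sends
  const int nRecv = 385;  // what each rank expects to receive (too small)

  std::vector<double> sbuf(nSend, 1.0);
  std::vector<double> rbuf(nRecv, 0.0);

  if (rank < 2) {
    const int partner = 1 - rank;
    // The incoming 3200-byte message does not fit in the 3080-byte receive
    // buffer, so MPI aborts with "Message truncated".
    MPI_Sendrecv(sbuf.data(), nSend, MPI_DOUBLE, partner, 0,
                 rbuf.data(), nRecv, MPI_DOUBLE, partner, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  }

  MPI_Finalize();
  return 0;
}
```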
The second error just leaves the solver hanging indefinitely at the "Initialize Jacobian structure (SA model)" step. I'm guessing that an MPI send/receive is left dangling.
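If that guess is right, a hang instead of an abort would just mean a blocking receive that is never matched. A minimal sketch of that scenario (again plain MPI, not SU2 code, assuming a receive is posted for a message that never arrives):

```cpp
// Standalone sketch of a "dangling" receive (not SU2 code): rank 0 blocks in
// MPI_Recv waiting for a message that rank 1 never sends, so the job hangs
// indefinitely instead of aborting.
// Build/run (assumed): mpicxx hang_demo.cpp -o hang_demo && mpirun -np 2 ./hang_demo
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) {
    double value = 0.0;
    // No matching MPI_Send is ever posted by rank 1, so this call never returns.
    MPI_Recv(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    std::printf("never reached\n");
  }

  MPI_Finalize();
  return 0;
}
```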
I have not seen these problems at lower core counts (2-4 cores with 2-32 MPI tasks).
The errors seem to be tied to the way the periodic send/receives are set up: if I change the periodic boundaries to far-field boundaries, the errors vanish.
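To be concrete about what I'm swapping, the change in the configuration file looks roughly like this (marker names and the periodic transformation below are placeholders; the real values are in the attached files):

```
% Periodic pair that triggers the problem (placeholder names/values):
MARKER_PERIODIC= ( periodic_lo, periodic_hi, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0 )
%
% Treating the same surfaces as far-field instead makes the errors go away:
% MARKER_FAR= ( periodic_lo, periodic_hi )
```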
I've also done a lot of work to weed out possible causes:
- I've generated the meshes using both `SU2_MSH` and the `su2perio` Fortran tool.
- I've run this on two different supercomputers, with different MPI builds.
- I've tested multiple different meshes with different resolutions.
- I've tried changing the RANS model and steady/unsteady options.
- I've even used a different solver (our hybrid solver) that's completely independent of the RANS solver classes. Same error.
- The problem occurs whether I restart from a restart file or start fresh without one.
I've attached a minimal example that you can use to reproduce this yourself; it should be self-explanatory.