Failure to Restart on Large Meshes with Time-Averaged Data #949

@GomerOfDoom

sd2_case1b_ddes_v7.cfg.txt

Hello,

We are currently unable to restart SU2 in DDES mode from a restart file that includes time-averaged data on very large (~180 million cell) meshes.

Compiled in release mode, the code gives the error "FGMRES orthogonalization failed, linear solver diverged."

Compiled in debug mode, the code issues an assertion failure at line 1881 of $SU2_HOME/SU2_CFD/src/numerics_structure.cpp, which is a check in the CNumerics::SetRoe_Dissipation(...) method to make sure that variable 'Dissipation_j' is between zero and one.
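
For readers unfamiliar with this kind of check, here is a minimal sketch of a range assertion like the one described above. It is not the SU2 source: `InUnitInterval` and `CheckedDissipation` are hypothetical names used only for illustration. It also assumes the release build defines `NDEBUG`, in which case a standard `assert` compiles away and an out-of-range value would pass through unchecked.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not SU2 code): true only for a finite value in [0, 1].
// NaN and infinity are rejected by std::isfinite, so a corrupted value read
// from a restart file would fail this test as well.
bool InUnitInterval(double value) {
  return std::isfinite(value) && value >= 0.0 && value <= 1.0;
}

// Hypothetical stand-in for a debug-only check on a dissipation coefficient.
// When NDEBUG is defined, assert() expands to nothing and the value is
// returned unchecked.
double CheckedDissipation(double dissipation_j) {
  assert(InUnitInterval(dissipation_j) && "Dissipation_j must lie in [0, 1]");
  return dissipation_j;
}
```

If that assumption holds, a value that trips the assertion in a debug build would instead be used as-is in a release build, which would be consistent with the two different failure messages reported here.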

This problem only appears when attempting to restart from solution files that include "TIME_AVERAGE" data on very large meshes.

Note that the above behavior occurs with commit 382e82f of the "develop" branch.

I have also tried the latest commits of develop (c093a62) and master (d9c867d), but both segfault during Jacobian structure initialization when attempting to restart on multiple cores.

All help is appreciated.

-Paul

To Reproduce
The config file is attached, but the mesh file is quite large (17.6 GB).

Desktop (please complete the following information):

  • Department of Defense Unclassified System: "Onyx"
  • System type: Cray XC40/50
  • OS: Variant of SUSE Linux 12.3 and/or Cray Linux Environment
  • Compiler: Intel 19.0.1.144
  • MPI: cray-mpich 7.6.3
  • SU2 v. 7.0.1, develop branch, commit 382e82f (segfaults also occur with the latest commits of develop (c093a62) and master (d9c867d)).
