-
Notifications
You must be signed in to change notification settings - Fork 919
Hybrid parallel (OpenMP) support for incompressible solvers #1178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| StrainMag_Max = max(StrainMag_Max, strainMax); | ||
| Omega_Max = max(Omega_Max, omegaMax); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose you do the reduction over the threads yourself because of efficiency, when the if statement is false, but wouldn't it be more readable to have a reduction clause in the OMP for loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for having a look at the code Edwin, the issues with these kind of reductions were:
- The microsoft compilers do not support max/min reductions;
- For the code to compile with AD (forward) we would need to define custom reduction operators, which might not be very readable;
- Earlier discussions about OpenMP+AD (reverse) suggested that avoiding reduction clauses would make it easier to develop a thread-safe AD tool.
For performance the reduction clause would be more efficient, as the compiler can make min/max atomic, however I do not know of a nice simple way to implement atomic min/max.
| SU2_OMP_BARRIER | ||
| SU2_OMP_MASTER | ||
| { | ||
| SU2_OMP_MASTER { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For efficiency reasons then the Allreduce could be done in a single call with two arguments, as for both Omega and the Strain you compute the maximum.
| class CIncEulerSolver : public CFVMFlowSolverBase<CIncEulerVariable, INCOMPRESSIBLE> { | ||
| protected: | ||
| CFluidModel *FluidModel = nullptr; /*!< \brief fluid model used in the solver */ | ||
| vector<CFluidModel*> FluidModel; /*!< \brief fluid model used in the solver. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you want to put the fluid model in the base class? CEulerSolver has exactly the same variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit different for the NEMO solver, it also has a "FluidModel" but of a different type.
vdweide
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The few questions I had have been answered, so this one looks good to me.
TobiKattmann
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for adding omp support to the inc solver 💐 and moving some reg test accordingly
And thanks for moving some functions to the FVMFlowSolverBase 👍
Much appreciated
| */ | ||
| class CEulerSolver : public CFVMFlowSolverBase<CEulerVariable, COMPRESSIBLE> { | ||
| protected: | ||
| using BaseClass = CFVMFlowSolverBase<CEulerVariable, COMPRESSIBLE>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need this to call an implementation of the base class later like BaseClass::SetInitialCondition in this very CEulerSolver. But SetInitialCondition is also public in the FVMBase class so it should be callable anyway because we inherit that via class CEulerSolver : public CFVMFlowSolverBase ... where am I wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you call SetInitialCondition in SetInitialCondition you get an infinite loop, I just wanted to make more expressive I was calling the implementation of the parent first.
| inline virtual void Viscous_Residual(unsigned long iEdge, CGeometry *geometry, CSolver **solver_container, | ||
| CNumerics *numerics, CConfig *config) { } | ||
| void Viscous_Residual_impl(unsigned long iEdge, CGeometry *geometry, CSolver **solver_container, | ||
| CNumerics *numerics, CConfig *config); | ||
| using CSolver::Viscous_Residual; /*--- Silence warning ---*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea behind this to be able to 'fake' override Viscous_Residual for viscous flows with just calling Viscous_Residual_impl because for inviscid flow the empty implementation from here is taken. This trick has to be used because FVMBase -> Euler -> NS is the inheritance and Viscous_Residual itself cannot be overloaded because Euler is the middleman?
And what happens to the call in CIntegration to Viscous_Residual for NS flow. Is there a doubled contribution then because it is called there and in the centered/upwind residual?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, in Cintegration the empty impl of CSolver is used and in the centered/upwind residual the correct Viscous_Residual impl with the Viscous_Residual_impl init ...
Seems that i have to go back and read a couple of pages in 'Inheritance for complete morons'. fml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep that's all correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except the self flagellation part, no need for that xD
| * \note The convective residual methods include a call to this for each edge, | ||
| * this allows convective and viscous loops to be "fused". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a speed-up due to this? Or is this just for simplification?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fast code goes brrrrrrrrrrrrrrrrrrrr
| template <class V, ENUM_REGIME R> | ||
| void CFVMFlowSolverBase<V, R>::LoadRestart(CGeometry **geometry, CSolver ***solver, | ||
| CConfig *config, int iter, bool update_geo) { | ||
| LoadRestart_impl(geometry, solver, config, iter, update_geo); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why having the LoadRestart_impl is beneficial? The LoadRestart moved from the Euler and IncEuler to FVMFlowSolverBase so that is good but why the additional wrapper function
| * \return Value of the pressure at the infinity. | ||
| */ | ||
| inline CFluidModel* GetFluidModel(void) const final { return FluidModel;} | ||
| inline CFluidModel* GetFluidModel(void) const final { return FluidModel[omp_get_thread_num()]; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, I guess (real) shared-memory is not viable for the FluidModel as it is filled and computes stuff on the fly so each thread gets its own?
| } | ||
| } | ||
| } | ||
| BaseClass::SetInitialCondition(geometry, solver_container, config, TimeIter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the BaseClass call I mean above
| CConfig *config, unsigned short iMesh, unsigned short iRKStep) { | ||
|
|
||
| CNumerics* numerics = numerics_container[CONV_TERM]; | ||
| CNumerics* numerics = numerics_container[CONV_TERM + omp_get_thread_num()*MAX_TERMS]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you need multiple numerics instances because in the OMP loop they all individually fill the container and evaluate their residual, right?
| void CIncEulerSolver::Centered_Residual(CGeometry *geometry, CSolver **solver_container, CNumerics **numerics_container, | ||
| CConfig *config, unsigned short iMesh, unsigned short iRKStep) { | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it would be prob nice to reflect in the name that it contains the viscous residual.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... but then I would need to change all the solvers, the vectorized residual loop is called EdgeFluxResidual to be more inclusive...
|
|
||
| /*--- Read the restart data from either an ASCII or binary SU2 file. ---*/ | ||
|
|
||
| if (config->GetRead_Binary_Restart()) { | ||
| Read_SU2_Restart_Binary(geometry[MESH_0], config, restart_filename); | ||
| } else { | ||
| Read_SU2_Restart_ASCII(geometry[MESH_0], config, restart_filename); | ||
| } | ||
|
|
||
| /*--- Load data from the restart into correct containers. ---*/ | ||
|
|
||
| counter = 0; | ||
| for (iPoint_Global = 0; iPoint_Global < geometry[MESH_0]->GetGlobal_nPointDomain(); iPoint_Global++ ) { | ||
|
|
||
| /*--- Retrieve local index. If this node from the restart file lives | ||
| on the current processor, we will load and instantiate the vars. ---*/ | ||
|
|
||
| iPoint_Local = geometry[MESH_0]->GetGlobal_to_Local_Point(iPoint_Global); | ||
|
|
||
| if (iPoint_Local > -1) { | ||
|
|
||
| /*--- We need to store this point's data, so jump to the correct | ||
| offset in the buffer of data from the restart file and load it. ---*/ | ||
|
|
||
| index = counter*Restart_Vars[1] + skipVars; | ||
| for (iVar = 0; iVar < nVar_Restart; iVar++) Solution[iVar] = Restart_Data[index+iVar]; | ||
| nodes->SetSolution(iPoint_Local,Solution); | ||
| iPoint_Global_Local++; | ||
|
|
||
| /*--- For dynamic meshes, read in and store the | ||
| grid coordinates and grid velocities for each node. ---*/ | ||
|
|
||
| if (dynamic_grid && val_update_geo) { | ||
|
|
||
| /*--- Read in the next 2 or 3 variables which are the grid velocities ---*/ | ||
| /*--- If we are restarting the solution from a previously computed static calculation (no grid movement) ---*/ | ||
| /*--- the grid velocities are set to 0. This is useful for FSI computations ---*/ | ||
|
|
||
| /*--- Rewind the index to retrieve the Coords. ---*/ | ||
| index = counter*Restart_Vars[1]; | ||
| for (iDim = 0; iDim < nDim; iDim++) { Coord[iDim] = Restart_Data[index+iDim]; } | ||
|
|
||
| su2double GridVel[MAXNDIM] = {0.0}; | ||
| if (!steady_restart) { | ||
| /*--- Move the index forward to get the grid velocities. ---*/ | ||
| index = counter*Restart_Vars[1] + skipVars + nVar_Restart + turbVars; | ||
| for (iDim = 0; iDim < nDim; iDim++) { GridVel[iDim] = Restart_Data[index+iDim]; } | ||
| } | ||
|
|
||
| for (iDim = 0; iDim < nDim; iDim++) { | ||
| geometry[MESH_0]->nodes->SetCoord(iPoint_Local, iDim, Coord[iDim]); | ||
| geometry[MESH_0]->nodes->SetGridVel(iPoint_Local, iDim, GridVel[iDim]); | ||
| } | ||
| } | ||
|
|
||
| /*--- For static FSI problems, grid_movement is 0 but we need to read in and store the | ||
| grid coordinates for each node (but not the grid velocities, as there are none). ---*/ | ||
|
|
||
| if (static_fsi && val_update_geo) { | ||
| /*--- Rewind the index to retrieve the Coords. ---*/ | ||
| index = counter*Restart_Vars[1]; | ||
| for (iDim = 0; iDim < nDim; iDim++) { Coord[iDim] = Restart_Data[index+iDim];} | ||
|
|
||
| for (iDim = 0; iDim < nDim; iDim++) { | ||
| geometry[MESH_0]->nodes->SetCoord(iPoint_Local, iDim, Coord[iDim]); | ||
| } | ||
| } | ||
|
|
||
| /*--- Increment the overall counter for how many points have been loaded. ---*/ | ||
| counter++; | ||
|
|
||
| } | ||
| } | ||
|
|
||
| /*--- Detect a wrong solution file ---*/ | ||
|
|
||
| if (iPoint_Global_Local < nPointDomain) { | ||
| SU2_MPI::Error(string("The solution file ") + restart_filename + string(" doesn't match with the mesh file!\n") + | ||
| string("It could be empty lines at the end of the file."), CURRENT_FUNCTION); | ||
| } | ||
|
|
||
| /*--- Update the geometry for flows on deforming meshes ---*/ | ||
|
|
||
| if ((dynamic_grid || static_fsi) && val_update_geo) { | ||
|
|
||
| /*--- Communicate the new coordinates and grid velocities at the halos ---*/ | ||
|
|
||
| geometry[MESH_0]->InitiateComms(geometry[MESH_0], config, COORDINATES); | ||
| geometry[MESH_0]->CompleteComms(geometry[MESH_0], config, COORDINATES); | ||
|
|
||
| if (dynamic_grid) { | ||
| geometry[MESH_0]->InitiateComms(geometry[MESH_0], config, GRID_VELOCITY); | ||
| geometry[MESH_0]->CompleteComms(geometry[MESH_0], config, GRID_VELOCITY); | ||
| } | ||
|
|
||
| /*--- Recompute the edges and dual mesh control volumes in the | ||
| domain and on the boundaries. ---*/ | ||
|
|
||
| geometry[MESH_0]->SetControlVolume(config, UPDATE); | ||
| geometry[MESH_0]->SetBoundControlVolume(config, UPDATE); | ||
| geometry[MESH_0]->SetMaxLength(config); | ||
|
|
||
| /*--- Update the multigrid structure after setting up the finest grid, | ||
| including computing the grid velocities on the coarser levels. ---*/ | ||
|
|
||
| for (iMesh = 1; iMesh <= config->GetnMGLevels(); iMesh++) { | ||
| iMeshFine = iMesh-1; | ||
| geometry[iMesh]->SetControlVolume(config, geometry[iMeshFine], UPDATE); | ||
| geometry[iMesh]->SetBoundControlVolume(config, geometry[iMeshFine],UPDATE); | ||
| geometry[iMesh]->SetCoord(geometry[iMeshFine]); | ||
| if (dynamic_grid) { | ||
| geometry[iMesh]->SetRestricted_GridVelocity(geometry[iMeshFine], config); | ||
| } | ||
| geometry[iMesh]->SetMaxLength(config); | ||
| } | ||
| } | ||
|
|
||
| /*--- Communicate the loaded solution on the fine grid before we transfer | ||
| it down to the coarse levels. We alo call the preprocessing routine | ||
| on the fine level in order to have all necessary quantities updated, | ||
| especially if this is a turbulent simulation (eddy viscosity). ---*/ | ||
|
|
||
| solver[MESH_0][FLOW_SOL]->InitiateComms(geometry[MESH_0], config, SOLUTION); | ||
| solver[MESH_0][FLOW_SOL]->CompleteComms(geometry[MESH_0], config, SOLUTION); | ||
|
|
||
| /*--- For turbulent simulations the flow preprocessing is done by the turbulence solver | ||
| * after it loads its variables (they are needed to compute flow primitives). ---*/ | ||
| if (!turbulent) { | ||
| solver[MESH_0][FLOW_SOL]->Preprocessing(geometry[MESH_0], solver[MESH_0], config, MESH_0, NO_RK_ITER, RUNTIME_FLOW_SYS, false); | ||
| } | ||
|
|
||
| /*--- Interpolate the solution down to the coarse multigrid levels ---*/ | ||
|
|
||
| for (iMesh = 1; iMesh <= config->GetnMGLevels(); iMesh++) { | ||
| for (iPoint = 0; iPoint < geometry[iMesh]->GetnPoint(); iPoint++) { | ||
| Area_Parent = geometry[iMesh]->nodes->GetVolume(iPoint); | ||
| for (iVar = 0; iVar < nVar; iVar++) Solution[iVar] = 0.0; | ||
| for (iChildren = 0; iChildren < geometry[iMesh]->nodes->GetnChildren_CV(iPoint); iChildren++) { | ||
| Point_Fine = geometry[iMesh]->nodes->GetChildren_CV(iPoint, iChildren); | ||
| Area_Children = geometry[iMesh-1]->nodes->GetVolume(Point_Fine); | ||
| Solution_Fine = solver[iMesh-1][FLOW_SOL]->GetNodes()->GetSolution(Point_Fine); | ||
| for (iVar = 0; iVar < nVar; iVar++) { | ||
| Solution[iVar] += Solution_Fine[iVar]*Area_Children/Area_Parent; | ||
| } | ||
| } | ||
| solver[iMesh][FLOW_SOL]->GetNodes()->SetSolution(iPoint,Solution); | ||
| } | ||
| solver[iMesh][FLOW_SOL]->InitiateComms(geometry[iMesh], config, SOLUTION); | ||
| solver[iMesh][FLOW_SOL]->CompleteComms(geometry[iMesh], config, SOLUTION); | ||
|
|
||
| if (!turbulent) { | ||
| solver[iMesh][FLOW_SOL]->Preprocessing(geometry[iMesh], solver[iMesh], config, iMesh, NO_RK_ITER, RUNTIME_FLOW_SYS, false); | ||
| } | ||
| } | ||
|
|
||
| /*--- Update the old geometry (coordinates n and n-1) in dual time-stepping strategy ---*/ | ||
| if (dual_time && config->GetGrid_Movement() && !config->GetDeform_Mesh() && | ||
| (config->GetKind_GridMovement() != RIGID_MOTION)) { | ||
| Restart_OldGeometry(geometry[MESH_0], config); | ||
| } | ||
|
|
||
| /*--- Delete the class memory that is used to load the restart. ---*/ | ||
|
|
||
| delete [] Restart_Vars; Restart_Vars = nullptr; | ||
| delete [] Restart_Data; Restart_Data = nullptr; | ||
| LoadRestart_impl(geometry, solver, config, val_iter, val_update_geo, Solution, nVar_Restart); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TobiKattmann to satisfy the quirkiness of the incompressible solver that needs a default value for temperature when there is no energy equation
Proposed Changes
Remove some more duplication, fix some issues, and make the incompressible solvers compatible with OpenMP.
Related Work
Fix #1175
PR Checklist