Fix Nalu timers #18
- Fix timerMisc_ for LowMach and Momentum EQS that were counting certain execution regions twice. timerMisc_ was accumulating time when computing projected_nodal_gradient, which in turn was either incrementing timerMisc_ or accumulating time in PNGEQS. Current fix is to keep calls to compute_projected_nodal_gradient out of timed blocks.
- Low Mach timerMisc_ accumulates time with project_nodal_velocity, which calls ContinuityEQS projected_nodal_gradient where there is additional time accumulation in continuityEQS->timerMisc_.
- "No output time" was reporting CPU Time instead of WallClock time. This commit changes that to WallClock time so that it is consistent with what is being reported for STKPERF: Total time.
- "Timing for connectivity" summary was removed because this is already reported as "init" for each equation system.

With these changes the sum of all the timers in the timing table should be less than the total time reported for "main()" as well as "STKPERF: Total Time" in the summary.
This was previously discussed here: NaluCFD/Nalu#189 and was pointed out by the solvers team during the ECP FY18-Q3 milestone work. I noticed that the sum of individual timers did not add up to the total time even on non-OpenMP runs, and found several places where the timers were double counted. Cc: @sthomas61 @aprokop @jhux2
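The double counting described above can be illustrated with a minimal sketch. It is not Nalu code: `FakeClock`, `EquationSystemTimers`, and the two `misc_time_*` functions are hypothetical names, and a manual clock replaces the wall-clock reads so the numbers are deterministic. It only assumes what the description states, namely that an inner routine accumulates into a timer and is also called from inside a caller's timed block.

```cpp
#include <cassert>

// Hypothetical manual clock (not part of Nalu) so the sketch is deterministic.
struct FakeClock {
  double now = 0.0;
  void advance(double dt) {
    assert(dt >= 0.0);  // time only moves forward
    now += dt;
  }
};

// Hypothetical stand-in for an equation system that owns a misc timer.
struct EquationSystemTimers {
  double timerMisc_ = 0.0;
};

// Inner routine that times itself, like a projected nodal gradient update.
void compute_projected_nodal_gradient(FakeClock& clock, EquationSystemTimers& eqs) {
  const double start = clock.now;
  clock.advance(2.0);  // pretend the gradient work takes 2 s
  eqs.timerMisc_ += clock.now - start;
}

// Before: the caller's timed block encloses the self-timed call,
// so the 2 s of gradient work is accumulated twice.
double misc_time_double_counted() {
  FakeClock clock;
  EquationSystemTimers eqs;
  const double start = clock.now;
  clock.advance(1.0);  // 1 s of other misc work
  compute_projected_nodal_gradient(clock, eqs);
  eqs.timerMisc_ += clock.now - start;  // adds 3 s on top of the inner 2 s
  return eqs.timerMisc_;
}

// After: the self-timed call is kept out of the caller's timed block.
double misc_time_fixed() {
  FakeClock clock;
  EquationSystemTimers eqs;
  const double start = clock.now;
  clock.advance(1.0);  // 1 s of other misc work
  eqs.timerMisc_ += clock.now - start;
  compute_projected_nodal_gradient(clock, eqs);  // accounts for its own 2 s
  return eqs.timerMisc_;
}
```

In this sketch 3 s elapse in both variants, but the first reports 5 s in `timerMisc_` while the second reports 3 s.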
This is great! But is there a way to tell if that's all of them? Or is it possible that some problems still exist? How does one check (now, or in the future)?
@aprokop I assume your question is "is there a way to be sure that this commit fixes all the timer issues?" Short of a thorough code audit by several people, I don't know of a good way to prove that there are no more timer issues. What we can be fairly certain of after this commit and the discussion/fixes in issues NaluCFD/Nalu#238 and NaluCFD/Nalu#189:
The place where a second look would be useful is the We would definitely know that there are more issues if the sum of all timers in the reported table adds up to more than
I would also recommend that we add another timer that tracks the total time spent in each EQS and use that as a metric instead of summing up individual pieces. That will quickly show us any issues if there is still double counting in any of the Further, we can also check that the sum of Continuity and Momentum total time is less than or equal to timing reported for LowMachEQS and so on.
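The two checks suggested in this comment could look roughly like the following sketch. Everything here is illustrative: `EqsTimers`, `parts_exceed_total`, and `children_fit_in_parent` are hypothetical names, and `total` stands for the proposed per-EQS bracket timer, which does not exist in the code discussed so far.

```cpp
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical record of the timers reported for one equation system.
struct EqsTimers {
  std::string name;
  std::vector<double> parts;  // individual timers (init, assemble, solve, misc, ...)
  double total;               // proposed timer bracketing all work in the EQS
};

double sum_of_parts(const EqsTimers& t) {
  return std::accumulate(t.parts.begin(), t.parts.end(), 0.0);
}

// True when the pieces add up to more than the bracket timer,
// which would indicate that some region is still counted twice.
bool parts_exceed_total(const EqsTimers& t, double tol = 1.0e-9) {
  assert(t.total >= 0.0);
  return sum_of_parts(t) > t.total + tol;
}

// True when the child totals (e.g. Continuity + Momentum) fit inside
// the parent total (e.g. LowMachEQS).
bool children_fit_in_parent(const EqsTimers& parent,
                            const std::vector<EqsTimers>& children,
                            double tol = 1.0e-9) {
  double childTotal = 0.0;
  for (const EqsTimers& child : children) {
    childTotal += child.total;
  }
  return childTotal <= parent.total + tol;
}
```

A check of this kind only makes sense on values taken from the same rank, since reduced values from different ranks need not be consistent with each other.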
I think there may still be a problem somewhere. If I go back to my June runs, I had for example the following reported max timers (different columns for different MPI/thread combinations):
So the |
@aprokop In Nalu is just using stk::wall_time(). So there is a high probability that the max time does occur on different ranks. Further, the max for Furthermore, the situation might be different if we are using Teuchos::Time because that appears to use MPI_Wtime, which might be synchronizing across all procs for MPICH, although the OpenMPI docs for MPI_Wtime say that the times are local.
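To make the point about maxima on different ranks concrete, here is a small sketch with made-up numbers and no MPI. `Table`, `sum_of_max`, and `max_of_sum` are hypothetical helpers: the first mimics summing a column of max-reduced timers, the second is the largest total actually spent on any single rank. The sum of per-timer maxima can never be smaller than the maximum per-rank sum, and it is strictly larger whenever the maxima fall on different ranks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// times[timer][rank]: local wall time of each timer on each rank.
using Table = std::vector<std::vector<double>>;

// What one gets by adding up a column of max-reduced timers.
double sum_of_max(const Table& times) {
  double sum = 0.0;
  for (const auto& perRank : times) {
    assert(!perRank.empty());
    sum += *std::max_element(perRank.begin(), perRank.end());
  }
  return sum;
}

// The largest total time actually spent on any single rank.
double max_of_sum(const Table& times) {
  if (times.empty()) {
    return 0.0;
  }
  const std::size_t numRanks = times.front().size();
  double best = 0.0;
  for (std::size_t rank = 0; rank < numRanks; ++rank) {
    double total = 0.0;
    for (const auto& perRank : times) {
      total += perRank.at(rank);  // at() throws if the rows have unequal length
    }
    best = std::max(best, total);
  }
  return best;
}
```

With two timers on two ranks, `{{4.0, 1.0}, {1.0, 4.0}}`, each rank spends 5 s in total, yet the max-reduced timers add up to 8 s. A table of max values can therefore exceed the total run time even when no region is double counted.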
@aprokop Can you explain the columns in your timers?
For the second column, the difference is quite large. So I agree with you that there is still something that is unexplained.
@aprokop Here are the locations in Nalu source code where the time is accumulated for the equation systems. Maybe you'll notice something that I overlooked.
Looking at the source code, you'll notice that As far as I can tell, there is no barrier within
Well, any
The first column is 32 MPI ranks with a single thread per rank, so 2048 MPI ranks with a single thread. The second column is the same V27 41 R1 problem, but in 16x2, so 1024 MPI ranks with 2 threads each. And so on: column 3 is 8x4, ... It is certainly an interesting observation that the timings seem off when running with multiple threads. No clue why that would be, though.
Yeah, I saw that.
Certainly, this could be an interesting experiment. Could also try replacing |
Ran on Cori in 4x8 configuration with this PR and
Hmm, now the sum of There is a trove of interesting info here. First,
@aprokop The timer info for The MomentumEQS is missing the init line, can you add that back in please?
I am attaching a schematic of what I think is going on here with the So why does this situation get worse with multiple threads? Could it be that based on scheduling and load imbalance, the thread synchronizations imposed by |
Done. It's interesting that it was there, but not shown for some reason.
@sayerhs I think these changes look good. It is hard to disentangle the times, but this looks like a step in the right direction. I believe the assemble phase is bracketed by synchronizations. finalizeLinearSystem() happens before assembly starts, and it calls graph->fillComplete() and other methods which include synchronization. load_complete() is called at the end of assembly, and I'm pretty sure that is also a synchronization.
They now both use the new MultiDimViews class. Next step will be to make them both have the same template parameters.
* Separate graph and solver types
  Add new CrsGraphTypes namespace that LinearSolverTypes pulls in.
* Rename linear system helper file
  Rename file holding Crs graph helper functions. The new name reflects that the functions are for the matrix graph.
* Add new CrsGraph class
  This is a step towards removing the matrix graph creation/initialization that is currently in TpetraLinearSystem. Currently, CrsGraph duplicates functionality TpetraLinearSystem. Everything compiles.
* Move graph from TpetraLinearSystem to CrsGraph
  This commit adds a CrsGraph object to TpetraLinearSystem and forwards graph requests to the CrsGraph class. NaluWind compiles and regression test "AblNeutralEdge" runs.
* Code clean up
  remove code that has been moved to CrsGraph
* Fix unit test compilation
  Reenable methods getRowLID and getColLID, which are used in unit testing. Change header inclusion.
* CrsGraph: initialize inConstruction flag in ctor
  uninitialized bool was causing initialization to be skipped
  With this commit, all tests pass:
  Test project /ascldap/users/jhu/exawind/build-naluwind-gcc-7.2.0-openmpi-1.10.1-opt
  Start 75: oversetHybrid
  Start 33: heatedBackStep
  1/77 Test #33: heatedBackStep ... Passed 227.44 sec
  Start 38: hoVortex
  Start 63: quad9HC
  2/77 Test #63: quad9HC ... Passed 71.32 sec
  Start 66: variableDensNonIso
  3/77 Test #38: hoVortex ... Passed 117.75 sec
  Start 67: variableDensNonUniform
  4/77 Test #67: variableDensNonUniform ... Passed 77.28 sec
  Start 73: unitTest1
  Start 57: periodic3dElemNp1
  5/77 Test #66: variableDensNonIso ... Passed 139.80 sec
  Start 47: nonConformalWithPeriodic
  6/77 Test #57: periodic3dElemNp1 ... Passed 18.53 sec
  Start 42: kovasznay_P5
  7/77 Test #73: unitTest1 ... Passed 23.09 sec
  Start 60: periodic3dEdgeNp1
  8/77 Test #60: periodic3dEdgeNp1 ... Passed 8.00 sec
  Start 41: karmanVortex
  9/77 Test #42: kovasznay_P5 ... Passed 13.79 sec
  10/77 Test #41: karmanVortex ... Passed 1.96 sec
  Start 74: unitTest2
  11/77 Test #47: nonConformalWithPeriodic ... Passed 18.18 sec
  Start 48: nonConformalWithPeriodicConsolidated
  12/77 Test #74: unitTest2 ... Passed 13.63 sec
  Start 31: femHCGL
  13/77 Test #48: nonConformalWithPeriodicConsolidated ... Passed 14.57 sec
  Start 26: elemClosedDomain
  14/77 Test #31: femHCGL ... Passed 3.85 sec
  Start 30: femHC
  15/77 Test #26: elemClosedDomain ... Passed 3.95 sec
  Start 19: ductElemWedge
  16/77 Test #30: femHC ... Passed 3.90 sec
  Start 20: ductWedge
  17/77 Test #19: ductElemWedge ... Passed 2.54 sec
  18/77 Test #20: ductWedge ... Passed 1.73 sec
  Start 53: nonIsoNonUniformElemOpenJet
  19/77 Test #53: nonIsoNonUniformElemOpenJet ... Passed 165.31 sec
  Start 43: milestoneRun
  20/77 Test #43: milestoneRun ... Passed 115.86 sec
  Start 4: ablUnstableEdge
  21/77 Test #4: ablUnstableEdge ... Passed 89.04 sec
  Start 64: steadyTaylorVortex
  22/77 Test #64: steadyTaylorVortex ... Passed 121.05 sec
  Start 44: milestoneRunConsolidated
  23/77 Test #75: oversetHybrid ... Passed 1017.76 sec
  Start 77: waleElemXflowMixFrac3.5m
  24/77 Test #44: milestoneRunConsolidated ... Passed 104.43 sec
  Start 3: ablStableElem
  25/77 Test #3: ablStableElem ... Passed 87.88 sec
  Start 52: nonIsoNonUniformEdgeOpenJet
  26/77 Test #52: nonIsoNonUniformEdgeOpenJet ... Passed 63.72 sec
  Start 16: dgNonConformalFluids
  27/77 Test #16: dgNonConformalFluids ... Passed 61.76 sec
  Start 50: nonIsoElemOpenJet
  28/77 Test #50: nonIsoElemOpenJet ... Passed 61.97 sec
  Start 17: dgNonConformalFluidsEdge
  29/77 Test #17: dgNonConformalFluidsEdge ... Passed 50.90 sec
  Start 51: nonIsoElemOpenJetConsolidated
  30/77 Test #51: nonIsoElemOpenJetConsolidated ... Passed 52.57 sec
  Start 5: ablUnstableEdge_ra
  31/77 Test #5: ablUnstableEdge_ra ... Passed 37.09 sec
  Start 24: ekmanSpiralConsolidated
  32/77 Test #24: ekmanSpiralConsolidated ... Passed 43.84 sec
  Start 40: inputFireElem
  33/77 Test #77: waleElemXflowMixFrac3.5m ... Passed 516.90 sec
  Start 37: hoHelium
  34/77 Test #40: inputFireElem ... Passed 42.54 sec
  Start 34: heatedWaterChannelEdge
  35/77 Test #34: heatedWaterChannelEdge ... Passed 37.97 sec
  Start 18: dgNonConformalThreeBlade
  36/77 Test #18: dgNonConformalThreeBlade ... Passed 245.03 sec
  Start 49: nonIsoEdgeOpenJet
  37/77 Test #49: nonIsoEdgeOpenJet ... Passed 23.74 sec
  Start 6: concentricRad
  38/77 Test #6: concentricRad ... Passed 25.52 sec
  Start 11: dgNonConformal3dFluidsHexTet
  39/77 Test #11: dgNonConformal3dFluidsHexTet ... Passed 134.40 sec
  Start 29: elemPipeCHT
  40/77 Test #29: elemPipeCHT ... Passed 18.82 sec
  Start 23: ekmanSpiral
  41/77 Test #37: hoHelium ... Passed 539.10 sec
  Start 71: hoVortex_P2
  42/77 Test #23: ekmanSpiral ... Passed 14.30 sec
  Start 10: dgNonConformal3dFluids
  43/77 Test #10: dgNonConformal3dFluids ... Passed 117.65 sec
  Start 25: elemBackStepLRSST
  44/77 Test #25: elemBackStepLRSST ... Passed 11.00 sec
  Start 46: movingCylinder
  45/77 Test #46: movingCylinder ... Passed 12.13 sec
  Start 22: edgePipeCHT
  46/77 Test #22: edgePipeCHT ... Passed 10.90 sec
  Start 39: inputFireEdgeUpwind
  47/77 Test #39: inputFireEdgeUpwind ... Passed 7.94 sec
  Start 9: dgNonConformal
  48/77 Test #9: dgNonConformal ... Passed 8.07 sec
  Start 58: periodic3dElemNp4
  49/77 Test #58: periodic3dElemNp4 ... Passed 5.47 sec
  Start 61: periodic3dEdgeNp4
  50/77 Test #61: periodic3dEdgeNp4 ... Passed 3.82 sec
  Start 35: heatedWaterChannelElem
  51/77 Test #35: heatedWaterChannelElem ... Passed 20.46 sec
  Start 13: dgNonConformalEdge
  52/77 Test #13: dgNonConformalEdge ... Passed 7.60 sec
  53/77 Test #71: hoVortex_P2 ... Passed 384.64 sec
  Start 36: heliumPlume
  54/77 Test #36: heliumPlume ... Passed 226.47 sec
  Start 76: uqSlidingMeshDG
  55/77 Test #76: uqSlidingMeshDG ... Passed 216.66 sec
  Start 70: cvfemHexHC_P3
  56/77 Test #70: cvfemHexHC_P3 ... Passed 235.98 sec
  Start 32: fluidsPmrChtPeriodic
  57/77 Test #32: fluidsPmrChtPeriodic ... Passed 99.94 sec
  Start 72: steadyTaylorVortex_P4
  58/77 Test #72: steadyTaylorVortex_P4 ... Passed 154.11 sec
  Start 68: variableDensNonUniform_P5
  59/77 Test #68: variableDensNonUniform_P5 ... Passed 187.59 sec
  Start 1: ablNeutralEdge
  60/77 Test #1: ablNeutralEdge ... Passed 78.58 sec
  Start 27: elemHybridFluids
  61/77 Test #27: elemHybridFluids ... Passed 94.55 sec
  Start 69: BoussinesqNonIso
  62/77 Test #69: BoussinesqNonIso ... Passed 90.32 sec
  Start 28: elemHybridFluidsShift
  63/77 Test #28: elemHybridFluidsShift ... Passed 88.30 sec
  Start 65: tensor_tg_mms
  64/77 Test #65: tensor_tg_mms ... Passed 98.70 sec
  Start 15: dgNonConformalElemCylinder
  65/77 Test #15: dgNonConformalElemCylinder ... Passed 103.07 sec
  Start 21: edgeHybridFluids
  66/77 Test #21: edgeHybridFluids ... Passed 61.73 sec
  Start 8: dgMMS
  Start 55: oversetFluids
  67/77 Test #55: oversetFluids ... Passed 23.22 sec
  Start 56: oversetFluidsEdge
  68/77 Test #56: oversetFluidsEdge ... Passed 21.07 sec
  Start 54: overset
  69/77 Test #54: overset ... Passed 2.16 sec
  70/77 Test #8: dgMMS ... Passed 51.45 sec
  Start 12: dgNonConformal3dFluidsP1P2
  71/77 Test #12: dgNonConformal3dFluidsP1P2 ... Passed 294.81 sec
  Start 45: mixedTetPipe
  72/77 Test #45: mixedTetPipe ... Passed 30.65 sec
  Start 14: dgNonConformalEdgeCylinder
  73/77 Test #14: dgNonConformalEdgeCylinder ... Passed 65.57 sec
  Start 7: cvfemHC
  74/77 Test #7: cvfemHC ... Passed 10.66 sec
  Start 59: periodic3dElemNp8
  75/77 Test #59: periodic3dElemNp8 ... Passed 3.86 sec
  Start 62: periodic3dEdgeNp8
  76/77 Test #62: periodic3dEdgeNp8 ... Passed 2.54 sec
  Start 2: ablNeutralEdgeSegregated
  77/77 Test #2: ablNeutralEdgeSegregated ... Passed 67.66 sec
  100% tests passed, 0 tests failed out of 77
  Label Time Summary:
  performance = 14010.66 sec*proc (3 tests)
  regression = 26090.50 sec*proc (68 tests)
  unit = 50.35 sec*proc (2 tests)
  verification = 6920.38 sec*proc (4 tests)
  Total Test time (real) = 4722.03 sec
* Add header to fix compile error.
* Code cleanup
  remove ifdef'd out code
* TpetraSegregatedLinearSystem uses CrsGraph class
  Code compiles, ablNeutralEdgeSegregated test passes.
* TpetraSegregatedLinearSystem: code cleanup
  remove inactive code protected by macros
* TpetraSegregatedLinearSystem: use Tpetra GIDs
  in copy_stk_to_tpetra(), use realm_.tpetGlobalId_ instead of realm_.naluGlobalId_
  1/1 Test #2: ablNeutralEdgeSegregated ... Passed 68.39 sec
* Fix cmake error from merge
* Add missing beginConstruction
* Add matrix graph reuse between physics phases
* Add forgotten header. Oops.
* add parser option to rebalance mesh using stk_balance (#471)
* add parser option to rebalance mesh using stk_balance
* Move call to doBalanceNodes after rebalanceMesh
* Add throw when rebalance is called with parmetis but Zoltan2 is not built with parmetis enabled
* Remove unused non-NGP code (#486)
* Remove unused classes after transition to NGP
* Remove deprecated code used during NGP transition
* Remove OversetManagerSTK in preparation for NGP transition of overset capability (#487)
* Prepare for deprecation of STK-search based overset manager
* Remove deprecated OversetManagerSTK
* Remove deprecated methods related to overset logic
  - Remove `LinearSystem::prepareConstraints` and its overrides
  - Remove `SolverAlgorithm::apply_coeff` methods that have NGP versions
* Remove the tests from CTestList.cmake also
* Hypre regression tests moved into guarded block
  Regression tests using Hypre as one of their solvers need to be guarded with if(ENABLE_HYPRE) endif()
* Airfoil tests using Trilinos solver stack, see issue #492 (#494)
* Airfoil tests using Trilinos solver stack, see issue #492
  This adds two new tests that use MueLu as preconditioner for the continuity eqs. The convergence behavior is reasonable and sets things well for future GPU testing of inflow/outflow BCs.
* Fixing input decks to use the segregated solvers for momentum
  Fixes #492
* NGP conversion of TpetraLinearSystem::applyDirichletBCs (#493)
* NGP conversion of TpetraLinearSystem::applyDirichletBCs with unit-test.
* Minor fixes to ensure fields are sync'd correctly (host vs device)
* Use NGP_ThrowAssert instead of NGP_ThrowRequire
* Fix accidental revert of the mesh submodule.
* CrsGraph: ensure finalizeGraph called only once
* CrsGraph: initialize variable in ctor
* Updates for CrsGraph
  These commits add the calls buildElemToNodeGraph and buildFaceElemToNodeGraph to various "initalize_connectivity" methods. This effectively creates a superset of graph connectivity for different physics. Also added a check to avoid calling CrsGraph::finalizeGraph multiple times (suggested by RK).
  Currently, these tests are still failing:
  1 - ablNeutralEdge (Failed)
  4 - ablUnstableEdge (Failed)
  5 - ablUnstableEdge_ra (Failed)
  6 - airfoilRANSEdgeTrilinos (Failed)
  22 - ductWedge (Failed)
  23 - edgeHybridFluids (Failed)
  24 - edgePipeCHT (Failed)
  43 - karmanVortex (Failed)
  50 - nonIsoEdgeOpenJet (Failed)
  53 - nonIsoNonUniformEdgeOpenJet (Failed)
* Prepare equation systems for common graph
* Revert "Updates for CrsGraph"
  This reverts commit 880555d.
* Modify EquationSystems to use a common graph for Tpetra
* Common graph initialization refactor
  - Only initialize graphs if they are required by TpetraLinearSystems
  - Make sure both hypre and tpetra pathways work correctly
* Add timers to track init costs missing from EquationSystem timers
* Consolidate buildNodeGraph
* Propagate changes to TpetraSegregatedLinearSystem
* Fix regression test failures
  Remove call to finalizeLinearSystem, as was done for other physics
* Update unit test helpers
  Avoids referencing uninitialized maps. However, some tests are still not passing.
* Improve unit-testing for sparse-matrix values. (#544)
* Improve unit-testing for sparse-matrix values.
  It used to just compare each matrix column and coefficient to the expected gold values, expecting position to match as well. Now it searches the matrix row for each gold column and checks if the coefficient matches, and doesn't fail if there are extra columns in the matrix.
* Add a check to make sure non-gold columns have coeffs of zero.
* Fixing build error after updating Trilinos
* NaluWind: fix nonlinear residual
  Make the nonlinear residual convergence tolerance for momentum and continuity consistent with the tolerance set by the user.
* NaluWind: fix compile errors
* RegTests: update linear solver input deck
  Tpetra matrices now potentially store hard zeros, due to refactoring to use pre-computed CrsGraph's. The test fluidsPmrChtPeriodic is sensitive to these zeros, so I've created a new MueLu input deck that drops the hard zeros during multigrid coarsening.
* RegTests: new MueLu settings for fluidsPmrChtPeriodic
  Add aggregation dropping threshold to remove hard zeros due to CrsGraph refactor.
* Remove commented out code.
* Fix inheritance of nonlinear tolerances
  This is a much better fix, suggested by Shreyas.

Co-authored-by: Johnathan Vo <jvo1012@users.noreply.github.com>
Co-authored-by: Shreyas Ananthan <shreyas@umd.edu>
Co-authored-by: Luc Berger-Vergiat <lberge@sandia.gov>
Co-authored-by: Alan Williams <william@sandia.gov>
Co-authored-by: Jon Rood <jon.rood@nrel.gov>