Fix Nalu timers #18

Merged: 1 commit merged on Jul 9, 2018

Conversation

sayerhs (Contributor) commented Jul 6, 2018

  • Fix timerMisc_ for the LowMach and Momentum EQS, which were counting certain execution regions twice. timerMisc_ was accumulating time while computing projected_nodal_gradient, which in turn was either incrementing timerMisc_ or accumulating time in PNGEQS. The current fix is to keep calls to compute_projected_nodal_gradient out of timed blocks (see the sketch below this description).

  • The Low Mach timerMisc_ was also accumulating time around project_nodal_velocity, which calls the ContinuityEQS projected_nodal_gradient, where additional time is accumulated in continuityEQS->timerMisc_.

  • "No output time" was reporting CPU Time instead of WallClock time.
    This commit changes that to WallClock time so that it is consistent
    with what is being reported for STKPERF: Total time

  • "Timing for connectivity" summary was removed because this is already
    reported as "init" for each equation system

With these changes the sum of all the timers in the timing table should
be less than the total time reported for "main()" as well as "STKPERF:
Total Time" in the summary.

sayerhs requested review from spdomin and alanw0 on July 6, 2018 23:22
sayerhs self-assigned this on Jul 6, 2018
sayerhs (Contributor, Author) commented Jul 6, 2018

This was previously discussed here: NaluCFD/Nalu#189 and was pointed out by the solvers team during the ECP FY18-Q3 milestone work. I noticed that the sum of individual timers did not add up to the total time even on non-OpenMP runs, and found several places where the timers were double counted.

Cc: @sthomas61 @aprokop @jhux2

aprokop (Contributor) commented Jul 8, 2018

This is great! But is there a way to tell if that's all of them? Or is it possible that some problems still exist? How does one check (now, or in the future)?

sayerhs (Contributor, Author) commented Jul 8, 2018

@aprokop I assume your question is "is there a way to be sure that this commit fixes all the timer issues?" Short of a thorough code audit by several people, I don't know of a good way to prove that there are no more timer issues.

What we can be fairly certain of, after this commit and the discussion/fixes in issues NaluCFD/Nalu#238 and NaluCFD/Nalu#189, is the following:

  • All times reported in the Nalu timer summary are WallClock times; CPU time is no longer reported anywhere. So multi-threaded runs can be compared with non-threaded runs.

  • The linear solver timers (init, assemble, load_complete, precond, and solve) are reliable for all runs with a code base that includes NaluCFD/Nalu#241 ("Fix error in preconditioner timer calculations with system reinitialization"). They are updated at only one location in the codebase, i.e., they are not nested.

  • The main() and STKPERF: Total time timers have been correct since NaluCFD/Nalu#189 ("Timing mis-match when threading") was fixed by @spdomin.

  • The main() timer should be the same as, or very close to, STKPERF: Total time. The difference is that main() is reported as the average, min, and max across all processors, while STKPERF: Total time is the time on MPI rank 0. The "No output time" will always be less than STKPERF: Total time, even when threading is active.

The place where a second look would be useful is the misc timers. I have gone through LowMach, Enthalpy, and Turbulence EQS and have fixed all the issues I could see. While I feel that I've caught everything in those equation systems, I can't guarantee it. So I would be cautious of any conclusions that depend solely on misc timers. However, this doesn't corrupt other timers because the total time is arrived at independently and not by summing up the timers reported in the table.

We would definitely know that there are more issues if the sum of all timers in the reported table adds up to more than main(). However, the opposite doesn't mean there are no further issues.

sayerhs (Contributor, Author) commented Jul 8, 2018

I would also recommend that we add another timer that tracks the total time spent in each EQS and use that as the metric, instead of summing up individual pieces. That would quickly expose any remaining double counting in the misc timers. It would also be a better estimate of the total time spent in an equation system than summing the pieces, since several portions of the code are currently not covered by timers in Nalu.

Further, we can also check that the sum of the Continuity and Momentum total times is less than or equal to the timing reported for LowMachEQS, and so on.
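A rough sketch of what such a per-EQS total timer might look like is below. The ScopedWallTimer helper and the timerTotal_ member are hypothetical; stk::wall_time() is what Nalu already uses, though the header path shown is an assumption.

#include <stk_util/environment/WallTime.hpp>  // stk::wall_time(); assumed header path

// RAII helper that charges the full lifetime of a scope to one accumulator.
class ScopedWallTimer
{
public:
  explicit ScopedWallTimer(double& accumulator)
    : accumulator_(accumulator), start_(stk::wall_time()) {}
  ~ScopedWallTimer() { accumulator_ += stk::wall_time() - start_; }

private:
  double& accumulator_;
  double start_;
};

// Usage sketch: time the whole update once, independently of the per-phase
// timers, then check that the sum of the individual pieces does not exceed it.
//
//   void LowMachEquationSystem::solve_and_update()
//   {
//     ScopedWallTimer totalTimer(timerTotal_);  // hypothetical member
//     // ... existing momentum/continuity work, with its own timers ...
//   }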

aprokop (Contributor) commented Jul 8, 2018

I think there may still be a problem somewhere. If I go back to my June runs, I had, for example, the following reported max timers (different columns for different MPI/thread combinations):

\multicolumn{6}{l}{Momentum equation} \\
\midrule
init               & 61.1     & 80.2      & 130.8     & 255.0     & 426.1 \\
assemble           & 202.7    & 369.1     & 650.1     & 1238.3    & 2137.3 \\
load complete      & 154.3    & 266.5     & 461.7     & 862.0     & 1229.1 \\
solve              & 252.3    & 271.5     & 306.5     & 387.5     & 444.4 \\
precond setup      & 42.0     & 49.4      & 56.8      & 75.4      & 88.8 \\
misc               & 19.1     & 36.6      & 66.2      & 133.8     & 228.3 \\
\midrule
...
\multicolumn{6}{l}{Continuity equation} \\
\midrule
init               & 35.4     & 50.8      & 88.5      & 180.5     & 311.8 \\
assemble           & 28.5     & 54.2      & 98.2      & 196.2     & 351.8 \\
load complete      & 22.4     & 38.3      & 68.8      & 135.7     & 207.9 \\
solve              & 203.4    & 188.4     & 211.8     & 260.7     & 304.5 \\
precond setup      & 73.5     & 66.1      & 87.1      & 122.8     & 148.6 \\
misc               & 21.7     & 39.9      & 72.7      & 142.9     & 248.0 \\
...
total time         & 1081.3   & 1370.2    & 1952.3    & 3246.4    & 5091.3 \\

So the total time (STKPERF) was smaller than just the sum of init+assemble+load complete+solve+precond setup for momentum+continuity. So even with fixed misc (or simply ignoring it), this still does not explain why the max timers would add up to more. The only possible explanation would have been if the max for different timers was achieved on different ranks, but my understanding is that each of these timer regions ends at a global barrier.

sayerhs (Contributor, Author) commented Jul 8, 2018

@aprokop Nalu is just using stk::wall_time(), so there is a high probability that the max time does occur on different ranks. Further, the max for solve is very likely occurring on a different rank than, say, the max for init. When Nalu reports timers, it does an MPI_Reduce with MPI_SUM, MPI_MIN, and MPI_MAX for the avg, min, and max respectively.

Furthermore, STKPERF Total time is not global (across all MPI ranks); it is the time reported by MPI rank 0 (see nalu.C). Also, there is no explicit MPI_Barrier in any of these sections just before time is accumulated, so I am not sure we can expect the STKPERF total time to always be greater than the sum of all the max times.
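For reference, a minimal sketch of that kind of reduction is below (the function and variable names are illustrative, not Nalu's). Each statistic is an independent MPI_Reduce, so the max for one section and the max for another can come from different ranks.

#include <mpi.h>

// Reduce one section's locally accumulated wall time to avg/min/max on rank 0.
void report_section_time(double localTime, MPI_Comm comm)
{
  int nProcs = 0;
  MPI_Comm_size(comm, &nProcs);

  double sumTime = 0.0, minTime = 0.0, maxTime = 0.0;
  MPI_Reduce(&localTime, &sumTime, 1, MPI_DOUBLE, MPI_SUM, 0, comm);
  MPI_Reduce(&localTime, &minTime, 1, MPI_DOUBLE, MPI_MIN, 0, comm);
  MPI_Reduce(&localTime, &maxTime, 1, MPI_DOUBLE, MPI_MAX, 0, comm);

  // rank 0 would report sumTime / nProcs (avg), minTime, and maxTime
}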

The situation might be different if we were using Teuchos::Time, because that appears to use MPI_Wtime, which might synchronize across all procs for MPICH; although the Open MPI docs for MPI_Wtime say that the times are local.

sayerhs (Contributor, Author) commented Jul 8, 2018

@aprokop Can you explain the columns in your timers?

In [1]: 61.1+202.7+154.3+252.3+42.0+19.1+35.4+28.5+22.4+203.4+73.4+21.7 # Including misc
Out[1]: 1116.3000000000002

In [2]: 61.1+202.7+154.3+252.3+42.0+35.4+28.5+22.4+203.4+73.4 # Excluding misc
Out[2]: 1075.5

In [3]: 80.2+369.1+366.5+271.5+49.4+36.6+50.8+54.2+38.3+188.4+66.1+39.9
Out[3]: 1611.0

In [4]: 80.2+369.1+366.5+271.5+49.4+50.8+54.2+38.3+188.4+66.1
Out[4]: 1534.5

For the second column, the difference is quite large. So I agree with you that there is still something that is unexplained.

sayerhs (Contributor, Author) commented Jul 8, 2018

@aprokop Here are the locations in the Nalu source code where time is accumulated for the equation systems. Maybe you'll notice something that I overlooked.

Looking at the source code, you'll notice that precond setup time is nested within timerSolve_, but we subtract it out before reporting (dump_eq_time).
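Schematically, that means the reported solve time is the accumulated solve timer minus the accumulated precond setup timer, so the precond setup column is not double counted in the timing table.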

As far as I can tell, there is no barrier within the assemble portion, so that section could overlap on certain procs with init and/or load_complete until Tpetra enforces a barrier. We could test this hypothesis by rerunning one of your cases with an MPI_Barrier in nalu_time, like so:

double
NaluEnv::nalu_time()
{
  // Synchronize all ranks before sampling the wall clock so that timed
  // sections start and end at a common point in time across ranks.
  MPI_Barrier(parallelCommunicator_);
  return stk::wall_time();
}

aprokop (Contributor) commented Jul 9, 2018

@aprokop Nalu is just using stk::wall_time(), so there is a high probability that the max time does occur on different ranks. Further, the max for solve is very likely occurring on a different rank than, say, the max for init.

Well, any solve would synchronize due to reductions, and any assemble would too, due to calling Tpetra's fillComplete (if that's the stage it happens in). At least, I think there are enough barriers (implicit or explicit) that it is very likely each stage is barrier'd.

@aprokop Can you explain the columns in your timers?

The first column is the 32x1 configuration: 32 MPI ranks per node with a single thread per rank, i.e. 2048 MPI ranks total, each single-threaded. The second column is the same V27 41 R1 problem but in 16x2, i.e. 1024 MPI ranks with 2 threads each. And so on; column 3 is 8x4, ... It is certainly an interesting observation that the timings seem off when running with multiple threads. No clue why that would be, though.

Looking at the source code, you'll notice that precond setup time is nested within timerSolve_, but we subtract it out before reporting (dump_eq_time).

Yeah, I saw that.

We could test this hypothesis by rerunning one of your cases with an MPI Barrier in nalu_time like so

Certainly, this could be an interesting experiment.

Could also try replacing gettimeofday in stk::wall_time() with something like omp_get_wtime or clock_gettime(CLOCK_MONOTONIC, ...).
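For example, a monotonic wall-clock helper along those lines could look like the sketch below (illustrative only; not the actual stk::wall_time() implementation):

#include <time.h>

// Monotonic wall-clock time in seconds, unaffected by system clock adjustments.
double monotonic_wall_time()
{
  timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return static_cast<double>(ts.tv_sec) + 1.0e-9 * static_cast<double>(ts.tv_nsec);
}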

aprokop (Contributor) commented Jul 9, 2018

Ran on Cori in 4x8 configuration with this PR and MPI_Barrier in nalu_time:

Timing for Eq: myLowMach                                                                                                                                       
             init --    avg: 0.00356548     min: 0.00195289     max: 0.00453711                                                                                
         assemble --    avg: 0  min: 0  max: 0                                                                                                                 
    load_complete --    avg: 0  min: 0  max: 0                                                                                                                 
            solve --    avg: 0  min: 0  max: 0                                                                                                                 
    precond setup --    avg: 0  min: 0  max: 0                                                                                                                 
             misc --    avg: 0  min: 0  max: 0                                                                                                                 
Timing for Eq: MomentumEQS                                                                                                                                                                             
             init --    avg: 136.387    min: 136.387    max: 136.388  
         assemble --    avg: 655.823    min: 655.803    max: 655.847                                                                                           
    load_complete --    avg: 8.26235    min: 8.25896    max: 8.26611                                                                                           
            solve --    avg: 310.468    min: 310.461    max: 310.479                                                                                           
    precond setup --    avg: 57.8206    min: 57.8196    max: 57.8226                                                                                           
             misc --    avg: 33.5131    min: 33.4873    max: 33.5183                                                                                           
linear iterations --    avg: 10.15  min: 1  max: 20                                                                                                            
Timing for Eq: ContinuityEQS                                                                                                                                   
             init --    avg: 89.2507    min: 89.2497    max: 89.2537                                                                                           
         assemble --    avg: 99.432     min: 99.4036    max: 99.4564                                                                                           
    load_complete --    avg: 1.1394     min: 1.13707    max: 1.13988                                                                                           
            solve --    avg: 228.954    min: 228.951    max: 228.968                                                                                           
    precond setup --    avg: 91.5628    min: 91.5611    max: 91.5643                                                                                           
             misc --    avg: 71.4769    min: 71.4664    max: 71.4922                                                                                           
linear iterations --    avg: 29.9   min: 26     max: 33                                                                                                        
Timing for IO:                                                                                                                                                 
   io create mesh --    avg: 0.0702459  min: 0.070142   max: 0.0703211                                                                                         
 io output fields --    avg: 0.0139458  min: 0.00397229     max: 0.027122                                                                                      
 io populate mesh --    avg: 37.9507    min: 37.9506    max: 37.9507                                                                                           
 io populate fd   --    avg: 1.47922    min: 1.47919    max: 1.47925                                                                                           
Timing for property evaluation:                                                                                                                                
            props --    avg: 0.219486   min: 0.204501   max: 0.226283
Timing for Nonconformal:                                                                                                                                       
  nonconformal bc --    avg: 113.954    min: 113.95     max: 113.955                                                                                           
Timing for skin_mesh :                                                                                                                                         
        skin_mesh --    avg: 0.747587   min: 0.747544   max: 0.74768
...
STKPERF: Total Time: 2005.5482  

Hmm, now the sum of max timers seems less than total time, as expected.

There is a trove of interesting info here. First, the myLowMach misc timer is 0, which was never the case for us before. Second, comparing this with the previous 4x8 run, we see there is/was something fishy with the load complete times. Here, they are almost negligible, while above they are huge.

sayerhs (Contributor, Author) commented Jul 9, 2018

@aprokop The LowMachEQS misc timer is zero now because I removed the timer count around project_nodal_velocity in the LowMachEquationSystem::solve_and_update method.

The timer info for load_complete is definitely interesting. Could it be because there was no explicit barrier previously around assemble, so some MPI rank is starting the load_complete timer and then waiting on a Barrier imposed by Tpetra, while another rank is still processing elements for assemble?

The MomentumEQS output is missing the init line; can you add that back in please?

sayerhs (Contributor, Author) commented Jul 9, 2018

I am attaching a schematic of what I think is going on here with the assemble and load_complete max timers. Since there is no explicit barrier, the idle time between two MPI synchronizations is bookkept in different sections on different MPI ranks. So when we do an MPI_MAX operation, the same span of time gets reported in different sections (assemble and load_complete). I also suspect that if we compare the max - min times for assemble and load_complete, we will see a much bigger difference in the previous run than in the current run with MPI_Barriers.

So why does this situation get worse with multiple threads? Could it be that, based on scheduling and load imbalance, the thread synchronization imposed by atomic_add differs across MPI ranks, and that plays a role?
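As a made-up numerical illustration of the schematic: suppose rank 0 spends 100 s computing in assemble and then 40 s idling in the synchronization charged to load_complete, while rank 1 spends 60 s in assemble and 80 s in load_complete. The per-section MPI_MAX then reports assemble = 100 s and load_complete = 80 s, a sum of 180 s, even though neither rank spent more than 140 s in the two sections combined.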

[Attached schematic: nalutimerschematic]

aprokop (Contributor) commented Jul 9, 2018

The MomentumEQS output is missing the init line; can you add that back in please?

Done. It's interesting that it was there, but not shown for some reason.

alanw0 (Contributor) left a comment

@sayerhs I think these changes look good. It is hard to untangle the times, but this looks like a step in the right direction. I believe the assemble phase is bracketed by synchronizations: finalizeLinearSystem() happens before assembly starts, and it calls graph->fillComplete() and other methods which include synchronization. load_complete() is called at the end of assembly, and I'm pretty sure that is also a synchronization.

@sayerhs sayerhs merged commit 2c95770 into Exawind:master Jul 9, 2018
@sayerhs sayerhs deleted the d/timers branch July 9, 2018 15:27
alanw0 referenced this pull request Jan 29, 2019
They now both use the new MultiDimViews class.

Next step will be to make them both have the same template parameters.
jhux2 added a commit to jhux2/nalu-wind that referenced this pull request Nov 21, 2019
uninitialized bool was causing initialization to be skipped

With this commit, all tests pass:

Test project
/ascldap/users/jhu/exawind/build-naluwind-gcc-7.2.0-openmpi-1.10.1-opt
      Start 75: oversetHybrid
      Start 33: heatedBackStep
 1/77 Test Exawind#33: heatedBackStep .........................   Passed
227.44 sec
      Start 38: hoVortex
      Start 63: quad9HC
 2/77 Test Exawind#63: quad9HC ................................   Passed
71.32 sec
      Start 66: variableDensNonIso
 3/77 Test Exawind#38: hoVortex ...............................   Passed
117.75 sec
      Start 67: variableDensNonUniform
 4/77 Test Exawind#67: variableDensNonUniform .................   Passed
77.28 sec
      Start 73: unitTest1
      Start 57: periodic3dElemNp1
 5/77 Test Exawind#66: variableDensNonIso .....................   Passed
139.80 sec
      Start 47: nonConformalWithPeriodic
 6/77 Test Exawind#57: periodic3dElemNp1 ......................   Passed
18.53 sec
      Start 42: kovasznay_P5
 7/77 Test Exawind#73: unitTest1 ..............................   Passed
23.09 sec
      Start 60: periodic3dEdgeNp1
 8/77 Test Exawind#60: periodic3dEdgeNp1 ......................   Passed
8.00 sec
      Start 41: karmanVortex
 9/77 Test Exawind#42: kovasznay_P5 ...........................   Passed
13.79 sec
10/77 Test Exawind#41: karmanVortex ...........................   Passed
1.96 sec
      Start 74: unitTest2
11/77 Test Exawind#47: nonConformalWithPeriodic ...............   Passed
18.18 sec
      Start 48: nonConformalWithPeriodicConsolidated
12/77 Test Exawind#74: unitTest2 ..............................   Passed
13.63 sec
      Start 31: femHCGL
13/77 Test Exawind#48: nonConformalWithPeriodicConsolidated ...   Passed
14.57 sec
      Start 26: elemClosedDomain
14/77 Test Exawind#31: femHCGL ................................   Passed
3.85 sec
      Start 30: femHC
15/77 Test Exawind#26: elemClosedDomain .......................   Passed
3.95 sec
      Start 19: ductElemWedge
16/77 Test Exawind#30: femHC ..................................   Passed
3.90 sec
      Start 20: ductWedge
17/77 Test Exawind#19: ductElemWedge ..........................   Passed
2.54 sec
18/77 Test Exawind#20: ductWedge ..............................   Passed
1.73 sec
      Start 53: nonIsoNonUniformElemOpenJet
19/77 Test Exawind#53: nonIsoNonUniformElemOpenJet ............   Passed
165.31 sec
      Start 43: milestoneRun
20/77 Test Exawind#43: milestoneRun ...........................   Passed
115.86 sec
      Start  4: ablUnstableEdge
21/77 Test  Exawind#4: ablUnstableEdge ........................   Passed
89.04 sec
      Start 64: steadyTaylorVortex
22/77 Test Exawind#64: steadyTaylorVortex .....................   Passed
121.05 sec
      Start 44: milestoneRunConsolidated
23/77 Test Exawind#75: oversetHybrid ..........................   Passed
1017.76 sec
      Start 77: waleElemXflowMixFrac3.5m
24/77 Test Exawind#44: milestoneRunConsolidated ...............   Passed
104.43 sec
      Start  3: ablStableElem
25/77 Test  Exawind#3: ablStableElem ..........................   Passed
87.88 sec
      Start 52: nonIsoNonUniformEdgeOpenJet
26/77 Test Exawind#52: nonIsoNonUniformEdgeOpenJet ............   Passed
63.72 sec
      Start 16: dgNonConformalFluids
27/77 Test Exawind#16: dgNonConformalFluids ...................   Passed
61.76 sec
      Start 50: nonIsoElemOpenJet
28/77 Test Exawind#50: nonIsoElemOpenJet ......................   Passed
61.97 sec
      Start 17: dgNonConformalFluidsEdge
29/77 Test Exawind#17: dgNonConformalFluidsEdge ...............   Passed
50.90 sec
      Start 51: nonIsoElemOpenJetConsolidated
30/77 Test Exawind#51: nonIsoElemOpenJetConsolidated ..........   Passed
52.57 sec
      Start  5: ablUnstableEdge_ra
31/77 Test  Exawind#5: ablUnstableEdge_ra .....................   Passed
37.09 sec
      Start 24: ekmanSpiralConsolidated
32/77 Test Exawind#24: ekmanSpiralConsolidated ................   Passed
43.84 sec
      Start 40: inputFireElem
33/77 Test Exawind#77: waleElemXflowMixFrac3.5m ...............   Passed
516.90 sec
      Start 37: hoHelium
34/77 Test Exawind#40: inputFireElem ..........................   Passed
42.54 sec
      Start 34: heatedWaterChannelEdge
35/77 Test Exawind#34: heatedWaterChannelEdge .................   Passed
37.97 sec
      Start 18: dgNonConformalThreeBlade
36/77 Test Exawind#18: dgNonConformalThreeBlade ...............   Passed
245.03 sec
      Start 49: nonIsoEdgeOpenJet
37/77 Test Exawind#49: nonIsoEdgeOpenJet ......................   Passed
23.74 sec
      Start  6: concentricRad
38/77 Test  Exawind#6: concentricRad ..........................   Passed
25.52 sec
      Start 11: dgNonConformal3dFluidsHexTet
39/77 Test Exawind#11: dgNonConformal3dFluidsHexTet ...........   Passed
134.40 sec
      Start 29: elemPipeCHT
40/77 Test Exawind#29: elemPipeCHT ............................   Passed
18.82 sec
      Start 23: ekmanSpiral
41/77 Test Exawind#37: hoHelium ...............................   Passed
539.10 sec
      Start 71: hoVortex_P2
42/77 Test Exawind#23: ekmanSpiral ............................   Passed
14.30 sec
      Start 10: dgNonConformal3dFluids
43/77 Test Exawind#10: dgNonConformal3dFluids .................   Passed
117.65 sec
      Start 25: elemBackStepLRSST
44/77 Test Exawind#25: elemBackStepLRSST ......................   Passed
11.00 sec
      Start 46: movingCylinder
45/77 Test Exawind#46: movingCylinder .........................   Passed
12.13 sec
      Start 22: edgePipeCHT
46/77 Test Exawind#22: edgePipeCHT ............................   Passed
10.90 sec
      Start 39: inputFireEdgeUpwind
47/77 Test Exawind#39: inputFireEdgeUpwind ....................   Passed
7.94 sec
      Start  9: dgNonConformal
48/77 Test  Exawind#9: dgNonConformal .........................   Passed
8.07 sec
      Start 58: periodic3dElemNp4
49/77 Test Exawind#58: periodic3dElemNp4 ......................   Passed
5.47 sec
      Start 61: periodic3dEdgeNp4
50/77 Test Exawind#61: periodic3dEdgeNp4 ......................   Passed
3.82 sec
      Start 35: heatedWaterChannelElem
51/77 Test Exawind#35: heatedWaterChannelElem .................   Passed
20.46 sec
      Start 13: dgNonConformalEdge
52/77 Test Exawind#13: dgNonConformalEdge .....................   Passed
7.60 sec
53/77 Test Exawind#71: hoVortex_P2 ............................   Passed
384.64 sec
      Start 36: heliumPlume
54/77 Test Exawind#36: heliumPlume ............................   Passed
226.47 sec
      Start 76: uqSlidingMeshDG
55/77 Test Exawind#76: uqSlidingMeshDG ........................   Passed
216.66 sec
      Start 70: cvfemHexHC_P3
56/77 Test Exawind#70: cvfemHexHC_P3 ..........................   Passed
235.98 sec
      Start 32: fluidsPmrChtPeriodic
57/77 Test Exawind#32: fluidsPmrChtPeriodic ...................   Passed
99.94 sec
      Start 72: steadyTaylorVortex_P4
58/77 Test Exawind#72: steadyTaylorVortex_P4 ..................   Passed
154.11 sec
      Start 68: variableDensNonUniform_P5
59/77 Test Exawind#68: variableDensNonUniform_P5 ..............   Passed
187.59 sec
      Start  1: ablNeutralEdge
60/77 Test  Exawind#1: ablNeutralEdge .........................   Passed
78.58 sec
      Start 27: elemHybridFluids
61/77 Test Exawind#27: elemHybridFluids .......................   Passed
94.55 sec
      Start 69: BoussinesqNonIso
62/77 Test Exawind#69: BoussinesqNonIso .......................   Passed
90.32 sec
      Start 28: elemHybridFluidsShift
63/77 Test Exawind#28: elemHybridFluidsShift ..................   Passed
88.30 sec
      Start 65: tensor_tg_mms
64/77 Test Exawind#65: tensor_tg_mms ..........................   Passed
98.70 sec
      Start 15: dgNonConformalElemCylinder
65/77 Test Exawind#15: dgNonConformalElemCylinder .............   Passed
103.07 sec
      Start 21: edgeHybridFluids
66/77 Test Exawind#21: edgeHybridFluids .......................   Passed
61.73 sec
      Start  8: dgMMS
      Start 55: oversetFluids
67/77 Test Exawind#55: oversetFluids ..........................   Passed
23.22 sec
      Start 56: oversetFluidsEdge
68/77 Test Exawind#56: oversetFluidsEdge ......................   Passed
21.07 sec
      Start 54: overset
69/77 Test Exawind#54: overset ................................   Passed
2.16 sec
70/77 Test  Exawind#8: dgMMS ..................................   Passed
51.45 sec
      Start 12: dgNonConformal3dFluidsP1P2
71/77 Test Exawind#12: dgNonConformal3dFluidsP1P2 .............   Passed
294.81 sec
      Start 45: mixedTetPipe
72/77 Test Exawind#45: mixedTetPipe ...........................   Passed
30.65 sec
      Start 14: dgNonConformalEdgeCylinder
73/77 Test Exawind#14: dgNonConformalEdgeCylinder .............   Passed
65.57 sec
      Start  7: cvfemHC
74/77 Test  Exawind#7: cvfemHC ................................   Passed
10.66 sec
      Start 59: periodic3dElemNp8
75/77 Test Exawind#59: periodic3dElemNp8 ......................   Passed
3.86 sec
      Start 62: periodic3dEdgeNp8
76/77 Test Exawind#62: periodic3dEdgeNp8 ......................   Passed
2.54 sec
      Start  2: ablNeutralEdgeSegregated
77/77 Test  Exawind#2: ablNeutralEdgeSegregated ...............   Passed
67.66 sec

100% tests passed, 0 tests failed out of 77

Label Time Summary:
performance     = 14010.66 sec*proc (3 tests)
regression      = 26090.50 sec*proc (68 tests)
unit            =  50.35 sec*proc (2 tests)
verification    = 6920.38 sec*proc (4 tests)

Total Test time (real) = 4722.03 sec
sayerhs added a commit that referenced this pull request Apr 2, 2020

* Separate graph and solver types

Add new CrsGraphTypes namespace that LinearSolverTypes pulls in.

* Rename linear system helper file

Rename file holding Crs graph helper functions.
The new name reflects that the functions are for the matrix graph.

* Add new CrsGraph class

This is a step towards removing the matrix graph creation/initialization
that is currently in TpetraLinearSystem.  Currently, CrsGraph duplicates
functionality TpetraLinearSystem.  Everything compiles.

* Move graph from TpetraLinearSystem to CrsGraph

This commit adds a CrsGraph object to TpetraLinearSystem and forwards
graph requests to the CrsGraph class.

NaluWind compiles and regression test "AblNeutralEdge" runs.

* Code clean up

remove code that has been moved to CrsGraph

* Fix unit test compilation

Reenable methods getRowLID and getColLID, which are used in unit
testing.  Change header inclusion.

* CrsGraph: initialize inConstruction flag in ctor

uninitialized bool was causing initialization to be skipped

With this commit, all 77 tests pass.

* Add header to fix compile error.

* Code cleanup

remove ifdef'd out code

* TpetraSegregatedLinearSystem uses CrsGraph class

Code compiles, ablNeutralEdgeSegregated test passes.

* TpetraSegregatedLinearSystem: code cleanup

remove inactive code protected by macros

* TpetraSegregatedLinearSystem: use Tpetra GIDs

in copy_stk_to_tpetra(), use realm_.tpetGlobalId_ instead of
realm_.naluGlobalId_

1/1 Test #2: ablNeutralEdgeSegregated .........   Passed   68.39 sec

* Fix cmake error from merge

* Add missing beginConstruction

* Add matrix graph reuse between physics phases

* Add forgotten header.  Oops.

* add parser option to rebalance mesh using stk_balance (#471)

* add parser option to rebalance mesh using stk_balance

* Move call to doBalanceNodes after rebalanceMesh

* Add throw when rebalance is called with parmetis but Zoltan2 is not built with parmetis enabled

* Remove unused non-NGP code  (#486)

* Remove unused classes after transition to NGP

* Remove deprecated code used during NGP transition

* Remove OversetManagerSTK in preparation for NGP transition of overset capability (#487)

* Prepare for deprecation of STK-search based overset manager

* Remove deprecated OversetManagerSTK

* Remove deprecated methods related to overset logic

- Remove `LinearSystem::prepareConstraints` and its overrides
- Remove `SolverAlgorithm::apply_coeff` methods that have NGP versions

* Remove the tests from CTestList.cmake also

* Hypre regression tests moved into guarded block

Regression tests using Hypre as one of their solvers need to be guarded with
if(ENABLE_HYPRE)

endif()

* Airfoil tests using Trilinos solver stack, see issue #492 (#494)

* Airfoil tests using Trilinos solver stack, see issue #492

This adds two new tests that use MueLu as preconditioner for the continuity eqs.
The convergence behavior is reasonable and sets things well for future GPU testing of inflow/outflow BCs.

* Fixing input decks to use the segregated solvers for momentum

Fixes #492

* NGP conversion of TpetraLinearSystem::applyDirichletBCs (#493)

* NGP conversion of TpetraLinearSystem::applyDirichletBCs with unit-test.

* Minor fixes to ensure fields are sync'd correctly (host vs device)

* Use NGP_ThrowAssert instead of NGP_ThrowRequire

* Fix accidental revert of the mesh submodule.

* CrsGraph: ensure finalizeGraph called only once

* CrsGraph: initialize variable in ctor

* Updates for CrsGraph

These commits add the calls buildElemToNodeGraph and
buildFaceElemToNodeGraph to various "initialize_connectivity"
methods.  This effectively creates a superset of graph connectivity
for different physics.

Also added a check to avoid calling CrsGraph::finalizeGraph multiple
times (suggested by RK).

Currently, these tests are still failing:

      1 - ablNeutralEdge (Failed)
      4 - ablUnstableEdge (Failed)
      5 - ablUnstableEdge_ra (Failed)
      6 - airfoilRANSEdgeTrilinos (Failed)
     22 - ductWedge (Failed)
     23 - edgeHybridFluids (Failed)
     24 - edgePipeCHT (Failed)
     43 - karmanVortex (Failed)
     50 - nonIsoEdgeOpenJet (Failed)
     53 - nonIsoNonUniformEdgeOpenJet (Failed)

* Prepare equation systems for common graph

* Revert "Updates for CrsGraph"

This reverts commit 880555d.

* Modify EquationSystems to use a common graph for Tpetra

* Common graph initialization refactor

- Only initialize graphs if they are required by TpetraLinearSystems
- Make sure both hypre and tpetra pathways work correctly

* Add timers to track init costs missing from EquationSystem timers

* Consolidate buildNodeGraph

* Propagate changes to TpetraSegregatedLinearSystem

* Fix regression test failures

Remove call to finalizeLinearSystem, as was done for other physics

* Update unit test helpers

Avoids referencing uninitialized maps.  However, some
tests are still not passing.

* Improve unit-testing for sparse-matrix values. (#544)

* Improve unit-testing for sparse-matrix values.

It used to just compare each matrix column and coefficient to
the expected gold values, expecting position to match as well.
Now it searches the matrix row for each gold column and checks
if the coefficient matches, and doesn't fail if there are extra
columns in the matrix.

* Add a check to make sure non-gold columns have coeffs of zero.

* Fixing build error after updating Trilinos

* NaluWind: fix nonlinear residual

Make the nonlinear residual convergence tolerance for momentum and continuity
consistent with the tolerance set by the user.

* NaluWind: fix compile errors

* RegTests: update linear solver input deck

Tpetra matrices now potentially store hard zeros, due to refactoring
to use pre-computed CrsGraph's.  The test fluidsPmrChtPeriodic
is sensitive to these zeros, so I've created a new MueLu input deck
that drops the hard zeros during multigrid coarsening.

* RegTests: new MueLu settings for fluidsPmrChtPeriodic

Add aggregation dropping threshold to remove hard zeros due to
CrsGraph refactor.

* Remove commented out code.

* Fix inheritance of nonlinear tolerances

This is a much better fix, suggested by Shreyas.

Co-authored-by: Johnathan Vo <jvo1012@users.noreply.github.com>
Co-authored-by: Shreyas Ananthan <shreyas@umd.edu>
Co-authored-by: Luc Berger-Vergiat <lberge@sandia.gov>
Co-authored-by: Alan Williams <william@sandia.gov>
Co-authored-by: Jon Rood <jon.rood@nrel.gov>