Performance FMT

Peter Hjort Lauritzen edited this page Nov 13, 2023 · 17 revisions

new

se_nsplit = 1
se_rsplit = 6
se_qsplit = 1

se_hypervis_subcycle = 10
se_nu_div = 1E15
se_nu = 1E15
se_sponge_del4_nu_div_fac  = 10
se_sponge_del4_nu_fac  = 5
se_sponge_del4_lev = 3
se_hypervis_subcycle_sponge = 3

stable for 180 days!

se_nsplit = 2
se_rsplit = 3
se_qsplit = 1
se_hypervis_subcycle = 3
se_nu_div = 1E15
se_nu = 1E15
se_sponge_del4_nu_div_fac  = 2.5
se_sponge_del4_nu_fac  = 1
se_sponge_del4_lev = 45
se_hypervis_subcycle_sponge = 3

unstable

se_nsplit = 2
se_rsplit = 3
se_qsplit = 1
se_hypervis_subcycle = 15
se_nu_div = 1E15
se_nu = 1E15
se_sponge_del4_nu_div_fac  = 15
se_sponge_del4_nu_fac  = 5
se_sponge_del4_lev = 10
se_hypervis_subcycle_sponge = 3

unstable (after 40 days)

se_nsplit = 2
se_rsplit = 3
se_qsplit = 1
se_hypervis_subcycle = 3
se_nu_div = 1E15
se_nu = 1E15
se_sponge_del4_nu_div_fac  = 3
se_sponge_del4_nu_fac  = 3
se_sponge_del4_lev = 10
se_hypervis_subcycle_sponge = 3

unstable

se_nsplit = 2
se_rsplit = 3
se_qsplit = 1
se_hypervis_subcycle = 7
se_nu_div = 1E15
se_nu = 1E15
se_sponge_del4_nu_div_fac  = 7.5
se_sponge_del4_nu_fac  = 5
se_sponge_del4_lev = 10
se_hypervis_subcycle_sponge = 3

unstable

Current default namelist setting for ~80km top model (FMT):

 se_hypervis_subcycle         = 3
 se_hypervis_subcycle_q       = 1                 
 se_hypervis_subcycle_sponge  = 1  
 se_nsplit                    = 2 
 se_nu                        = 1E15
 se_nu_div                    = 2.5E15 
 se_nu_p                      = 1E15
 se_nu_top                    = 1.0e6 
 se_qsplit                    = 1  
 se_rsplit                    = 6 
 se_sponge_del4_lev           = 3 
 se_sponge_del4_nu_div_fac    = 5 
 se_sponge_del4_nu_fac        = 7.5 

These settings are scaled for a max wind of 400 m/s (also used for WACCM).

With HB diffusion (when CLUBB is not active) the max winds are 200 m/s and there is no need for increased del4 viscosity in the sponge. Hence we can optimize time-stepping significantly:

se_nsplit = 2
se_rsplit = 5
se_qsplit = 1
se_hypervis_subcycle = 1
se_nu_div = 1E15
se_nu = 1E15
se_sponge_del4_nu_div_fac  = 1
se_sponge_del4_nu_fac  = 1
se_sponge_del4_lev = 1

Unstable:

se_nsplit = 2
se_rsplit = 3
se_qsplit = 1
se_hypervis_subcycle = 3
se_nu_div = 1E15
se_nu = 1E15
se_sponge_del4_nu_div_fac  = 3
se_sponge_del4_nu_fac  = 3
se_sponge_del4_lev = 3
se_hypervis_subcycle_sponge = 3

1 month averaged OMEGA500

Out-of-the-box:

[Figure: screenshot taken 2023-07-20, 2:59:40 PM]

opt1: se_nu_div = 1E15 (instead of 2.5E15); the listing below also sets se_hypervis_subcycle = 2 and se_rsplit = 9

 se_hypervis_subcycle         = 2
 se_hypervis_subcycle_q       = 1
 se_hypervis_subcycle_sponge  = 1
 se_nsplit                    = 2
 se_nu                        = 1E15
 se_nu_div                    = 1E15
 se_nu_top                    = 1.0e6
 se_qsplit                    = 1
 se_rsplit                    = 9
 se_sponge_del4_lev           = 3
 se_sponge_del4_nu_div_fac    = 5
 se_sponge_del4_nu_fac        = 7.5

[Figure: screenshot taken 2023-07-20, 3:02:29 PM]
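
To see exactly which parameters opt1 changes relative to the default FMT settings listed earlier, the two listings can be diffed. The following is a minimal Python sketch; the helper names `parse_namelist` and `diff_settings` are hypothetical (not part of CAM or CESM), and the values are copied from the two listings on this page.

```python
def parse_namelist(text):
    """Parse 'name = value' lines into a dict mapping name -> float."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank lines
        name, value = line.split("=", 1)
        settings[name.strip()] = float(value)  # float() tolerates surrounding spaces
    return settings


def diff_settings(ref, new):
    """Return (changed, only_in_ref) for two settings dicts."""
    changed = {
        name: (ref[name], new[name])
        for name in sorted(ref.keys() & new.keys())
        if ref[name] != new[name]
    }
    only_in_ref = sorted(ref.keys() - new.keys())
    return changed, only_in_ref


# Default ~80km-top (FMT) settings, copied from this page.
DEFAULT = """
se_hypervis_subcycle         = 3
se_hypervis_subcycle_q       = 1
se_hypervis_subcycle_sponge  = 1
se_nsplit                    = 2
se_nu                        = 1E15
se_nu_div                    = 2.5E15
se_nu_p                      = 1E15
se_nu_top                    = 1.0e6
se_qsplit                    = 1
se_rsplit                    = 6
se_sponge_del4_lev           = 3
se_sponge_del4_nu_div_fac    = 5
se_sponge_del4_nu_fac        = 7.5
"""

# opt1 settings, copied from this page.
OPT1 = """
se_hypervis_subcycle         = 2
se_hypervis_subcycle_q       = 1
se_hypervis_subcycle_sponge  = 1
se_nsplit                    = 2
se_nu                        = 1E15
se_nu_div                    = 1E15
se_nu_top                    = 1.0e6
se_qsplit                    = 1
se_rsplit                    = 9
se_sponge_del4_lev           = 3
se_sponge_del4_nu_div_fac    = 5
se_sponge_del4_nu_fac        = 7.5
"""

changed, only_in_ref = diff_settings(parse_namelist(DEFAULT), parse_namelist(OPT1))
for name, (old, new) in changed.items():
    print(f"{name}: {old:g} -> {new:g}")
print("listed only in default:", only_in_ref)
```

With these listings the diff reports changes in se_hypervis_subcycle, se_nu_div and se_rsplit; se_nu_p appears only in the default listing.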

Performance difference (~23% faster):

[Figure: screenshot taken 2023-07-20, 3:04:16 PM]

opt2: no increased div4 and no increased del4 in sponge

se_nsplit = 2
se_rsplit = 5
se_qsplit = 1
se_hypervis_subcycle = 1
se_nu_div = 1E15
se_nu = 1E15
se_sponge_del4_nu_div_fac  = 1
se_sponge_del4_nu_fac  = 1
se_sponge_del4_lev = 1

[Figure: screenshot taken 2023-07-20, 3:35:48 PM]

Opt2 is ~19% faster than opt1:

[Figure: screenshot taken 2023-07-20, 3:39:49 PM]

Opt2 is ~43% faster than ref:

[Figure: screenshot taken 2023-07-20, 3:43:16 PM]
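
The three percentages above can be cross-checked against each other, but the result depends on what "x% faster" means, which this page does not state. Below is a minimal Python sketch under two assumed readings; both function names are hypothetical and the inputs are the ~23% and ~19% figures quoted above.

```python
def compose_time_reduction(*fractions):
    """Assumed reading A: each step removes that fraction of wall-clock time.

    Returns the total fraction of wall-clock time removed.
    """
    remaining = 1.0
    for f in fractions:
        remaining *= 1.0 - f
    return 1.0 - remaining


def compose_throughput_gain(*fractions):
    """Assumed reading B: each step raises throughput by that fraction.

    Returns the total fractional throughput gain.
    """
    total = 1.0
    for f in fractions:
        total *= 1.0 + f
    return total - 1.0


# ref -> opt1 (~23%), then opt1 -> opt2 (~19%), as quoted on this page.
reading_a = compose_time_reduction(0.23, 0.19)
reading_b = compose_throughput_gain(0.23, 0.19)
print(f"reading A (time removed):    {reading_a:.1%}")
print(f"reading B (throughput gain): {reading_b:.1%}")
```

Under reading A the two steps combine to roughly 38%, under reading B to roughly 46%. The reported ~43% for opt2 versus ref lies between the two, so the timing figures themselves remain the authoritative numbers.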

Conclusion: hyperviscosity operator is very expensive

opt3: opt2 + no double advection of condensates

[Figure: screenshot taken 2023-07-21, 9:55:02 AM]

Computational performance (~10% faster than opt2 and 2x faster than ref):

[Figure: screenshot taken 2023-07-21, 9:57:14 AM]

[Figure: screenshot taken 2023-07-21, 9:58:45 AM]

Note: p_d_coupling takes 160s. All of CSLAM advection takes 539s. Seems way too long ...

Note: spectral-element advection even with one tracer is quite expensive; try not to advect spectral-element tracers at all (try moist baroclinic wave to see how well balance is maintained)