
Dynamic Patch Arrays - larger nclmax and nlevleaf #1198

Merged Aug 14, 2024 (24 commits)

Conversation

rgknox (Contributor) commented May 10, 2024

Description:

This set of changes makes a number of arrays attached to the patch structure dynamically allocated. Most of these arrays are dimensioned by number-of-canopy-layers x number-of-pfts x number-of-veg-layers.

Previously, in the interest of keeping these arrays small, we had elected to keep nclmax equal to 2 and nlevleaf equal to 30, and manually increase these values when necessary. We still have these two constants, but they are now used only to allocate stack space or to dimension smaller arrays, so we can bump them up to values that won't affect people's runs.

So now, nclmax = 5 and nlevleaf = 50. We could bump them up higher if need be; I'm keeping them low because they still affect the size of history output arrays. But users can set the history dimensionality to level 1 to remove the large arrays from their output anyway.
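As a rough illustration of the allocation pattern described above (the subroutine and member sizes here are hypothetical sketches, not the verbatim FATES routine):

```fortran
! Hypothetical sketch of per-patch dynamic allocation; subroutine name and
! argument list are illustrative, not the actual FATES API.
subroutine AllocatePatchArrays(this, ncl, npft, nlev)
   class(fates_patch_type), intent(inout) :: this
   integer, intent(in) :: ncl   ! number of canopy layers on this patch
   integer, intent(in) :: npft  ! number of plant functional types
   integer, intent(in) :: nlev  ! number of leaf (vegetation) layers
   ! allocate to the patch's current needs rather than a static
   ! nclmax x maxpft x nlevleaf block
   allocate(this%elai_profile(ncl, npft, nlev))
   this%elai_profile(:,:,:) = 0._r8
end subroutine AllocatePatchArrays
```

Because the arrays are sized per patch, bumping the nclmax and nlevleaf ceilings no longer inflates memory for runs that never reach those limits.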

Collaborators:

This has been a refactor target for years, so lots of people have probably weighed in.

Expectation of Answer Changes:

No answer changes expected, but... the canopy structure part of the code is super sensitive to order-of-operation changes, so I do actually expect very small diffs.

Checklist

If this is your first time contributing, please read the CONTRIBUTING document.

All checklist items must be checked to enable merging this pull request:

Contributor

  • The in-code documentation has been updated with descriptive comments
  • The documentation has been assessed to determine if updates are necessary

Integrator

  • FATES PASS/FAIL regression tests were run
  • Evaluation of test results for answer changes was performed and results provided

Documentation

Test Results:

CTSM (or) E3SM (specify which) test hash-tag:

CTSM (or) E3SM (specify which) baseline hash-tag:

FATES baseline hash-tag:

Test Output:

@rgknox rgknox changed the title Dynamic Patch Arrays - larger nclmax Dynamic Patch Arrays - larger nclmax and nlevleaf May 10, 2024
@rgknox rgknox requested a review from mpaiao May 10, 2024 17:33
@rgknox rgknox added the draft label May 10, 2024
rgknox (Contributor, Author) commented May 10, 2024

There are a few patch-level variables with odd names that I'd also like to change:

patch%ncl_p : The _p suffix indicates patch, but that is implied by being on the patch structure. I'd like to change this to "ncan".

patch%ncan(:,:) : This is not the number of canopy layers! It's the number of vegetation layers in each canopy-layer-by-pft class! I'd like to change this to "nveg(:,:)".

patch%nrad(:,:) : This should be removed; we don't use it!

@mpaiao (Contributor) left a comment

Many thanks for addressing this @rgknox. I went through your changes and they all look good! I am looking forward to testing this new code and seeing if it will help sustain a denser understory in FATES.

else
currentPatch%NCL_p = z
end if

Contributor

It is nice to have the informative error message, but I wonder whether erroring out, as opposed to cohort termination, is something we always want to do from now on. My only concern is that this could trigger too many error messages in global runs or parameter sensitivity experiments.

But we can see if this becomes a problem and address if needed.

Contributor Author

I'm open to allowing termination here. I suppose it would allow the model to continue working when people are testing strange edge-case parameter combinations that generate more than 5 canopy layers.

Contributor Author

@mpaiao , I looked at this again. The calls to termination that precede this section should be ensuring that z <= nclmax. If z is larger than nclmax it should be in error at this point.
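For context, the guard being discussed amounts to something like the following sketch (the logging and abort calls follow common FATES conventions but are illustrative here, not the verbatim source):

```fortran
! Sketch of the nclmax guard: preceding cohort-termination calls should
! ensure z <= nclmax, so exceeding it is treated as a fatal model error.
if (z > nclmax) then
   write(fates_log(),*) 'number of canopy layers exceeded nclmax:', z, nclmax
   call endrun(msg=errMsg(sourcefile, __LINE__))
else
   currentPatch%NCL_p = z
end if
```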

@rgknox rgknox removed the draft label May 13, 2024
@glemieux glemieux assigned rgknox and glemieux and unassigned glemieux Jun 3, 2024
rgknox (Contributor, Author) commented Jun 10, 2024

I compared the timing output for simulations at BCI that use this method and the old method using a cap of 3 canopy layers. The simulations had no noticeable difference in run time.

@glemieux glemieux self-requested a review June 17, 2024 18:45
@glemieux (Contributor) left a comment

I think this looks good. The (re)allocation code was straightforward and well commented. I only had one question below about zeroing some of the dynamics values.

@glemieux (Contributor)

Regression testing underway on derecho

glemieux (Contributor) commented Jul 19, 2024

Regression testing against fates-sci.1.77.1_api.36.0.0-ctsm5.2.013 is showing two RUN failures on derecho:

FAIL ERS_D_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdPRT2 RUN time=60
FAIL ERS_P128x1_Lm25.f10_f10_mg37.I2000Clm60Fates.derecho_intel.clm-FatesColdNoComp RUN time=573

The FatesColdPRT2 test is failing with the following error per the cesm.log:

 20 dec0207.hsn.de.hpc.ucar.edu 378: forrtl: severe (408): fort: (7): Attempt to use pointer CPATCH when it is not associated with a target
 21 dec0207.hsn.de.hpc.ucar.edu 378:
 22 dec0207.hsn.de.hpc.ucar.edu 378: Image              PC                Routine            Line        Source
 23 dec0207.hsn.de.hpc.ucar.edu 378: cesm.exe           00000000018DFCBA  edphysiologymod_m        3323  EDPhysiologyMod.F90
 24 dec0207.hsn.de.hpc.ucar.edu 378: cesm.exe           00000000017BF521  edmainmod_mp_ed_i         692  EDMainMod.F90
 25 dec0207.hsn.de.hpc.ucar.edu 378: cesm.exe           00000000017B9DC2  edmainmod_mp_ed_e         220  EDMainMod.F90
 26 dec0207.hsn.de.hpc.ucar.edu 378: cesm.exe           0000000000B04EA2  clmfatesinterface        1259  clmfates_interfaceMod.F90
 27 dec0207.hsn.de.hpc.ucar.edu 378: cesm.exe           0000000000A7B8B5  clm_driver_mp_clm        1142  clm_driver.F90
 28 dec0207.hsn.de.hpc.ucar.edu 378: cesm.exe           000000000099F3B7  lnd_comp_nuopc_mp         904  lnd_comp_nuopc.F90

The FatesColdNoComp test cesm.log stack trace is a little more obscure and doesn't point directly to a portion of the fates code:

2370077 dec0379.hsn.de.hpc.ucar.edu 120: cesm.exe           000000000109E3CD  shr_abort_mod_mp_         114  shr_abort_mod.F90
2370078 dec0379.hsn.de.hpc.ucar.edu 120: cesm.exe           00000000005E94B1  abortutils_mp_end          98  abortutils.F90
2370079 dec0379.hsn.de.hpc.ucar.edu 120: cesm.exe           0000000000E29168  ch4mod_mp_ch4_tra        4186  ch4Mod.F90
2370080 dec0379.hsn.de.hpc.ucar.edu 120: cesm.exe           0000000000E1A6E6  ch4mod_mp_ch4_           2094  ch4Mod.F90
2370081 dec0379.hsn.de.hpc.ucar.edu 120: cesm.exe           00000000005F8DE8  clm_driver_mp_clm        1203  clm_driver.F90
2370082 dec0379.hsn.de.hpc.ucar.edu 120: cesm.exe           000000000059E67E  lnd_comp_nuopc_mp         904  lnd_comp_nuopc.F90

The lnd.log file doesn't provide much context aside from the fact that it failed about 2/3 of the way through the initial case.

Test results can be found: /glade/derecho/scratch/glemieux/ctsm-tests/tests_pr1198

@rgknox there are DIFFs, which I think are as expected, although I only spot checked a few. That said, I'm going to rerun the baseline to make sure that the one I was comparing against was generated correctly.

glemieux (Contributor) commented Jul 19, 2024

@rgknox there are DIFFs, which I think are as expected, although I only spot checked a few. That said, I'm going to rerun the baseline to make sure that the one I was comparing against was generated correctly.

It looks like the old baseline I had tested against was not with the latest tags. I've moved that one and regenerated fates-sci.1.77.1_api.36.0.0-ctsm5.2.013 with the appropriate tags checked out. I've got fates suite tests rerunning against the newly generated baseline.

glemieux (Contributor) commented Jul 19, 2024

I realized I had missed a more illuminating error message in the failing FatesColdNoComp test:

26413 dec0379.hsn.de.hpc.ucar.edu 120:  energy balance in canopy           24 , err= -0.973966808913889
26414 dec0379.hsn.de.hpc.ucar.edu 120:  Negative conc. in ch4tran. c,j,deficit (mol):           2           4
26415 dec0379.hsn.de.hpc.ucar.edu 120:   1.119271163016430E-003
26416 dec0379.hsn.de.hpc.ucar.edu 120:  Negative conc. in ch4tran. c,j,deficit (mol):           2           5
26417 dec0379.hsn.de.hpc.ucar.edu 120:   3.536220161397217E-003
26418 dec0379.hsn.de.hpc.ucar.edu 120:  Negative conc. in ch4tran. c,j,deficit (mol):           2           6
26419 dec0379.hsn.de.hpc.ucar.edu 120:   7.176459108002257E-003
26420 dec0379.hsn.de.hpc.ucar.edu 120:  Note: sink > source in ch4_tran, sources are changing  quickly relative to diff
26421 dec0379.hsn.de.hpc.ucar.edu 120:  usion timestep, and/or diffusion is rapid.
26422 dec0379.hsn.de.hpc.ucar.edu 120:  Latdeg,Londeg=   80.0000000000000        285.000000000000
26423 dec0379.hsn.de.hpc.ucar.edu 120:  This typically occurs when there is a larger than normal  diffusive flux.
26424 dec0379.hsn.de.hpc.ucar.edu 120:  If this occurs frequently, consider reducing land model (or  methane model) tim
26425 dec0379.hsn.de.hpc.ucar.edu 120:  estep, or reducing the max. sink per timestep in the methane model.
26426 dec0379.hsn.de.hpc.ucar.edu 120:  Negative conc. in ch4tran. c,j,deficit (mol):           2           7
26427 dec0379.hsn.de.hpc.ucar.edu 120:   1.185680703304796E-002
26428 dec0379.hsn.de.hpc.ucar.edu 120:  Negative conc. in ch4tran. c,j,deficit (mol):           2           8
26429 dec0379.hsn.de.hpc.ucar.edu 120:   8.928985367626268E-003
26430 dec0379.hsn.de.hpc.ucar.edu 120:  CH4 Conservation Error in CH4Mod during diffusion, nstep, c, errch4 (mol /m^2.t
26431 dec0379.hsn.de.hpc.ucar.edu 120:  imestep)       25298           2 -3.813536517318042E-002
26432 dec0379.hsn.de.hpc.ucar.edu 120:  Latdeg,Londeg=   80.0000000000000        285.000000000000
26433 dec0379.hsn.de.hpc.ucar.edu 120: iam = 120: local  column   index = 2
26434 dec0379.hsn.de.hpc.ucar.edu 120: iam = 120: global column   index = 1730
26435 dec0379.hsn.de.hpc.ucar.edu 120: iam = 120: global landunit index = 568
26436 dec0379.hsn.de.hpc.ucar.edu 120: iam = 120: global gridcell index = 249
26437 dec0379.hsn.de.hpc.ucar.edu 120: iam = 120: gridcell longitude    =  285.0000000
26438 dec0379.hsn.de.hpc.ucar.edu 120: iam = 120: gridcell latitude     =   80.0000000
26439 dec0379.hsn.de.hpc.ucar.edu 120: iam = 120: column   type         = 1
26440 dec0379.hsn.de.hpc.ucar.edu 120: iam = 120: landunit type         = 1
26441 dec0379.hsn.de.hpc.ucar.edu 120:  ENDRUN:
26442 dec0379.hsn.de.hpc.ucar.edu 120:  ERROR:
26443 dec0379.hsn.de.hpc.ucar.edu 120:   ERROR: CH4 Conservation Error in CH4Mod during diffusionERROR in ch4Mod.F90 at
26444 dec0379.hsn.de.hpc.ucar.edu 120:   line 4188
26445 dec0379.hsn.de.hpc.ucar.edu 120: Image              PC                Routine            Line        Source
26446 dec0379.hsn.de.hpc.ucar.edu 120: cesm.exe           000000000109E3CD  shr_abort_mod_mp_         114  shr_abort_mod.F90
26447 dec0379.hsn.de.hpc.ucar.edu 120: cesm.exe           00000000005E94B1  abortutils_mp_end          98  abortutils.F90
26448 dec0379.hsn.de.hpc.ucar.edu 120: cesm.exe           0000000000E29168  ch4mod_mp_ch4_tra        4186  ch4Mod.F90
26449 dec0379.hsn.de.hpc.ucar.edu 120: cesm.exe           0000000000E1A6E6  ch4mod_mp_ch4_           2094  ch4Mod.F90
26450 dec0379.hsn.de.hpc.ucar.edu 120: cesm.exe           00000000005F8DE8  clm_driver_mp_clm        1203  clm_driver.F90

@glemieux (Contributor)

It looks like the old baseline I had tested against was not with the latest tags. I've moved that one and regenerated fates-sci.1.77.1_api.36.0.0-ctsm5.2.013 with the appropriate tags checked out. I've got fates suite tests rerunning against the newly generated baseline.

The updated run with corrected baseline comparison can be found here: /glade/derecho/scratch/glemieux/ctsm-tests/tests_0719-153219de.

The number of DIFFs has been reduced, although this is a little difficult to discern at first: 45 of the reported DIFFs are due only to the dimensions differing, which is as expected.

The list of cprnc.out files with DIFFs that actually have non-zero differences can be found in the rms.out file at the top of the test directory. There are 17 files.

rgknox (Contributor, Author) commented Jul 20, 2024

@glemieux , are there still run fails?

@glemieux (Contributor)

@rgknox yes, I'm still seeing the same failures as noted here: #1198 (comment)

@glemieux (Contributor)

Retesting with nclmax = 2 to check if this impacts the failing tests.

@glemieux (Contributor)

Regression testing the fates suite with nclmax = 2 results in the ERS_P128x1_Lm25.f10_f10_mg37.I2000Clm60Fates.derecho_intel.clm-FatesColdNoComp test now passing its run phase. The PRT2 test still fails.

rgknox (Contributor, Author) commented Jul 24, 2024

Still working out why there is a failure in the ERS_P128x1_Lm25.f10_f10_mg37.I2000Clm60Fates.XX.FatesColdNocomp tests.

I've tried making it a debug test and running with gnu. The model failed at different places; the common thread seems to be related to heat/energy/temperature. This makes me suspect something isn't getting zeroed vis-à-vis the radiation arrays...

I also tried removing the NoComp specification; ERS_D_P128x1_Lm25.f10_f10_mg37.I2000Clm60Fates.derecho_gnu.clm-FatesCold does pass...

rgknox (Contributor, Author) commented Jul 25, 2024

@mpaiao @glemieux and other reviewers:

ERS_P128x1_Lm25.f10_f10_mg37.I2000Clm60Fates.XX.FatesColdNocomp fails with main when I bump nclmax up to 3, so I don't believe the problem is with this pull request; main "should" pass this test when nclmax = 3.

I propose setting nclmax = 2 in this pull request, integrating after re-running the tests, and then creating an issue. I'm happy to prioritize that issue.

rgknox (Contributor, Author) commented Aug 1, 2024

Tests look good, b4b with base: /glade/derecho/scratch/rgknox/tests_0731-094347de
The exception is a new test that does not match base: PVT_Lm3.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesLUPFT; I will open an issue about this.

rgknox (Contributor, Author) commented Aug 14, 2024

Tests look good except for the PVT baseline, which is not passing for other tests as well.

/glade/derecho/scratch/rgknox/tests_0813-195403de

@rgknox rgknox merged commit b469786 into NGEET:main Aug 14, 2024
1 check was pending