Satellite phenology mode run failure with gnu compiler #937

Closed
glemieux opened this issue Nov 11, 2022 · 4 comments

@glemieux
Contributor

While trying to track down an exact restart issue on perlmutter using elm-fates, I found that satellite phenology (SP) mode fails the run immediately with the following stack trace:

32: Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
32:
32: Backtrace for this error:
 0:  ERROR: Unknown error submitted to shr_abort_abort.
32: #0  0x1520bc66ad6f in ???
 0: #0  0xc869ca in __shr_abort_mod_MOD_shr_abort_backtrace
 0:     at /global/u1/g/glemieux/e3sm-test/share/util/shr_abort_mod.F90:104
 0: #1  0xc86b9c in __shr_abort_mod_MOD_shr_abort_abort
 0:     at /global/u1/g/glemieux/e3sm-test/share/util/shr_abort_mod.F90:61
32: #1  0x9f0531 in __damagemainmod_MOD_getcrownreduction
32:     at /global/u1/g/glemieux/e3sm-test/components/elm/src/external_models/fates/biogeochem/DamageMainMod.F90:176
 0: #2  0xa0af1c in __fatesallometrymod_MOD_tree_lai
 0:     at /global/u1/g/glemieux/e3sm-test/components/elm/src/external_models/fates/biogeochem/FatesAllometryMod.F90:650
32: #2  0xa09c7c in carea_2pwr
32:     at /global/u1/g/glemieux/e3sm-test/components/elm/src/external_models/fates/biogeochem/FatesAllometryMod.F90:2130
32: #3  0xa0b49a in __fatesallometrymod_MOD_carea_allom
32:     at /global/u1/g/glemieux/e3sm-test/components/elm/src/external_models/fates/biogeochem/FatesAllometryMod.F90:509
 0: #3  0xa060a1 in __edphysiologymod_MOD_assign_cohort_sp_properties
 0:     at /global/u1/g/glemieux/e3sm-test/components/elm/src/external_models/fates/biogeochem/EDPhysiologyMod.F90:1695
32: #4  0xa05fad in __edphysiologymod_MOD_assign_cohort_sp_properties
32:     at /global/u1/g/glemieux/e3sm-test/components/elm/src/external_models/fates/biogeochem/EDPhysiologyMod.F90:1665
 0: #4  0x950f24 in init_cohorts
 0:     at /global/u1/g/glemieux/e3sm-test/components/elm/src/external_models/fates/main/EDInitMod.F90:818
 0: #5  0x951756 in __edinitmod_MOD_init_patches
 0:     at /global/u1/g/glemieux/e3sm-test/components/elm/src/external_models/fates/main/EDInitMod.F90:635
32: #5  0x950f24 in init_cohorts
32:     at /global/u1/g/glemieux/e3sm-test/components/elm/src/external_models/fates/main/EDInitMod.F90:818
32: #6  0x951756 in __edinitmod_MOD_init_patches
32:     at /global/u1/g/glemieux/e3sm-test/components/elm/src/external_models/fates/main/EDInitMod.F90:635
32: #7  0x501cb3 in __elmfatesinterfacemod_MOD_init_coldstart
32:     at /global/u1/g/glemieux/e3sm-test/components/elm/src/main/elmfates_interfaceMod.F90:1744
 0: #6  0x501cb3 in __elmfatesinterfacemod_MOD_init_coldstart
 0:     at /global/u1/g/glemieux/e3sm-test/components/elm/src/main/elmfates_interfaceMod.F90:1744
32: #8  0x4d2c63 in __elm_initializemod_MOD_initialize2
32:     at /global/u1/g/glemieux/e3sm-test/components/elm/src/main/elm_initializeMod.F90:991
 0: #7  0x4d2c63 in __elm_initializemod_MOD_initialize2

Looking at the failure points, this is likely due to a compiler-specific initialization error with crowndamage when SP mode is on, based on this section:

fates/main/EDInitMod.F90

Lines 813 to 825 in 8ef6a1e

    if(hlm_use_sp.eq.itrue)then
       init = itrue
       ! At this point, we do not know the bc_in values of tlai tsai and htop,
       ! so this is initializing to an arbitrary value for the very first timestep.
       ! Not sure if there's a way around this or not.
       call assign_cohort_SP_properties(temp_cohort, 0.5_r8,0.2_r8, 0.1_r8,patch_in%area,init,c_leaf)
    else
       temp_cohort%hite = EDPftvarcon_inst%hgt_min(pft)
       ! Assume no damage to begin with - since we assume no damage
       ! we do not need to initialise branch frac just yet.
       temp_cohort%crowndamage = 1

crowndamage is used in a call to carea_allom via assign_cohort_SP_properties, but in the SP branch above it is never set before that call. As such, moving the assignment on line 825 up, prior to the SP mode check, should resolve the issue.
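
For illustration, here is a minimal standalone sketch of that reordering. It is not the FATES code: the `cohort_sketch` type, the `crown_reduction` table, and the `use_crowndamage` routine are hypothetical stand-ins, and the height values are arbitrary. It only shows the pattern of assigning `crowndamage` before the SP branch so that any routine reached from either branch sees a defined value.

```fortran
program sp_crowndamage_order_sketch
  implicit none

  ! All names below are illustrative stand-ins, not the FATES definitions.
  integer, parameter :: r8 = selected_real_kind(12)
  integer, parameter :: itrue = 1

  type :: cohort_sketch
     integer  :: crowndamage   ! no default initialization: undefined until assigned
     real(r8) :: hite
  end type cohort_sketch

  ! Hypothetical per-damage-class table, standing in for any lookup
  ! that is keyed on crowndamage further down the call chain.
  real(r8), parameter :: crown_reduction(3) = [0.0_r8, 0.2_r8, 0.4_r8]

  type(cohort_sketch) :: temp_cohort
  integer :: hlm_use_sp

  hlm_use_sp = itrue

  ! Assume no damage to begin with. Because this is set before the
  ! branch, both the SP and the non-SP path see a defined value.
  temp_cohort%crowndamage = 1

  if (hlm_use_sp .eq. itrue) then
     temp_cohort%hite = 0.5_r8          ! arbitrary placeholder value
     call use_crowndamage(temp_cohort)  ! stands in for the SP property assignment
  else
     temp_cohort%hite = 1.25_r8         ! arbitrary placeholder value
  end if

contains

  subroutine use_crowndamage(cohort)
     type(cohort_sketch), intent(in) :: cohort
     ! Indexing with an undefined integer could land anywhere in memory;
     ! with crowndamage set to 1 above, the index is always in range here.
     print *, 'crown reduction:', crown_reduction(cohort%crowndamage)
  end subroutine use_crowndamage

end program sp_crowndamage_order_sketch
```

If the assignment were left only in the `else` branch, the SP path would read an undefined integer. What value that integer happens to hold can depend on the compiler and its options, which would be consistent with the failure showing up only in some builds.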

@glemieux
Contributor Author

Running the proposed fix on perlmutter results in a successful sp mode run.

@glemieux glemieux moved this from ❕Todo to 🟢 In Progress in FATES issue board Nov 11, 2022
@glemieux
Contributor Author

glemieux commented Nov 11, 2022

I'm going to try and replicate this failure on cheyenne using the gnu compiler as well.

@glemieux
Contributor Author

glemieux commented Nov 11, 2022

> I'm going to try and replicate this failure on cheyenne using the gnu compiler as well.

Interestingly this runs just fine using the gnu compiler on Cheyenne. Based on that, I'm guessing it has to do with compiler option differences.

I realized I forgot to include the I2000Clm51FatesSpRsGs compset necessary for the satellite phenology testmod. With that compset included, the run on Cheyenne fails immediately, as expected.

@glemieux
Contributor Author

Closed per #939

Repository owner moved this from 🟢 In Progress to ✔ Done in FATES issue board Nov 16, 2022