-fire_emis NoAnthro tests fail because surface dataset is 16pft rather than 78 to match the new fire-emis file #2759

Closed
ekluzek opened this issue Sep 14, 2024 · 5 comments

@ekluzek
Collaborator

ekluzek commented Sep 14, 2024

Brief summary of bug

SMS_D_Ld3_PS.f09_g17.I1850Clm60SpNoAnthro.derecho_intel.clm-decStart1851_noinitial

test fails due to a glitch in fire emissions, which were turned on in the ctsm5.3.0 prototype.

General bug information

CTSM version you are using: branch_tags/ctsm5.3.n03_ctsm5.2.028-20-g317dc11d0

Does this bug cause significantly incorrect results in the model's science? No

Configurations affected:

Some Sp simulations with -fire_emis on and the fire_emission_factors_78PFTs_c20240624.nc file

I don't see what's different about this test from the ones that pass

Here's the list of Sp tests with fire_emis on that pass:

ERP_D_Ld3_PS.f09_g17.I2000Clm50Sp.derecho_intel.clm-prescribed
ERP_D_Ld5.f10_f10_mg37.I2000Clm60Sp.derecho_intel.clm-decStart
ERP_D_Ld5.f10_f10_mg37.IHistClm45Sp.derecho_intel.clm-decStart
ERP_D_Ld5.f10_f10_mg37.IHistClm50SpCru.derecho_gnu.clm-drydepnomegan
ERP_D_Ld5.f10_f10_mg37.IHistClm60Sp.derecho_intel.clm-default
ERP_D_Ld5.ne30pg3_t232.IHistClm51Sp.derecho_intel.clm-default
ERP_P64x2_D.f10_f10_mg37.I2000Clm50SpRtmFl.derecho_intel.clm-default
ERP_P64x2_D_Ld10.f10_f10_mg37.IHistClm50SpG.derecho_intel.clm-glcMEC_decrease
ERP_P64x2_D_Ld5.f10_f10_mg37.I2000Clm45Sp.derecho_intel.clm-default
ERP_P64x2_D_Ld5.f10_f10_mg37.I2000Clm50Sp.derecho_gnu.clm-default
ERS_D_Ld10.f10_f10_mg37.IHistClm50Sp.derecho_intel.clm-collapse_pfts_78_to_16_decStart_f10
NCK_Ld1.f10_f10_mg37.I2000Clm50Sp.derecho_intel.clm-default
SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.derecho_intel.clm-ptsRLA
SMS_D_Ld1_PS.f09_g17.I1850Clm50Sp.derecho_intel.clm-default
SMS_D_Ld1_PS.f19_f19_mg17.I2010Clm50Sp.derecho_intel.clm-clm50cam6LndTuningMode
SMS_D_Ln9_P128x3.f19_g17.IHistClm50Sp.derecho_intel.clm-waccmx_offline
SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60SpRs.derecho_intel.clm-default--clm-NEON-TOOL
SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long
SMS_Ln9.f10_f10_mg37.I2000Clm50Sp.derecho_gnu.clm-clm50cam5LndTuningModeZDustSoilErod
SMS_Ln9.ne30pg2_ne30pg2_mg17.I1850Clm50Sp.derecho_intel.clm-clm50cam6LndTuningMode
SMS_Ln9.ne3pg3_ne3pg3_mg37.I2000Clm50Sp.derecho_gnu.clm-clm50cam6LndTuningMode
SMS_P384x2_D_Ld5.f19_g17.I2000Clm50Sp.derecho_intel.clm-default
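
One way to look for what sets the failing test apart is to split each test name into its dot-separated fields and compare the compsets. The following is a minimal sketch, assuming the usual TESTTYPE.GRID.COMPSET.MACHINE_COMPILER.TESTMODS layout of these names; `CimeTestName` and `parse_test_name` are hypothetical helpers, not part of CIME or CTSM, and only a few of the passing tests are listed.

```python
from typing import NamedTuple


class CimeTestName(NamedTuple):
    """Fields of a test name (hypothetical helper, not a CIME API)."""
    test_type: str         # test type plus modifiers, e.g. "SMS_D_Ld3_PS"
    grid: str              # e.g. "f09_g17"
    compset: str           # e.g. "I1850Clm60SpNoAnthro"
    machine_compiler: str  # e.g. "derecho_intel"
    testmods: str          # empty string when no testmods are given


def parse_test_name(name: str) -> CimeTestName:
    # Split on the first four dots only, so the testmods field stays whole.
    parts = name.split(".", 4)
    if len(parts) < 4:
        raise ValueError(f"unexpected test name: {name}")
    testmods = parts[4] if len(parts) == 5 else ""
    return CimeTestName(parts[0], parts[1], parts[2], parts[3], testmods)


FAILING = ("SMS_D_Ld3_PS.f09_g17.I1850Clm60SpNoAnthro."
           "derecho_intel.clm-decStart1851_noinitial")

# A few of the passing Sp tests from the list above.
PASSING = [
    "ERP_D_Ld3_PS.f09_g17.I2000Clm50Sp.derecho_intel.clm-prescribed",
    "ERP_D_Ld5.f10_f10_mg37.IHistClm60Sp.derecho_intel.clm-default",
    "SMS_D_Ld1_PS.f09_g17.I1850Clm50Sp.derecho_intel.clm-default",
]

if __name__ == "__main__":
    for name in [FAILING] + PASSING:
        fields = parse_test_name(name)
        flag = "NoAnthro" if "NoAnthro" in fields.compset else "-"
        print(f"{fields.compset:24s} {fields.grid:14s} {flag}")
```

In this sample only the failing test has a NoAnthro compset, which is consistent with the later comments linking the failure to the NoAnthro setup.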

Details of bug

It turns out we normally run with -fire_emis on for almost all of our tests (except FATES tests). Note that when you run with Sp compsets, the coupler fire variables are just output as missing, so there really isn't a good reason to run Sp compsets with -fire_emis on.

Important output or errors that show the problem

dec2247.hsn.de.hpc.ucar.edu 764: forrtl: severe (408): fort: (2): Subscript #1 of the array FACTORS has value 17 which is greater than the upper bound of 16
dec2247.hsn.de.hpc.ucar.edu 764:
dec2247.hsn.de.hpc.ucar.edu 764: Image              PC                Routine            Line        Source
dec2247.hsn.de.hpc.ucar.edu 764: cesm.exe           000000000255AACF  fireemisfactorsmo          76  FireEmisFactorsMod.F90
dec2247.hsn.de.hpc.ucar.edu 764: cesm.exe           000000000125BDFE  cnfireemissionsmo          68  CNFireEmissionsMod.F90
dec2247.hsn.de.hpc.ucar.edu 764: cesm.exe           0000000000AB6ED2  clm_instmod_mp_cl         400  clm_instMod.F90
dec2247.hsn.de.hpc.ucar.edu 764: cesm.exe           0000000000AA7F21  clm_initializemod         409  clm_initializeMod.F90
dec2247.hsn.de.hpc.ucar.edu 764: cesm.exe           00000000009AF478  lnd_comp_nuopc_mp         659  lnd_comp_nuopc.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF71E8279  callVFuncPtr             2167  ESMCI_FTable.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF71E72B8  ESMCI_FTableCallE         824  ESMCI_FTable.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF7684AB2  enter                    2501  ESMCI_VMKernel.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF766D346  enter                    1216  ESMCI_VM.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF71E865F  c_esmc_ftablecall         981  ESMCI_FTable.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF7C6C4FC  esmf_compmod_mp_e        1252  ESMF_Comp.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF853B87E  esmf_gridcompmod_        1419  ESMF_GridComp.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF8FC4D11  nuopc_driver_mp_l        2889  NUOPC_Driver.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF8FAE640  nuopc_driver_mp_i        1982  NUOPC_Driver.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF71E8279  callVFuncPtr             2167  ESMCI_FTable.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF71E72B8  ESMCI_FTableCallE         824  ESMCI_FTable.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF7684AB2  enter                    2501  ESMCI_VMKernel.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF766D346  enter                    1216  ESMCI_VM.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF71E865F  c_esmc_ftablecall         981  ESMCI_FTable.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF7C6C4FC  esmf_compmod_mp_e        1252  ESMF_Comp.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF853B87E  esmf_gridcompmod_        1419  ESMF_GridComp.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF8FC4D11  nuopc_driver_mp_l        2889  NUOPC_Driver.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF8FAE885  nuopc_driver_mp_i        1987  NUOPC_Driver.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF8F762EE  nuopc_driver_mp_i         487  NUOPC_Driver.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF71E8279  callVFuncPtr             2167  ESMCI_FTable.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF71E72B8  ESMCI_FTableCallE         824  ESMCI_FTable.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF7684AB2  enter                    2501  ESMCI_VMKernel.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF766D346  enter                    1216  ESMCI_VM.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF71E865F  c_esmc_ftablecall         981  ESMCI_FTable.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF7C6C4FC  esmf_compmod_mp_e        1252  ESMF_Comp.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF853B87E  esmf_gridcompmod_        1419  ESMF_GridComp.F90
dec2247.hsn.de.hpc.ucar.edu 764: cesm.exe           0000000000448EC8  MAIN__                    128  esmApp.F90
dec2247.hsn.de.hpc.ucar.edu 764: cesm.exe           000000000042167D  Unknown               Unknown  Unknown
dec2247.hsn.de.hpc.ucar.edu 764: libc-2.31.so       0000146DE9E5B29D  __libc_start_main     Unknown  Unknown
dec2247.hsn.de.hpc.ucar.edu 764: cesm.exe           00000000004215AA  Unknown               Unknown  Unknown
@ekluzek ekluzek added the bug something is working incorrectly label Sep 14, 2024
@ekluzek ekluzek added this to the cesm3_0_beta03 milestone Sep 14, 2024
@ekluzek ekluzek self-assigned this Sep 14, 2024
@ekluzek
Collaborator Author

ekluzek commented Sep 14, 2024

Note, I'm also seeing this for the Bgc NoAnthro test:

SMS_D_Ld3_PS.f09_g17.I1850Clm60BgcNoAnthro.derecho_intel.clm-decStart1851_noinitial--clm-matrixcnOn

so this problem is somehow linked to the NoAnthro setup and not just whether the test is Sp or Bgc vs BgcCrop.

@samsrabin samsrabin changed the title from "Failing test with new fire-emissions factor file for all PFT's -- only do -fire_emis with Bgc compsets (Sp fire fields are all missing)..." to "-fire_emis NoAnthro tests fail" Sep 16, 2024
@ekluzek
Collaborator Author

ekluzek commented Sep 16, 2024

In the standup discussion, @samsrabin suggested a couple of ideas to try:

  • Verify it fails on Izumi in the same way for both tests
  • Verify that NoAnthro works without --fire_emis
  • Try Louisa's fire_emiss file currently being used in CAM
  • I'll also loop in Fang, Louisa and Simone

@samsrabin
Collaborator

This patch should solve it, although I haven't even tested whether it builds:

diff --git a/src/biogeochem/FireEmisFactorsMod.F90 b/src/biogeochem/FireEmisFactorsMod.F90
index e97082c0b..7f7f470f3 100644
--- a/src/biogeochem/FireEmisFactorsMod.F90
+++ b/src/biogeochem/FireEmisFactorsMod.F90
@@ -11,6 +11,7 @@ module FireEmisFactorsMod
   use shr_kind_mod, only : r8 => shr_kind_r8
   use abortutils,   only : endrun
   use clm_varctl,   only : iulog
+  use clm_varpar,   only : maxveg
 !
   implicit none
   private
@@ -20,8 +21,6 @@ module FireEmisFactorsMod
   public :: fire_emis_factors_init
   public :: fire_emis_factors_get
 
-! !PRIVATE MEMBERS:
-  integer :: npfts ! number of plant function types
 !
   type emis_eff_t
      real(r8), pointer :: eff(:) ! emissions efficiency factor
@@ -73,10 +72,7 @@ contains
        call endrun(errmes)
     endif
 
-    factors(:npfts) = comp_factors_table( ndx )%eff(:npfts)
-    if ( size(factors) > npfts )then
-       factors(npfts+1:) = comp_factors_table( ndx )%eff(nc3crop)
-    end if
+    factors(:maxveg) = comp_factors_table( ndx )%eff(:maxveg)
     molecwght  = comp_factors_table( ndx )%wght
 
   end subroutine fire_emis_factors_get
@@ -126,9 +122,8 @@ contains
     call ncd_inqdlen( ncid, dimid, n_comps, name='Comp_Num')
     call ncd_inqdlen( ncid, dimid, n_pfts, name='PFT_Num')
 
-    npfts = n_pfts
-    if ( npfts /= mxpft .and. npfts /= 16 )then
-       call endrun('Number of PFTs on fire emissions file is NOT correct. Its neither the total number of PFTS nor 16')
+    if ( n_pfts < maxveg )then
+       call endrun('Number of PFTs on fire emissions file is less than the number of PFTs in the run')
     end if
 
     ierr = pio_inq_varid(ncid,'Comp_EF',  comp_ef_vid)
@@ -146,7 +141,7 @@ contains
     call  bld_hash_table_indices( comp_names )
     do i=1,n_comps
        start=(/i,1/)
-       count=(/1,npfts/)
+       count=(/1,n_pfts/)
        ierr = pio_get_var( ncid, comp_ef_vid,  start, count, comp_factors )
 
        call enter_hash_data( trim(comp_names(i)), comp_factors, comp_molecwghts(i)  )
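
The effect of the patch can be illustrated outside the model. The following is a rough Python sketch, not the CTSM code: plain lists stand in for the Fortran arrays, the sizes 78 and 16 are taken from the file name and the error messages in this issue, and the function names are hypothetical. It also omits the old crop-fill branch and folds the patched init-time check into the getter for brevity.

```python
# Illustrative sketch only; plain Python lists stand in for Fortran arrays.
N_PFTS_FILE = 78  # PFTs on the 78-PFT fire emissions file
MAXVEG_RUN = 16   # assumed number of PFTs in a run on a 16-PFT surface dataset


def get_factors_old(eff, factors_size):
    """Mimic the pre-patch copy: factors(:npfts) = eff(:npfts).

    npfts comes from the file, so the slice can be longer than the
    caller's array. Fortran bounds checking aborts in that case, which
    is modelled here with an IndexError.
    """
    npfts = len(eff)
    if npfts > factors_size:
        raise IndexError(
            f"FACTORS (value {npfts}) is out of range (1:{factors_size})"
        )
    factors = [0.0] * factors_size
    factors[:npfts] = eff
    return factors


def get_factors_patched(eff, maxveg):
    """Mimic the patched copy: factors(:maxveg) = eff(:maxveg).

    The file may hold more PFTs than the run uses, but never fewer.
    """
    if len(eff) < maxveg:
        raise ValueError(
            "Number of PFTs on fire emissions file is less than "
            "the number of PFTs in the run"
        )
    return list(eff[:maxveg])


if __name__ == "__main__":
    eff = [float(i) for i in range(1, N_PFTS_FILE + 1)]
    try:
        get_factors_old(eff, MAXVEG_RUN)
    except IndexError as err:
        print("old:", err)
    print("patched:", len(get_factors_patched(eff, MAXVEG_RUN)), "factors")
```

With a 78-PFT file and a 16-PFT run, the old logic overruns the caller's array, while the patched logic copies only the first 16 entries and rejects files with fewer PFTs than the run needs.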

@ekluzek ekluzek changed the title from "-fire_emis NoAnthro tests fail" to "-fire_emis NoAnthro tests fail because surface dataset is 16pft rather than 78 to match the new fire-emis file" Sep 16, 2024
@ekluzek
Collaborator Author

ekluzek commented Sep 17, 2024

I replicated the issue with a standard 16-pft dataset (so not a NoAnthro one) on Izumi with the nag compiler as follows:

[i017.cgd.ucar.edu:mpi_rank_29][error_sighandler] Caught error: Aborted (signal 6)
Runtime Error: /fs/cgd/data0/erik/ctsm_worktree/quickfix/src/biogeochem/FireEmisFactorsMod.F90, line 76: Subscript 1 of by ESMAPP
[i017.cgd.ucar.edu:mpi_rank_10][error_sighandler] Caught error: Aborted (signal 6)
FACTORS (value 78) is out of range (1:16)400: Called by CLM_INSTMOD:CLM_INSTINIT
/fs/cgd/data0/erik/ctsm_worktree/quickfix/src/main/clm_initializeMod.F90, li
Program terminated by fatal error
/fs/cgd/data0/erik/ctsm_worktree/quickfix/src/biogeochem/FireEmisFactorsMod.F90, line 76: Error occurred in FIREEMISFACTORSMOD:FIRE_EMIS_FACTORS_GET
/fs/cgd/data0/erik/ctsm_worktree/quickfix/src/biogeochem/CNFireEmissionsMod.F90, ne 409: Called by CLM_INITIALIZEMOD:INITIALIZE2
/fs/cgd/data0/erik/ctsm_worktree/quickfix/src/cpl/nuopc/lnd_comp_nuopc.F90, line 659: Called bline 68: Called by CNFIREEMISSIONSMOD:INIT
/fs/cgd/data0/erik/ctsm_worktree/quickfix/src/main/clm_instMod.F90, line 400: Cally LND_COMP_NUOPC:INITIALIZEREALIZE
/fs/cgd/data0/erik/ctsm_worktree/quickfix/components/cmeps/cime_config/../cesm/driver/esmApp.F90, line 128: Called by ESMAPP
ed by CLM_INSTMOD:CLM_INSTINIT
/fs/cgd/data0/erik/ctsm_worktree/quickfix/src/main/clm_initializeMod.F90, line 409: Called by CLM_INITIALIZEMOD:INITIALIZE2
/fs/cgd/data0/erik/ctsm_worktree/quickfix/src/cpl/nuopc/lnd_comp_nuopc.F90, line 659: Called by LND_COMP_NUOPC:INITIALIZEREALIZE
/fs/cgd/data0/erik/ctsm_worktree/quickfix/components/cmeps/cime_config/../cesm/driver/esmApp.F90, line 128: Called by ESMAPP
[i017.cgd.ucar.edu:mpi_rank_21][error_sighandler] Caught error: Aborted (signal 6)
[i017.cgd.ucar.edu:mpi_rank_0][error_sighandler] Caught error: Aborted (signal 6)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 243471 RUNNING AT i017.cgd.ucar.edu
=   EXIT CODE: 134
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

ekluzek added a commit to slevis-lmwg/ctsm that referenced this issue Sep 17, 2024
@ekluzek
Collaborator Author

ekluzek commented Sep 25, 2024

This was resolved in ctsm5.3.0

@ekluzek ekluzek closed this as completed Sep 25, 2024