[DRAFT] Remove local copies of FindESMF from all components, update ESMF to 8.6.1 and MAPL to 2.46.3 #2406

Draft
wants to merge 18 commits into
base: develop

Conversation

DusanJovic-NOAA
Collaborator

@DusanJovic-NOAA DusanJovic-NOAA commented Aug 23, 2024

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

This PR requires two libraries (ESMF v8.6.1 and MAPL v2.46.3) to be installed on all supported machines. So far, I'm testing these updates on Hercules using the ue-esmf-8.6.1-mapl-2.46.2 spack-stack environment.

Commit Message:

* UFSWM - 
  * AQM - 
  * CDEPS - 
  * CICE - 
  * CMEPS - 
  * CMakeModules - 
  * FV3 - 
    * ccpp-physics - 
    * atmos_cubed_sphere - 
  * GOCART - 
  * HYCOM - 
  * MOM6 - 
  * NOAHMP - 
  * WW3 - 
  * stochastic_physics - 

Priority:

  • Normal

Git Tracking

UFSWM:

Sub component Pull Requests:

  • AQM:
  • CDEPS:
  • CICE:
  • CMEPS:
  • CMakeModules:
  • FV3:
    • ccpp-physics:
    • atmos_cubed_sphere:
  • GOCART:
  • HYCOM:
  • MOM6:
  • NOAHMP:
  • WW3:
  • stochastic_physics:
  • None

UFSWM Blocking Dependencies:

  • Blocked by #
  • None

Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Adds New Tests/Baselines.
  • PR Updates/Changes Baselines.
  • No Baseline Changes.

Input data Changes:

  • None.
  • New input data.
  • Updated input data.

Library Changes/Upgrades:

  • Required
    • Library names w/versions:
    • Git Stack Issue (JCSDA/spack-stack#)
  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • Gaea
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

@DusanJovic-NOAA DusanJovic-NOAA marked this pull request as draft August 23, 2024 14:09
@bbakernoaa
Collaborator

@DusanJovic-NOAA If you are testing this with the head of the develop branch from GOCART (hash bf5cf04) then you need to update the tests/parm/gocart/SU2G_instance_SU.rc file

Specifically this line

It will need to be changed to:

volcano_srcfilen_explosive: /dev/null
volcano_srcfilen_degassing: /PATH_TO_NEW_FILE/so2_volcanic_emissions_CARN_v202401.degassing_only.rc

I have a copy of this new file on hera: /scratch1/RDARCH/rda-arl-gpu/Barry.Baker/emissions/GEFS/nexus/VOLCANIC/so2_volcanic_emissions_CARN_v202401.degassing_only.rc

Once you have this I think your problems with updating gocart will be solved

@DusanJovic-NOAA
Collaborator Author

I made the suggested change in SU2G_instance_SU.rc and now I get this error:

$ grep 00000 err
pe=00000 FAIL at line=01088    MAPL_CapGridComp.F90                     <status=41>
pe=00000 FAIL at line=01088    MAPL_CapGridComp.F90                     <status=41>
pe=00000 FAIL at line=01560    MAPL_EsmfRegridder.F90                   <destination masking with this regrid type is unsupported>
pe=00000 FAIL at line=01382    MAPL_EsmfRegridder.F90                   <status=1>
pe=00000 FAIL at line=00977    MAPL_AbstractRegridder.F90               <status=1>
pe=00000 FAIL at line=00097    NewRegridderManager.F90                  <status=1>
pe=00000 FAIL at line=01101    GriddedIO.F90                            <status=1>
pe=00000 FAIL at line=04539    ExtDataGridCompMod.F90                   <status=1>
pe=00000 FAIL at line=01468    ExtDataGridCompMod.F90                   <status=1>
pe=00000 FAIL at line=01838    MAPL_Generic.F90                         <status=1>
pe=00000 FAIL at line=01241    MAPL_CapGridComp.F90                     <status=1>
pe=00000 FAIL at line=01204    MAPL_CapGridComp.F90                     <status=1>
pe=00000 FAIL at line=01164    MAPL_CapGridComp.F90                     <status=1>
pe=00000 FAIL at line=00832    MAPL_CapGridComp.F90                     <status=1>
pe=00000 FAIL at line=00972    MAPL_CapGridComp.F90                     <status=1>
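
In a traceback like this one, most frames only propagate a numeric `<status=N>`, and the frame that carries a text message tends to be the most informative. A minimal sketch for pulling out just those frames, assuming the log layout shown above (the function name `mapl_messages` is made up for this example):

```shell
#!/bin/sh
# Print only the FAIL frames that carry a text message, dropping the
# frames that merely pass along a numeric <status=N>.
# Assumes the "FAIL at line=... <...>" layout shown in the log above.
mapl_messages() {
  grep 'FAIL at line=' "$1" | grep -v '<status=[0-9]*>' | sort -u
}

# Usage: mapl_messages err
```

Applied to the listing above, this would leave only the `MAPL_EsmfRegridder.F90` line with the "destination masking" message.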

@bbakernoaa
Collaborator

@DusanJovic-NOAA do you have the new MAPL/ESMF installed on hera?

@DusanJovic-NOAA
Collaborator Author

@DusanJovic-NOAA do you have the new MAPL/ESMF installed on hera?

I don't. I have already asked EPIC twice to install the updated version of MAPL.

@DusanJovic-NOAA
Collaborator Author

@bbakernoaa I repeated the cpld_control_p8_intel test on Hercules with updated gocart (top of develop 327ff344) and MAPL 2.46.3 and it fails with the same error:

djovic@hercules-login-4: /work2/noaa/stmp/djovic/stmp/djovic/FV3_RT/rt_4012304/cpld_control_p8_intel
$ grep 00000 err
  0: pe=00000 FAIL at line=01088    MAPL_CapGridComp.F90                     <status=41>
  0: pe=00000 FAIL at line=01088    MAPL_CapGridComp.F90                     <status=41>
  0: pe=00000 FAIL at line=01560    MAPL_EsmfRegridder.F90                   <destination masking with this regrid type is unsupported>
  0: pe=00000 FAIL at line=01382    MAPL_EsmfRegridder.F90                   <status=1>
  0: pe=00000 FAIL at line=00977    MAPL_AbstractRegridder.F90               <status=1>
  0: pe=00000 FAIL at line=00097    NewRegridderManager.F90                  <status=1>
  0: pe=00000 FAIL at line=01101    GriddedIO.F90                            <status=1>
  0: pe=00000 FAIL at line=04539    ExtDataGridCompMod.F90                   <status=1>
  0: pe=00000 FAIL at line=01468    ExtDataGridCompMod.F90                   <status=1>
  0: pe=00000 FAIL at line=01838    MAPL_Generic.F90                         <status=1>
  0: pe=00000 FAIL at line=01241    MAPL_CapGridComp.F90                     <status=1>
  0: pe=00000 FAIL at line=01204    MAPL_CapGridComp.F90                     <status=1>
  0: pe=00000 FAIL at line=01164    MAPL_CapGridComp.F90                     <status=1>
  0: pe=00000 FAIL at line=00832    MAPL_CapGridComp.F90                     <status=1>
  0: pe=00000 FAIL at line=00972    MAPL_CapGridComp.F90                     <status=1>

I also updated SU2G_instance_SU.rc to include these two lines:

# Volcanic pointwise sources
volcano_srcfilen_explosive: /dev/null
volcano_srcfilen_degassing: ./so2_volcanic_emissions_CARN_v202401.degassing_only.rc                                                          

My run directory on Hercules is /work2/noaa/stmp/djovic/stmp/djovic/FV3_RT/rt_4012304/cpld_control_p8_intel

@junwang-noaa
Collaborator

@weiyuan-jiang Can you access Hercules? May I ask if you can take a look as well?

@weiyuan-jiang
Collaborator

@weiyuan-jiang Can you access Hercules? May I ask if you can take a look as well?

Yes, I will take a look

@bena-nasa

bena-nasa commented Sep 25, 2024

I made the suggested change in SU2G_instance_SU.rc and now I get this error:

$ grep 00000 err
pe=00000 FAIL at line=01088    MAPL_CapGridComp.F90                     <status=41>
pe=00000 FAIL at line=01088    MAPL_CapGridComp.F90                     <status=41>
pe=00000 FAIL at line=01560    MAPL_EsmfRegridder.F90                   <destination masking with this regrid type is unsupported>
pe=00000 FAIL at line=01382    MAPL_EsmfRegridder.F90                   <status=1>
pe=00000 FAIL at line=00977    MAPL_AbstractRegridder.F90               <status=1>
pe=00000 FAIL at line=00097    NewRegridderManager.F90                  <status=1>
pe=00000 FAIL at line=01101    GriddedIO.F90                            <status=1>
pe=00000 FAIL at line=04539    ExtDataGridCompMod.F90                   <status=1>
pe=00000 FAIL at line=01468    ExtDataGridCompMod.F90                   <status=1>
pe=00000 FAIL at line=01838    MAPL_Generic.F90                         <status=1>
pe=00000 FAIL at line=01241    MAPL_CapGridComp.F90                     <status=1>
pe=00000 FAIL at line=01204    MAPL_CapGridComp.F90                     <status=1>
pe=00000 FAIL at line=01164    MAPL_CapGridComp.F90                     <status=1>
pe=00000 FAIL at line=00832    MAPL_CapGridComp.F90                     <status=1>
pe=00000 FAIL at line=00972    MAPL_CapGridComp.F90                     <status=1>

This crash is not related to the volcanic emissions files, so changing them cannot have caused it. Either something else was changed, or changing the volcanic emissions files just let the model get further and hit a different problem.
Could it be because of the v2.46.3 update? How you reached that block is a mystery to me: that is not the default regridding option anywhere, so unless someone changed it, I don't see how you got there.

That said, does the grid that is passed down to GOCART and the cap to GOCART have a mask set on it somewhere else in UFS?

@junwang-noaa
Collaborator

@DusanJovic-NOAA may I ask where your model source code is for the run /work2/noaa/stmp/djovic/stmp/djovic/FV3_RT/rt_4012304/cpld_control_p8_intel?

@DusanJovic-NOAA
Collaborator Author

@DusanJovic-NOAA may I ask where your model source code is for the run /work2/noaa/stmp/djovic/stmp/djovic/FV3_RT/rt_4012304/cpld_control_p8_intel?

/work/noaa/fv3-cam/djovic/ufs/gocart_mapl/ufs-weather-model

@junwang-noaa
Collaborator

From @bena-nasa: I have made branch:

https://github.com/GEOS-ESM/MAPL/tree/hotfix/bmauer/candidate_v2.46.4

with what should fix the issue you were seeing with MAPLv2.46.3, please try and let me know. If it fixes your issues we can make a v2.46.4 release of mapl and hotfix this onto our develop.

@RatkoVasic-NOAA would you please install this MAPL version on Hercules for us to test if it resolves the model failure? Thanks

@GeorgeVandenberghe-NOAA
Collaborator

From @bena-nasa: I have made branch:

https://github.com/GEOS-ESM/MAPL/tree/hotfix/bmauer/candidate_v2.46.4

with what should fix the issue you were seeing with MAPLv2.46.3, please try and let me know. If it fixes your issues we can make a v2.46.4 release of mapl and hotfix this onto our develop.

@RatkoVasic-NOAA would you please install this MAPL version on Hercules for us to test if it resolves the model failure? Thanks

What is the git syntax to get this new hotfix and build the new(er) MAPL? I am going to try to do this offline rather than wait for a possible installation down the road.

@junwang-noaa
Collaborator

@RatkoVasic-NOAA @jkbk2004 @ulmononian May I ask if the EPIC library team can install the MAPL candidate_v2.46.4 in spack-stack 1.6.0?

We can't move on with the testing without the new library. Thanks

@GeorgeVandenberghe-NOAA
Collaborator

GeorgeVandenberghe-NOAA commented Oct 1, 2024 via email

@RatkoVasic-NOAA
Collaborator

@junwang-noaa @GeorgeVandenberghe-NOAA I can try to install that version. Which machine do you want to test?

@junwang-noaa
Collaborator

@RatkoVasic-NOAA can you install it on Hercules? Thanks

@RatkoVasic-NOAA
Collaborator

@RatkoVasic-NOAA can you install it on Hercules? Thanks

@junwang-noaa will do.

@GeorgeVandenberghe-NOAA
Collaborator

GeorgeVandenberghe-NOAA commented Oct 2, 2024 via email

@RatkoVasic-NOAA
Collaborator

@GeorgeVandenberghe-NOAA I have never installed MAPL outside of spack-stack.

@bena-nasa can you please make a tag from your branch, I cannot install it in spack-stack. Here are available tags:

    version("2.48.0", sha256="60a0fc4fd82b1a05050666ae478da7d79d86305aff1643a57bc09cb5347323b7")
    version("2.47.2", sha256="d4ca384bf249b755454cd486a26bae76944a7cae3a706b9a7c9298825077cac0")
    version("2.47.1", sha256="ca3e94c0caa78a91591fe63603d1836196f5294d4baad7cf1d83b229b3a85916")
    version("2.47.0", sha256="66c862d2ab8bcd6969e9728091dbca54f1f420e97e41424c4ba93ef606088459")
    version("2.46.3", sha256="333e1382ab744302d28b6f39e7f5504c7919d77d2443d70af952f60cbd8f27e7")
    version("2.46.2", sha256="6d397ad73042355967de8ef5b521d6135c004f96e93ae7b215f9ee325e75c6f0")
    version("2.46.1", sha256="f3090281de6293b484259d58f852c45b98759de8291d36a4950e6d348ece6573")
    version("2.46.0", sha256="726d9588b724bd43e5085d1a2f8d806d548f185ed6b22a1b13c0ed06212d7be2")

@GeorgeVandenberghe-NOAA
Collaborator

GeorgeVandenberghe-NOAA commented Oct 2, 2024 via email

@GeorgeVandenberghe-NOAA
Collaborator

GeorgeVandenberghe-NOAA commented Oct 2, 2024 via email

@BrianCurtis-NOAA
Collaborator

BrianCurtis-NOAA commented Oct 2, 2024

I need the git syntax to properly get this hotfix https://github.com/GEOS-ESM/MAPL/tree/hotfix/bmauer/candidate_v2.46.4


I believe:

git remote add GEOS-ESM https://github.com/GEOS-ESM/MAPL
git remote update
git merge GEOS-ESM hotfix/bmauer/candidate_v2.46.4

(might be git merge GEOS-ESM/hotfix/bmauer/candidate_v2.46.4)
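
For reference, `git merge` takes a ref, so the remote-tracking form `GEOS-ESM/hotfix/bmauer/candidate_v2.46.4` (remote name, slash, branch) is the one that resolves once the remote has been fetched. A hedged sketch of the sequence, wrapped in a dry-run helper so the commands can be inspected before anything runs. The `run` helper, the `DRY_RUN` variable and the function name are inventions of this example, and it assumes you are inside an existing MAPL clone:

```shell
#!/bin/sh
# Echo each command; execute it only when DRY_RUN is unset or empty.
run() {
  echo "+ $*"
  if [ -z "${DRY_RUN:-}" ]; then "$@"; fi
}

# Add the upstream remote, fetch it, and merge the hotfix branch.
merge_mapl_hotfix() {
  local remote=GEOS-ESM
  local branch=hotfix/bmauer/candidate_v2.46.4
  run git remote add "$remote" https://github.com/GEOS-ESM/MAPL &&
  run git fetch "$remote" &&
  run git merge "$remote/$branch"
}

# Usage: DRY_RUN=1 merge_mapl_hotfix   # print the commands only
```

For a fresh checkout, `git clone --branch hotfix/bmauer/candidate_v2.46.4 https://github.com/GEOS-ESM/MAPL` should get the same branch directly.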

@BrianCurtis-NOAA
Collaborator

I typo'd, it's GEOS-ESM

@GeorgeVandenberghe-NOAA
Collaborator

GeorgeVandenberghe-NOAA commented Oct 2, 2024 via email

@mathomp4

@GeorgeVandenberghe-NOAA As far as I can tell from our Findudunits.cmake file, it shouldn't care if it's shared or not.

Indeed, in Baselibs, we build udunits2 as static, but we can also build MAPL with spack and as far as I can see from the spack package, that is built as shared by default.

So, I can only conclude, MAPL does not care what type of udunits is built.

@GeorgeVandenberghe-NOAA
Collaborator

@DusanJovic-NOAA
Collaborator Author

$ ls -l /work/noaa/noaatest/gwv/herc/simstacks/simstack.1008/netcdf140.492.460.mapl241.fms2301.crtm240/MAPL-2.50.2
ls: cannot access '/work/noaa/noaatest/gwv/herc/simstacks/simstack.1008/netcdf140.492.460.mapl241.fms2301.crtm240/MAPL-2.50.2': Permission denied

@DusanJovic-NOAA
Collaborator Author

Is MAPL v2.50.2 expected to fix the GOCART errors we had with v2.46.x?

@mathomp4

Is MAPL v2.50.2 expected to fix the GOCART errors we had with v2.46.x?

@DusanJovic-NOAA I would...hope so? All I can say is that MAPL v2.50.2 is a superset of the v2.40.3.1. Every change we backported to v2.40.3.1 is in v2.50.2. Obviously there is a lot more in v2.50, but it should have everything that was added in the tweak version.

@GeorgeVandenberghe-NOAA
Collaborator

@DusanJovic-NOAA
Collaborator Author

I get this error with 2.50.2 that George compiled

100: forrtl: severe (122): invalid attempt to assign into a pointer that is not associated
100: Image              PC                Routine            Line        Source
100: fv3.exe            00000000062146AF  Unknown               Unknown  Unknown
100: fv3.exe            00000000021E641E  mapl_historygridc        2414  MAPL_HistoryGridComp.F90
100: fv3.exe            0000000000A95CC4  Unknown               Unknown  Unknown
100: fv3.exe            0000000000A99C3F  Unknown               Unknown  Unknown
100: fv3.exe            00000000009E8FBA  Unknown               Unknown  Unknown
100: fv3.exe            00000000008E38F8  Unknown               Unknown  Unknown
100: fv3.exe            0000000000A9711A  Unknown               Unknown  Unknown
100: fv3.exe            0000000000975CD0  Unknown               Unknown  Unknown

code changes for this test:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 702cf6a4..eaa9166b 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -152,6 +152,7 @@ endif()

 find_package(NetCDF 4.7.4 REQUIRED C Fortran)
 find_package(ESMF 8.3.0 MODULE REQUIRED)
+add_library(ESMF::ESMF ALIAS esmf)
 if(FMS)
   find_package(FMS 2022.04 REQUIRED COMPONENTS R4 R8)
   if(APP MATCHES "^(HAFSW)$")
diff --git a/modulefiles/ufs_common.lua b/modulefiles/ufs_common.lua
index 062fa384..f35d0d2e 100644
--- a/modulefiles/ufs_common.lua
+++ b/modulefiles/ufs_common.lua
@@ -10,7 +10,7 @@ local ufs_modules = {
   {["netcdf-c"]        = "4.9.2"},
   {["netcdf-fortran"]  = "4.6.1"},
   {["parallelio"]      = "2.5.10"},
-  {["esmf"]            = "8.6.0"},
+  {["esmf"]            = "8.6.1"},
   {["fms"]             = "2024.01"},
   {["bacio"]           = "2.4.1"},
   {["crtm"]            = "2.4.0"},
@@ -20,7 +20,7 @@ local ufs_modules = {
   {["sp"]              = "2.5.0"},
   {["w3emc"]           = "2.10.0"},
   {["gftl-shared"]     = "1.6.1"},
-  {["mapl"]            = "2.40.3-esmf-8.6.0"},
+  -- {["mapl"]            = "2.40.3.1-esmf-8.6.1"},
   {["scotch"]          = "7.0.4"},
 }

diff --git a/modulefiles/ufs_hercules.intel.lua b/modulefiles/ufs_hercules.intel.lua
index 455ea4d0..3abb247c 100644
--- a/modulefiles/ufs_hercules.intel.lua
+++ b/modulefiles/ufs_hercules.intel.lua
@@ -2,7 +2,7 @@ help([[
 loads UFS Model prerequisites for Hercules/Intel
 ]])

-prepend_path("MODULEPATH", "/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/fms-2024.01/install/modulefiles/Core")
+prepend_path("MODULEPATH", "/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/mapl-2.40.3.1-intel-2021.9.0/install/modulefiles/Core")

 stack_intel_ver=os.getenv("stack_intel_ver") or "2021.9.0"
 load(pathJoin("stack-intel", stack_intel_ver))
@@ -13,6 +13,8 @@ load(pathJoin("stack-intel-oneapi-mpi", stack_impi_ver))
 cmake_ver=os.getenv("cmake_ver") or "3.23.1"
 load(pathJoin("cmake", cmake_ver))

+prepend_path("CMAKE_PREFIX_PATH", "/work/noaa/noaatest/gwv/herc/simstacks/simstack.1008/netcdf140.492.460.mapl241.fms2301.crtm240/MAPL-2.50.2")
+
 load("ufs_common")

 nccmp_ver=os.getenv("nccmp_ver") or "1.9.0.1"

@mathomp4

@DusanJovic-NOAA Can you point us to the HISTORY.rc being used? I found:

https://github.com/ufs-community/ufs-weather-model/blob/develop/tests/parm/gocart/AERO_HISTORY.rc.IN

but that looks boring and should definitely be fine.

@DusanJovic-NOAA
Collaborator Author

@DusanJovic-NOAA Can you point us to the HISTORY.rc being used? I found:

https://github.com/ufs-community/ufs-weather-model/blob/develop/tests/parm/gocart/AERO_HISTORY.rc.IN

but that looks boring and should definitely be fine.

$ ls -l /work2/noaa/stmp/djovic/stmp/djovic/FV3_RT/rt_1214371/cpld_control_p8_intel/AERO_HISTORY.rc
-rw-r--r-- 1 djovic stmp 17686 Nov 14 10:12 /work2/noaa/stmp/djovic/stmp/djovic/FV3_RT/rt_1214371/cpld_control_p8_intel/AERO_HISTORY.rc

@DusanJovic-NOAA
Collaborator Author

I rebuilt MAPL v2.50.2 and reran the cpld_control_p8 test, and I now see errors similar to what we had before:

 82: pe=00082 FAIL at line=02772    Base_Base_implementation.F90             <It only works for cubed-sphere grid>
 82: pe=00082 FAIL at line=02634    Base_Base_implementation.F90             <status=1>
 82: pe=00082 FAIL at line=00678    SU2G_GridCompMod.F90                     <status=1>
 86: pe=00086 FAIL at line=02772    Base_Base_implementation.F90             <It only works for cubed-sphere grid>
 86: pe=00086 FAIL at line=02634    Base_Base_implementation.F90             <status=1>
 86: pe=00086 FAIL at line=00678    SU2G_GridCompMod.F90                     <status=1>
106: pe=00106 FAIL at line=02772    Base_Base_implementation.F90             <It only works for cubed-sphere grid>
106: pe=00106 FAIL at line=02634    Base_Base_implementation.F90             <status=1>
106: pe=00106 FAIL at line=00678    SU2G_GridCompMod.F90                     <status=1>

In stdout I also see:

  4:  enter get_nggps_ic is=  33 ie=  64 js=  13 je=  24 isd=  30 ied=  67 jsd=  10 jed=  27
  4:  SU::Run1 - cannot get indices for point emissions
  4: SU::Run1                                       849
  4:  SU::SU::Run1 - cannot get indices for point emissions
  4: SU::SU::Run1                                   849
  4:  SU::SU::SU::Run1 - cannot get indices for point emissions
  4: SU::SU::SU::Run1                               849
  4:  SU::SU::SU::SU::Run1 - cannot get indices for point emissions
  4: SU::SU::SU::SU::Run1                           849
  4:  SU::SU::SU::SU::SU::Run1 - cannot get indices for point emissions
  4: SU::SU::SU::SU::SU::Run1                       849
  4:  SU::SU::SU::SU::SU::SU::Run1 - cannot get indices for point emissions
  4: SU::SU::SU::SU::SU::SU::Run1                   849
  4:  SU::SU::SU::SU::SU::SU::SU::Run1 - cannot get indices for point emissions
  4: SU::SU::SU::SU::SU::SU::SU::Run1               849
  4:  SU::SU::SU::SU::SU::SU::SU::SU::Run1 - cannot get indices for point emissions
  4: SU::SU::SU::SU::SU::SU::SU::SU::Run1           849
  4:  SU::SU::SU::SU::SU::SU::SU::SU::SU::Run1
  4:   - cannot get indices for point emissions
  4: SU::SU::SU::SU::SU::SU::SU::SU::SU::Run1       849
  4:  SU::SU::SU::SU::SU::SU::SU::SU::SU::SU::Run1
  4:   - cannot get indices for point emissions
  4: SU::SU::SU::SU::SU::SU::SU::SU::SU::SU::       849
  4:  SU::SU::SU::SU::SU::SU::SU::SU::SU::SU::SU::Run1
  4:   - cannot get indices for point emissions
  4: SU::SU::SU::SU::SU::SU::SU::SU::SU::SU::       849
  4:  SU::SU::SU::SU::SU::SU::SU::SU::SU::SU::SU::SU::Run1
  4:   - cannot get indices for point emissions
  4: SU::SU::SU::SU::SU::SU::SU::SU::SU::SU::       849
  4:  SU::SU::SU::SU::SU::SU::SU::SU::SU::SU::SU::SU::SU::Run1
  4:   - cannot get indices for point emissions
  4: SU::SU::SU::SU::SU::SU::SU::SU::SU::SU::       849
  4:  SU::SU::SU::SU::SU::SU::SU::SU::SU::SU::SU::SU::SU::SU::Run1
  4:   - cannot get indices for point emissions

I'm not sure whether these messages in stdout are new; I may have looked only at stderr before, not at stdout. Anyway, my run directory is:

/work2/noaa/stmp/djovic/stmp/djovic/FV3_RT/rt_1287125/cpld_control_p8_intel

@weiyuan-jiang
Collaborator

I believe v2.50.2 and v2.46.4 have the same issue here

@mathomp4

mathomp4 commented Nov 14, 2024

Ugh. Iam cascade. Let me try and fix that...

ETA: But yes, as @weiyuan-jiang says, I think that is the same weird "getting to the wrong bit of MAPL" we were seeing. ☹️

ETA2: We think we know the fix for the "infinite SU" but I'll want to test it. I mean, it's not what is breaking things (just a print) but it is annoying.

@DusanJovic-NOAA
Collaborator Author

I added a debug print in Base_Base_implementation.F90 right before the code prints the FAIL message, like this:

2770     IM_World = dims(1)
2771     JM_World = dims(2)
2772     write(0,*)'IM_World,JM_World: ', IM_World,JM_World                                                                            
2773     _ASSERT( IM_WORLD*6 == JM_WORLD, "It only works for cubed-sphere grid")
2774 

and I see in stderr file:

115:  IM_World,JM_World:           32          12
115: pe=00115 FAIL at line=02773    Base_Base_implementation.F90             <It only works for cubed-sphere grid>
115: pe=00115 FAIL at line=02634    Base_Base_implementation.F90             <status=1>
115: pe=00115 FAIL at line=00678    SU2G_GridCompMod.F90   

Shouldn't IM_World be 96 for a C96 cubed-sphere grid, with JM_WORLD (for whatever reason) equal to IM_WORLD*6? I believe the values 32 and 12 come from the tile layout of 3 x 8, since 96/3 = 32 and 96/8 = 12:

&fv_core_nml
  layout = 3,8
  io_layout = 1,1
  npx = 97
  npy = 97
  ntiles = 6
  ....

@mathomp4

Hmm. I think that sort of looks like the weirdness @weiyuan-jiang has been seeing, but he or @tclune can comment more. (Like our VM or something is corrupted?)

Do you see the same issue with GCC? We were "hoping" that if GCC shows the same issue, it might be something MAPL-ESMF related. But maybe this is an Intel 2021.9 issue? (We've never run with that version.)

@tclune

tclune commented Nov 15, 2024

I think it is similar. We basically found that the VM gets confused and starts thinking it is running on a single PET. We compute IM_WORLD by summing over the DE's, and ... when there is only 1 PET, you end up seeing local extents where we expect to see global extents.

Let me know if this does not align with the numbers you are seeing.
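
For illustration only, the arithmetic behind that explanation can be sketched in plain shell. This is not MAPL code: the function names are made up, and stacking the six tiles along j is an assumption read off the `IM_WORLD*6 == JM_WORLD` assertion. Summing per-DE extents over every DE gives the global C96 dimensions, while summing over a single DE gives back the local 32 x 12:

```shell
#!/bin/sh
# Values from the fv_core_nml quoted in the thread.
npx=97; layout_x=3; layout_y=8; ntiles=6

cells=$((npx - 1))              # 96 cells per tile edge (C96)
local_im=$((cells / layout_x))  # 32 cells per DE along i
local_jm=$((cells / layout_y))  # 12 cells per DE along j

# Sum one extent over a number of DEs: sum_extents <extent> <n_des>
sum_extents() {
  local total=0 k=0
  while [ "$k" -lt "$2" ]; do
    total=$((total + $1))
    k=$((k + 1))
  done
  echo "$total"
}

# Report whether (im, jm) passes the cubed-sphere check.
check_grid() {
  if [ $(($1 * 6)) -eq "$2" ]; then echo "ok"; else echo "FAIL"; fi
}

# All DEs visible: 3 along i, 8 per tile times 6 tiles along j.
im_world=$(sum_extents "$local_im" "$layout_x")
jm_world=$(sum_extents "$local_jm" $((layout_y * ntiles)))
echo "all DEs: $im_world x $jm_world -> $(check_grid "$im_world" "$jm_world")"

# Only one DE visible: the "global" size collapses to the local extents.
im_one=$(sum_extents "$local_im" 1)
jm_one=$(sum_extents "$local_jm" 1)
echo "one DE:  $im_one x $jm_one -> $(check_grid "$im_one" "$jm_one")"
```

The second case reproduces the `32` and `12` seen in the debug print, which is why the assertion trips.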

@junwang-noaa
Collaborator

Since MAPL 2.40.3 is working, could there be something introduced in a later version of MAPL causing the problem?

@weiyuan-jiang
Collaborator

I noticed that when compilation failed, there was still an executable and the test still ran. Something must be wrong here. For example, /work2/noaa/stmp/wjiang/stmp/wjiang/FV3_RT/rt_662081

@DusanJovic-NOAA
Collaborator Author

I noticed that when compilation failed, there was still an executable and the test still ran. Something must be wrong here. For example, /work2/noaa/stmp/wjiang/stmp/wjiang/FV3_RT/rt_662081

It is probably using the executable from some of the previous runs that successfully compiled the code but failed the overall regression test, so the executables are not deleted. Make sure you delete all previously built executables.
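
One way to guard against picking up a stale binary, sketched here as a generic wrapper rather than anything taken from the regression-test scripts (the function name and arguments are hypothetical): delete the old executable first, run the build, and refuse to continue unless the build succeeded and produced a new file.

```shell
#!/bin/sh
# usage: build_fresh <path-to-executable> <build command...>
build_fresh() {
  local exe=$1
  shift
  rm -f -- "$exe"   # drop any executable left by an earlier run
  if ! "$@"; then
    echo "build failed; not running tests" >&2
    return 1
  fi
  if [ ! -x "$exe" ]; then
    echo "build reported success but $exe is missing" >&2
    return 1
  fi
}

# Usage (hypothetical paths): build_fresh ./fv3.exe make -j8
```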

@weiyuan-jiang
Collaborator

I believe v2.50.2 and v2.46.4 have the same issue here

Actually, the working version, v2.40.3.1, has the same issue. It just didn't query the VM or the grid, so it seemed successful. At this point, we really want to test it with a different compiler and MPI. @junwang-noaa

@junwang-noaa
Collaborator

@weiyuan-jiang Thanks for proposing to test with a different compiler and MPI. This PR will allow us to move to the new ESMF 8.6.1 (we are testing a newer ESMF version). We can have a future PR to update MAPL to a later version (v2.50.2 and v2.46.4). Let's discuss the compiler and MPI version in issue #2346. Thanks

Development

Successfully merging this pull request may close these issues.

Should we update FindESMF.cmake and maybe the name of the imported esmf target