Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix writing min/max variables using PFIO #1673

Open
wants to merge 3 commits into
base: eis-freshwater2
Choose a base branch
from

Conversation

jvgeiger
Copy link
Contributor

Description

Fix writing min/max variables using PFIO.

Please note that support for writing routing variables using PFIO is still broken.

This fixes writing min/max fields using PFIO.
Note that support for writing routing varibles using PFIO is still
broken.
@jvgeiger jvgeiger added the eis-freshwater2 Changes related to EIS Freshwater project label Jan 14, 2025
@jvgeiger jvgeiger requested a review from dmocko January 14, 2025 15:30
@emkemp
Copy link
Contributor

emkemp commented Feb 14, 2025

I compiled this with comp/intel/2023.2.1 and found now errors.

However, strict checks identified a number of unused local variables. I took the liberty of removing them to slightly trim the memory usage.

There is an unused argument (local_var2d) with subroutine PFIO_write_single_routingvar, but I elected not to remove it to avoid breaking code.

@emkemp
Copy link
Contributor

emkemp commented Feb 14, 2025

Note that this PR still requires a review by @dmocko

@dmocko
Copy link
Contributor

dmocko commented Feb 18, 2025

I am trying to compile the code with the "-2" configure option on the Milan nodes using lisf_7.5_intel_2023.2.1_mapl (using "strict-checks"). However, I am receiving this error:

icx: error: unknown argument '-check=conversions,stack,uninit'; did you mean '-fcheck=conversions,stack,uninit'?
icx: error: unknown argument '-ftrapuv'; did you mean '-ftrapv'?
icx: remark: Note that use of '-g' without any optimization-level option will turn off most compiler optimizations similar to use of '-O0'; use '-Rno-debug-disables-optimization' to disable this remark [-Rdebug-disables-optimization]

My make/user.cfg is:

Flake.1.0: Off
VIC.4.1.1: Off
VIC.4.1.2: Off
CLM.2: Off
LSM RDHM.3.5.6: Off
AWRAL: Off
AWRAL.6.0.0: Off

The above error happens after trying to compile: runmodes/wrf_cpl_mode/LISWRFimport_module.F90

If I add this to my make/user.cfg as well:

WRF coupling: Off

..then the error happens when trying to compile: core/LIS_RTM_FTable.c

Any advice on what's happening?

@dmocko
Copy link
Contributor

dmocko commented Feb 18, 2025

Ugh - my bad. I forgot to consult: #1642

Trying again, and will update if I still am unable to compile. Sheesh - first I post on the wrong thread, and now this.

@emkemp
Copy link
Contributor

emkemp commented Feb 18, 2025

Hi @dmocko:

You need to edit the configure.lis file and change the flags per the error message.

The underlying cause is the latest Intel C compiler (icx) is a complete replacement of icc, and some of the compiler flags have changed. Since LISF 7.5 is old, the flags have not been changed.

(Note that in master, those icc flags have been commented out. That's not a perfect solution either, but since most LIS code is Fortran it's reasonable to focus on debugging the Fortran code instead of C. And the user can always add them manually.)

@emkemp
Copy link
Contributor

emkemp commented Feb 18, 2025

ADDENDUM: We probably need to edit arch/Config.pl to check the exact name of the Intel compiler being used, and provide the appropriate flags. Current practice is to simply check if "linux_ifc" is used, which assumes icc and ifort.

@dmocko
Copy link
Contributor

dmocko commented Feb 18, 2025

Update - the code compiles with "strict-check"s configured, but crashed during the model run.

It seems unrelated to this PR, however. Here is the error message:

forrtl: severe (194): Run-Time Check Failure. The variable 'module_sf_noahmplsm_401_mp_compute_pet_$FRZGRA' is being used in '../surfacemodels/land/noahmp.4.0.1/phys/module_sf_noahmplsm_401.F90(968,9)' without being defined
Image              PC                Routine            Line        Source             
LIS                00000000051699C6  module_sf_noahmpl         869  module_sf_noahmplsm_401.F90
LIS                0000000004FAE6FD  module_sf_noahmpd         884  module_sf_noahmpdrv_401.F90
LIS                0000000006F486BA  noahmp_driver_401         749  noahmp_driver_401.F90
LIS                0000000001F899F8  noahmp401_main_           584  NoahMP401_main.F90
LIS                00000000015F453A  lsmrun_                   476  LIS_lsm_FTable.c
LIS                00000000015AE2B3  lis_lsmmod_mp_lis         619  LIS_lsmMod.F90
LIS                00000000018D3BAF  lis_surfacemodelm         259  LIS_surfaceModelMod.F90
LIS                00000000081AC281  retrospective_run         198  retrospective_runMod.F90
LIS                00000000018575E1  lisrun_                   202  LIS_runmode_FTable.c
LIS                0000000004A19843  MAIN__                    172  lisdrv.F90
LIS                000000000043220D  Unknown               Unknown  Unknown
libc-2.31.so       000015479EE3E24D  __libc_start_main     Unknown  Unknown
LIS                000000000043213A  Unknown               Unknown  Unknown

Line 968 of lis/surfacemodels/land/noahmp.4.0.1/phys/module_sf_noahmplsm_401.F90 is here:
https://github.com/NASA-LIS/LISF/blob/fix/pfio_minmax/lis/surfacemodels/land/noahmp.4.0.1/phys/module_sf_noahmplsm_401.F90#L968

Indeed, this variable is not defined above unless set to ".TRUE." for a conditional.

This subroutine was added by the LIS team to be able to provide Potential ET output from Noah-MP-4.0.1 LSM.

As we need this variable for NLDAS-3 and HydroGlobe outputs, I will fix it by adding this line:

FRZGRA = .FALSE.

..around Line 910 of this subroutine. Will re-compile and re-run today.

@jvgeiger
Copy link
Contributor Author

Good catch. Yes, that is unrelated to this pull request. We should open a new pull request with the bug fix. Thanks.

@dmocko
Copy link
Contributor

dmocko commented Feb 18, 2025

Good catch. Yes, that is unrelated to this pull request. We should open a new pull request with the bug fix. Thanks.

I will open a separate pull request for the FRZGRA undefined variable into the public-7.5 branch.

@dmocko
Copy link
Contributor

dmocko commented Feb 18, 2025

I compiled successfully with "strick-checks" turned on, but the model crashed again.

[borgk033:33437:0:33437] ib_mlx5_log.c:177  Transport retry count exceeded on mlx5_0:1/IB (synd 0x15 vend 0x81 hw_synd 0/0)
[borgk033:33437:0:33437] ib_mlx5_log.c:177  DCI QP 0x1961 wqe[158]: SEND s-e [rqpn 0x11158 rlid 230] [inl len 14] 
forrtl: error (76): Abort trap signal
Image              PC                Routine            Line        Source             
libpthread-2.31.s  000014C4378E2910  Unknown               Unknown  Unknown
libc-2.31.so       000014C436053D2B  gsignal               Unknown  Unknown
libc-2.31.so       000014C4360553E5  abort                 Unknown  Unknown
libucs.so.0.0.0    000014C42AF1D016  Unknown               Unknown  Unknown
libucs.so.0.0.0    000014C42AF21FAB  ucs_log_default_h     Unknown  Unknown
libucs.so.0.0.0    000014C42AF222B1  ucs_log_dispatch      Unknown  Unknown
libuct_ib.so.0.0.  000014C42A625E30  uct_ib_mlx5_compl     Unknown  Unknown
libuct_ib.so.0.0.  000014C42A651F79  uct_dc_mlx5_ep_ha     Unknown  Unknown
libuct_ib.so.0.0.  000014C42A627A25  uct_ib_mlx5_check     Unknown  Unknown
libuct_ib.so.0.0.  000014C42A653BB7  Unknown               Unknown  Unknown
libucp.so.0.0.0    000014C42BA44C9A  ucp_worker_progre     Unknown  Unknown
libmlx-fi.so       000014C42BE0ABB1  Unknown               Unknown  Unknown
libmlx-fi.so       000014C42BE25ACD  Unknown               Unknown  Unknown
libmlx-fi.so       000014C42BE24CA2  Unknown               Unknown  Unknown
libmpi.so.12.0.0   000014C4383825BF  Unknown               Unknown  Unknown
libmpi.so.12.0.0   000014C4380B161E  Unknown               Unknown  Unknown
libmpi.so.12.0.0   000014C437D50331  Unknown               Unknown  Unknown
libmpi.so.12.0.0   000014C437D4F6D0  Unknown               Unknown  Unknown
libmpi.so.12.0.0   000014C437E6CA15  Unknown               Unknown  Unknown
libmpi.so.12.0.0   000014C437E3E7A1  Unknown               Unknown  Unknown
libmpi.so.12.0.0   000014C437E1CCE3  Unknown               Unknown  Unknown
libmpi.so.12.0.0   000014C437F92DAA  Unknown               Unknown  Unknown
libmpi.so.12.0.0   000014C437D83917  MPI_Bcast             Unknown  Unknown
libmpifort.so.12.  000014C4362D6D74  pmpi_bcast            Unknown  Unknown
LIS                0000000000F01443  lis_coremod_mp_li         520  LIS_coreMod.F90
LIS                00000000081AC01D  retrospective_run         188  retrospective_runMod.F90
LIS                00000000018575E1  lisrun_                   202  LIS_runmode_FTable.c
LIS                0000000004A19843  MAIN__                    172  lisdrv.F90
LIS                000000000043220D  Unknown               Unknown  Unknown
libc-2.31.so       000014C43603E24D  __libc_start_main     Unknown  Unknown
LIS                000000000043213A  Unknown               Unknown  Unknown

Line 520 of LIS_coreMod.F90 is here:
https://github.com/NASA-LIS/LISF/blob/fix/pfio_minmax/lis/core/LIS_coreMod.F90#L520

I'm not sure why the model is crashing here. It may be a memory issue triggered due to "strick-checks" being on.

I will try again with the default "2" optimization option during configure.

@dmocko
Copy link
Contributor

dmocko commented Feb 18, 2025

The code ran with optimization "2" and produced output.

  1. Can the code possibly be re-written so the variable name is (for example)?
    AvgSurfT_min
    instead of:
    AvgSurfT_tavg_min

    The max/min option is separate from the time-averaging option in the MODEL OUTPUT TBL.
    The shorter name is actually less confusing, and works better with visualization/analysis packages.

  2. The code produced large positive or negative values over water points. I ran with "Openwater" turned on.

    The "_min" field has values of -999999 over water and the "_max" field has values of 999999 over water.

    Please make the values undefined over water points.

avgsurft_min
avgsurft_max

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
eis-freshwater2 Changes related to EIS Freshwater project HighPriority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants