Fix for GNU hystpdf crash #899

Merged: @sdrabenh merged 3 commits into develop from bugfix/mathomp4/hystpdf-gnu-fix on Feb 21, 2024

@mathomp4 (Member) commented Feb 9, 2024:

While running GFDL + NH + GF2020 + RRTMGP + GOCART Data runs for @FlorianDeconinck, we found that GNU runs failed at C180 and higher resolutions with:

[borgl058:189261:0:189261] Caught signal 8 (Floating point exception: floating-point overflow)
==== backtrace (tid: 189261) ====
 0 0x0000000000016910 __funlockfile()  ???:0
 1 0x0000000001585005 __geosmoist_process_library_MOD_hystpdf()  /discover/nobackup/projects/gmao/SIteam/Models/GEOSgcm-v11.5.1-GNU-SLES15/GEOSgcm/src/Components/@GEOSgcm_GridComp/GEOSagcm_GridComp/GEOSphysics_GridComp/GEOSmoist_GridComp/Process_Library.F90:2064
 2 0x00000000016f63e0 __geos_gfdl_1m_interfacemod_MOD_gfdl_1m_run()  /discover/nobackup/projects/gmao/SIteam/Models/GEOSgcm-v11.5.1-GNU-SLES15/GEOSgcm/src/Components/@GEOSgcm_GridComp/GEOSagcm_GridComp/GEOSphysics_GridComp/GEOSmoist_GridComp/GEOS_GFDL_1M_InterfaceMod.F90:619
 3 0x000000000151e14b __geos_moistgridcompmod_MOD_run()  /discover/nobackup/projects/gmao/SIteam/Models/GEOSgcm-v11.5.1-GNU-SLES15/GEOSgcm/src/Components/@GEOSgcm_GridComp/GEOSagcm_GridComp/GEOSphysics_GridComp/GEOSmoist_GridComp/GEOS_MoistGridComp.F90:5485
...

Intel runs never had this issue.

The traceback points to this code:

QAx = 0.0
if (CLCN > 0.0) QAx = (QLCN+QICN)/CLCN

Since the failure is an overflow, the likely cause is a very small (possibly denormal) CLCN, so that the division exceeds the range of a 32-bit real. The proposed fix is therefore to compare against tiny(0.0) rather than just 0:

                            QAx = 0.0
      if (CLCN > tiny(0.0)) QAx = (QLCN+QICN)/CLCN
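A minimal sketch of why the `tiny` guard should be sufficient, assuming default 32-bit `real` and IEEE 754 arithmetic. Any CLCN strictly above `tiny(0.0)` (about 1.18e-38) gives QAx < (QLCN+QICN)/tiny(0.0), which stays below `huge(0.0)` (about 3.40e38) as long as QLCN+QICN < huge(0.0)*tiny(0.0), roughly 4.0. Assuming QLCN and QICN are mixing ratios in kg/kg, they sit far below that bound. A subnormal CLCN, by contrast, passes the old `> 0.0` test and can push the quotient past `huge(0.0)`. The module and function names below are hypothetical and not taken from Process_Library.F90; the assumption that the GNU build traps overflow (e.g. via gfortran's `-ffpe-trap=overflow`) is also ours.

```fortran
! Hypothetical sketch (not the GEOS source): contrasts the original "> 0.0"
! guard with the "> tiny(0.0)" guard from this PR for default 32-bit reals.
module qax_guard_demo
  implicit none
  private
  public :: qax_old_guard, qax_tiny_guard
contains

  ! Original guard: any positive CLCN, including subnormals, is accepted,
  ! so (qlcn + qicn) / clcn can exceed huge(0.0) and overflow.
  pure function qax_old_guard(qlcn, qicn, clcn) result(qax)
    real, intent(in) :: qlcn, qicn, clcn
    real :: qax
    qax = 0.0
    if (clcn > 0.0) qax = (qlcn + qicn) / clcn
  end function qax_old_guard

  ! PR guard: divide only when CLCN exceeds the smallest normal real.
  ! Then qax < (qlcn + qicn) / tiny(0.0), which stays finite while
  ! qlcn + qicn < huge(0.0) * tiny(0.0) (about 4.0 for 32-bit reals).
  pure function qax_tiny_guard(qlcn, qicn, clcn) result(qax)
    real, intent(in) :: qlcn, qicn, clcn
    real :: qax
    qax = 0.0
    if (clcn > tiny(0.0)) qax = (qlcn + qicn) / clcn
  end function qax_tiny_guard

end module qax_guard_demo
```

For example, with QLCN+QICN = 1.0e-3 and a subnormal CLCN of about 1.1e-44, the quotient is about 8.9e40, so `qax_old_guard` returns +Infinity when overflow is not trapped (and raises SIGFPE when it is), while `qax_tiny_guard` returns 0.0.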

Tests indicate that this fixes the failure at C180 with GNU.

Tests of this fix at c24 and c48 with BACM_1M show it is zero-diff. That can't be assumed for every configuration, but it is enough evidence for me to label this 0-diff in the sense that "Intel + Release + BACM + c48" is zero-diff.

I'll add @wmputman as a reviewer, since he is the first person I think of for this code, though maybe @narnold1 would be better? 🤷🏼

@mathomp4 added the "0 diff" and "Non 0-diff" labels Feb 9, 2024
@mathomp4 requested a review from @wmputman February 9, 2024 17:53
@mathomp4 self-assigned this Feb 9, 2024
@mathomp4 removed the "Non 0-diff" label Feb 15, 2024
@mathomp4 marked this pull request as ready for review February 15, 2024 15:39
@mathomp4 requested a review from a team as a code owner February 15, 2024 15:39
@sdrabenh merged commit 8f182b0 into develop Feb 21, 2024
2 of 4 checks passed
@sdrabenh deleted the bugfix/mathomp4/hystpdf-gnu-fix branch February 21, 2024 14:01

Labels: 0 diff (the changes in this pull request have been verified to be zero-diff with the target branch)