-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix out-of-bounds access in module_diag_hailcast.F90 which crashes RRFS on WCOSS2. #308
Conversation
This is a draft because I discovered I used the wrong fix. I need to do this instead: - DO k=KBAS,nz
+ DO k=KBAS+1,nz
RWA_new(k) = RWA_new(k) / (h1d(k)-h1d(k-1))
ENDDO I'm testing this now on WCOSS2 and Hera. |
The correct fix works too. This is ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the old code-snippet that was commented out and left in place, this looks like the appropriate fix. Though it might be good to double check with the original author of this module (@ywangwof) to be sure.
Indeed, I would like the original author to take a look at this. |
@SamuelTrahanNOAA Your fix look good. Actually, I have a pull request (#258) to address this problem long time ago. I do not know how to push forward for its merging. It is because I do not have write permission and am also not familiar with the merging process. Anyway, if you can merge that pull request with this one, it will be the best. Note that the variable |
@ywangwof I already have PRs for fv3atm and ufs-weather-model, so it is best if we merge any fixes into my PR. Otherwise, someone else will have to redo that extra work. @lharris4 disagreed with your solution and wanted this instead: https://github.com/NOAA-GFDL/GFDL_atmos_cubed_sphere/pull/258/files#r1160886041
Regardless, you also need my fix. Otherwise, the buggy loop will apply the correction to one extra index. DO k=1,nz
...
ELSE IF ((k.ge.KBAS+1).AND.(RWA_adiabat(k).ge.1.E-12)) THEN
RWA_new(k) = RWA_adiabat(k)*(h1d(k)-h1d(k-1)) - RWA_new(k-1)
IF (RWA_new(k).LT.0) RWA_new(k) = 0.
ENDIF
ENDDO
...
DO k=KBAS,nz
RWA_new(k) = RWA_new(k) / (h1d(k)-h1d(k-1))
ENDDO Only indices |
Do you read my reply to @lharris4 's comments?
I did not get it. The original out-of-bound problem is caused by |
Your code fails unit analysis. This comment says h1d has a unit. Specifically, meters. For the equations to make sense, the units must be consistent. !!!! h1d height above sea level (m) RWA_new indices from KBAS+1 onward are scaled by h1d(k)-h1d(k-1). AT k=KBAS, it is not scaled: IF (k.eq.KBAS) THEN
RWA_new(k) = RWA_adiabat(k)
ELSE IF ((k.ge.KBAS+1).AND.(RWA_adiabat(k).ge.1.E-12)) THEN
RWA_new(k) = RWA_adiabat(k)*(h1d(k)-h1d(k-1)) - RWA_new(k-1)
ENDIF The next loop removes that scaling: DO k=KBAS,nz
RWA_new(k) = RWA_new(k) / (h1d(k)-h1d(k-1))
ENDDO It does so for KBAS, which has not been scaled. Actually, now that I look closer, your units are inconsistent earlier. At IF (k.eq.KBAS) THEN
RWA_new(k) = RWA_adiabat(k)
! k==KBAS has units of RWA_adiabat
ELSE IF ((k.ge.KBAS+1).AND.(RWA_adiabat(k).ge.1.E-12)) THEN
RWA_new(k) = RWA_adiabat(k)*(h1d(k)-h1d(k-1)) - RWA_new(k-1)
! Units at k=KBAS+1 are inconsistent
! RWA_new(k) = RWA_adiabat*meters + RWA_adiabat = RWA_adiabat * (meters + unitless)
ENDIF |
Thanks @SamuelTrahanNOAA. Now I see the problem. So even with your fix, it still fails the unit analysis. I will propose a fix like the following together with my original fix to limit
BTW, the original author of this code is John Henderson. I just re-organized it for the UFS framework and made a few tests. I should NOT claim it is my code. Anyway, I do not know John's Github ID, but I will sent him an email to remind him to check the correctness of our fixes. Thanks again. |
@ywangwof - Can you describe the physical phenomena that can lead to KBAS=1? We're debugging failures of the RRFS model which may be coming from bad initial conditions. Another scheme failed recently due to implausible near-surface conditions. If we know what to look for, it may pare down the possible causes. |
Also, that fix still isn't enough. The initial value of RWA_new is also in the wrong units: RWA_new(k) = RWA(k) It is only replaced later if RWA_adiabat(k) is not near zero: ELSE IF ((k.ge.KBAS+1).AND.(RWA_adiabat(k).ge.1.E-12)) THEN
RWA_new(k) = RWA_adiabat(k)*(h1d(k)-h1d(k-1)) - RWA_new(k-1) If I knew what the code was trying to do with RWA_new and RWA_adiabat, I could fix this. Alas, there isn't sufficient commenting for me to know without reading through several research papers. |
I'm the original author of the HAILCAST code. John Henderson did the work of implementing it within FV3, but since this issue is well within the HAILCAST subroutine itself, I can help here. The change should be
The k=1,nz do loop at line 809 needs to remain the same. RWA, RWA_new, and RWA_adiabat are all cloud liquid water mixing ratio profiles. RWA comes directly from the model. RWA_adiabat is a mixing ratio profile assuming there is no entrainment. It assumes the cloud water available at cloud base (RBAS), is only reduced aloft by bringing the air to saturation (RSA), or glaciation (icefactor). (Lines 815-834). RWA_new is, essentially, RWA below cloud base, and RWA_adiabat at cloud base and above it, with the caveat that any moisture expended at height levels below the current one can't be included in RWA_adiabat. (Lines 837-842). At cloud base (k=KBAS), no moisture has been expended yet, so RWA_adiabat can't be multiplied by height levels at that point. Above cloud base, (k>=KBAS+1), the amount of moisture expended over a layer will be linearly dependent on the depth of the layer. In terms of units, after line 842:
Thus, the only points where the height scaling (and extra units of m) need to be removed are from k=KBAS+1:nz, where RWA_adiabat(k).ge.1.E-12. The code changes I've pasted above account for that. (The extra check for RWA_adiabat is going to make very little difference numerically, but is good to add for physical completeness.) |
Thanks @adams-selin. Just being curious, in what condition KBAS=1? clear sky or bad model states? |
This means you have a unit mismatch at RWA_new(KBAS+1) RWA_new(k) = RWA_adiabat(k)*(h1d(k)-h1d(k-1)) - RWA_new(k-1)
RWA_new(KBAS+1) = RWA_adiabat(k)*(h1d(KBAS+1)-h1d(KBAS)) - RWA_new(KBAS)
RWA_new(KBAS+1) units = kg/kg * m - kg/kg = m - 1
RWA_new(KBAS+1) units = 1 * m - 1
RWA_new(KBAS+1) units = m - 1 You are subtracting a unitless number from meters. That propagates onward through the rest of the k indexes up to nz. |
When you have a cloud essentially on the ground. Rare, but I could potentially see it being possible with a low LCL, depending on how the model layer spacing is near the surface. |
@SamuelTrahanNOAA, yes, you are correct there is a units mismatch there, depending on how you view it. However, I'm viewing the RWA_new(KBAS) term as a constant in the integration, where the summation over height levels is the integration. This makes sense physically to me, if you imagine trying to calculate the amount of water retained at each point in a leaky box, with input into one end of the box by a hose. RWA_new(KBAS) is the amount of water coming in the box from the hose. RWA_new(KBAS+1) is the amount of water retained at several points along the length of the box, which is dependent on the amount of water that has leaked out before reaching that point....which is dependent on the length of the box over which the water has traveled to that point. I think the units might not match up because I should be, technically, calculating a moisture flux from RBAS over a set unit of time. But that would need an updraft speed and more, so I'm just assuming the moisture input is constant. |
The unit mismatch is still a unit mismatch no matter how you view it. Your constant of integration isn't constant. If you want it to be constant, you must multiply it by dh at every level before dividing by dh at the end. After your integration, if all RWA_adiabat(k).ge.1.E-12, using pseudocode,
Reduces to
Which means:
I don't have the equation you're trying to solve, but I'm guessing it is this: RWA_new(z) = RWA(z) - integrate(d(RWA_adiabat(z))/dz where RWA_adiabat(z) > 1e-12, z=zbas...zend) The point being, to remove cloud liquid water output at lower levels. There's some safeguards against removing negative moisture or dealing with near-zero values. If so, the correct code would be: DO k=1,nz
RWA_lower_dh(k) = 0
...
IF ((k.ge.KBAS+1).AND.(RWA_adiabat(k).ge.1.E-12)) THEN
! Integrate RWA_adiabat from KBAS...k into RWA_lower_dh
RWA_lower_dh(k) = MAX(0, RWA_adiabat(k)*(h1d(k)-h1d(k-1))) + RWA_lower_dh(k-1)
ENDIF
ENDDO
DO k=KBAS+1,nz
! Subtract the integration from RWA to get RWA_new
RWA_new(k) = RWA(k) - max(0, RWA_lower_dh(k) / (h1d(k)-h1d(KBAS)) + RWA_adiabat(KBAS))
ENDDO |
It's not a unit mismatch because at the cloud base only, (KBAS), RBAS is technically an amount of moisture that has been fluxed in across cloud base. I'm assuming an constant updraft speed of 1 m/s, a constant source RBAS mixing ratio of units kg/kg, and that it is happening over 1 sec of time. Normally one would think of fluxes in 3D, but remember we are in a 1D column here. That means, at the cloud base only, RBAS has units (kg kg-1)(m s-1)(s), or (kg kg-1)(m). Since, at the cloud base, RWA_new(KBAS) = RWA_adiabat(KBAS) = RWA(KBAS) = RBAS, that means RWA_new(KBAS) also has units of (kg kg-1)(m). Once we get above KBAS, however, that no longer holds true, and RWA_adiabat is in units of (kg kg-1) and needs the height multiplier. Hence the height factor removal only needs to be done to those points at KBAS+1 and above. I will note I included this exact fix in a pull request for WRF that was approved ~6 months ago (wrf-model/WRF#1877). |
@adams-selin - Your explanation satisfies me, and I had already changed my PR to match your prior comment. Can you confirm my PR contains the right fix? Once you confirm, I'll take the ufs-weather-model PR out of draft status so they can proceed with testing and merging. |
@adams-selin @SamuelTrahanNOAA - because of the discussion surrounding this logic, I'd like to see the relevant information be documented within the source code. Can you also ensure proper provenance (ownership/history) for the file? |
Quite sensible. I'd like @adams-selin's permission before putting their name in the file. Also, @adams-selin, can you provide an updated version of that region with explanatory comments? Otherwise, I'll give it my best try. |
@SamuelTrahanNOAA Sure. The code in the PR request is correct. Here are the commenting changes I'd make:
|
@adams-selin @ywangwof I am unable to officially add you as a reviewer on this PR, but would you please leave a comment stating if you approve these changes? |
Yes, I approve these changes. Thanks. |
@jkbk2004 This PR was discussed in today's UFS code management meeting and I think it is ready to merge as we have approval from the original author of the code now. |
@laurenchilutti Thanks for the note! I will follow up next week. @SamuelTrahanNOAA @hu5970 FYI |
All tests are done at ufs-community/ufs-weather-model#2064. @bensonr can you merge this pr? |
Description
Fixes a bug that can crash the RRFS ensembles. When KBAS=1, there's an out-of-bounds write in an array. That corrupts memory and occasionally crashes the model. Also updates the comments in that region of code and adds author information.
Fixes #309
All changes are from @adams-selin
How Has This Been Tested?
Reran the failed case on Hera and WCOSS2 Dogwood. It passed. Also ran the regression tests on Hera.
Checklist:
Please check all whether they apply or not