The RRFS team has encountered several model crashes over the past several days. After running the model in debug mode we think we have traced the source of the bug to aerinterp.F90. We have a possible solution for the bug, described below.
The bug: An array out-of-bounds error occurs in the subroutine aerinterpol on line 435, in the second index of the aerpres array. Here is the error:
forrtl: severe (408): fort: (2): Subscript #2 of the array AERPRES has value 503 which is greater than the upper bound of 72
It appears that this array access re-uses the indices i1 and i2. These indices are used earlier in the same routine, where they are set from values that can be much larger than 72 (see lines 392-395). Specifically, they appear to take on values that are likely related to the size of the horizontal dimension of the MERRA file (nlons*nlats, which is 576*361) and not to the vertical dimension (which is at most 72). We noticed an edge case where i1 and i2 may not get re-assigned later in the code, so a value larger than 72 can reach line 435. We think this is causing the problem. Jili Dong found the following issue with the if-statement inside the loop that begins on line 428:
DO k=1, levsaer-1 !! from sfc to toa
IF(prsl(j,L) < aerpres(j,k) .and. prsl(j,L)>aerpres(j,k+1)) then
i1 = k
i2 = min(k+1,levsaer)
exit
ENDIF
ENDDO
Here is how Jili explains it: Note that when searching for the level k where prsl falls between two aerpres levels, the code uses strict < and > comparisons. If prsl is exactly equal to aerpres(j,k) or aerpres(j,k+1), the if block is skipped for every k. When the block is skipped, i1 and i2 are never re-assigned, so line 435 uses index values that were set for a different purpose earlier in the code.
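To make the edge case concrete, here is a minimal Python sketch of the search as described above. It is a model of the logic, not the code in aerinterp.F90: the function name, the 0-based indexing, and the pressure values are all assumptions made for illustration.

```python
def find_bracket_strict(p, aerpres):
    """Model of the original search: strict < and > on both sides.

    aerpres is ordered from surface to top, so its values decrease.
    Returns the 0-based index k of the lower level, or None when no
    layer matches. None stands for the Fortran case where i1 and i2
    keep whatever values they held before the loop.
    """
    for k in range(len(aerpres) - 1):
        if p < aerpres[k] and p > aerpres[k + 1]:
            return k
    return None


# Hypothetical pressure levels in Pa, surface first (not MERRA data).
levels = [100000.0, 85000.0, 50000.0, 10000.0]

print(find_bracket_strict(90000.0, levels))  # strictly between two levels
print(find_bracket_strict(85000.0, levels))  # exactly on a level
```

The first call finds a layer, while the second returns None because neither neighbouring layer satisfies both strict comparisons. In the Fortran routine there is no such sentinel, so the stale i1 and i2 are used instead.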
The proposed solution from Jili:
Should this "if" statement be changed to the following?
IF(prsl(j,L) <= aerpres(j,k) .and. prsl(j,L)>aerpres(j,k+1)) then
Note that the only change is < to <= in the first comparison. We have tested this change, and so far it has fixed three crashes that we've had.
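The effect of the proposed comparison can be checked with a small Python model. As before, this is a sketch under assumptions: the function name, the 0-based indexing, and the pressure values are hypothetical and do not come from aerinterp.F90.

```python
def find_bracket(p, aerpres):
    """Model of the proposed search: <= on the lower level, > on the upper.

    aerpres is ordered from surface to top, so its values decrease.
    Returns the 0-based index k of the lower level, or None when no
    layer matches.
    """
    for k in range(len(aerpres) - 1):
        if p <= aerpres[k] and p > aerpres[k + 1]:
            return k
    return None


# Hypothetical pressure levels in Pa, surface first (not MERRA data).
levels = [100000.0, 85000.0, 50000.0, 10000.0]

print(find_bracket(85000.0, levels))   # on an interior level: now matched
print(find_bracket(100000.0, levels))  # on the surface level: matched
print(find_bracket(10000.0, levels))   # on the top level: still unmatched
```

In this model a value that sits exactly on an interior level is assigned to the layer above it, so the indices are always set for that case. A value exactly equal to the topmost level still matches no layer; whether such a value can reach the loop depends on checks elsewhere in the routine that are not shown in this issue.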
Steps to Reproduce
Please provide detailed steps for reproducing the issue.
Run RRFS cycle 202401080000 in debug mode.
Additional Context
Please provide any relevant information about your setup. This is important in case the issue is not reproducible except for under certain conditions.
Machine: WCOSS2
Compiler:
Suite Definition File or Scheme: RRFS
Reference other issues or PRs in other repositories that this is related to, and how they are related.
Output
See above