-
Notifications
You must be signed in to change notification settings - Fork 365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calls to outfld in atm that create temp arrays hit invalid float (when pcols != ncols) Intel/debug/cori-knl #1263
Comments
My suggested change does not hide another issue because those array locations should not be used by anyone. Almost all of the physics code is loops from |
OK, so just to be clear, it should be OK (and suggested) to change all outfld calls to use ncol? |
Yes for any call that is currently using pcols. There are a few that are different (e.g., micro_mg_cam.F90) and you should not change those. |
The way that I read this, niimm_bc+niimm_dst is doing arithmetic on arrays that contain uninitialized values. The other calls are not doing any calculation. I am concerned with the proposed fix: call outfld('NIMIX_CNT', nicnt_bc(:ncol,:)+nicnt_dst(:ncol,:), ncol, lchnk) as I am not sure that the declared first dimention of the generated temporary array would be. The ncol argument in outfld represents the declared first dimension in the 2D array. |
So, I disagree with @gold2718 - the array declarations in outfld are real(r8), intent(in) :: field(idim,*) ! Array containing field values If the field is declared as (pcols, pver), then the parameter idim needs to be pcols, doesn't it? |
Yes which is why it is important to pass the field as |
Thanks. So we should always be creating temporary arrays in these calls, not just passing pointers? Guess for chunks this won't cause memory issues. |
Note that this does cause performance problems in threaded regions for the PGI compiler, and is less efficient even for the Intel compiler. Outfld calls typically are not in inner loops (at least, I hope not), so probably irrelevant. |
Too bad outfld does not have both actual and declared first dimensions in the call. Would eliminate the need to generate temporary arrays, though this would not have affected @ndkeen 's error since it was adding these fields before the call to outfld. |
outfld calls are never supposed to be in the inner loop but only at the parameterization interface level or higher.
This should always work, no? |
Yes, one of these days, I will rewrite the history processing to use real buffers so that the outfld interface will take proper Fortran arrays (not FORTRAN 77 as it does currently). |
I thought that this is what you just pointed out to me would not work? A temporary array is created with the dimensions (:ncol,:) that is then passed in, so the declared dimension should be ncols? |
Gaa, I shouldn't type when I'm tired. I fixed the example and your comment is correct. Where possible, work arrays should be defined with dimension |
I'm late to the game here, but I wonder if there are two separate issues - a potential refactor and fixing this particular bug. My preference is not to do array arithmetic as part of a subroutine call parameter. While "uglier", doing the computation outside the call with an explicitly declared work array, and then passing in this work array does not depend on the compiler to be doing the right thing (functionally and performance-wise). This array could be declared with ncols, and then the choice of outfld array dimension parameters becomes obvious. Or it could be declared with pcols, and initialized appropriately, and so be consistent with all of the other calls in this section of code (until a refactor happens). |
I second Pat's recommendation. |
I was going to reply to this with a fix to see if both of you agreed. But I'm also seeing a different problem in a call to oufld(). It's still an acme_dev test, and on cori-knl with DEBUG Intel, but it's a floating overflow and in clubb_intr.F90.
Is it possible that it's still a similar issue: there are invalid values in the (unused part of) array that are multiplied by 1000? |
@ndkeen, This is very likely the same problem with the same solution. |
OK. Have not made progress on this -- was planning on doing it today. |
In case it is not clear (lots of suggestions above), my suggestion is to declare and use a work array, either (a)
or
for each of the problematic outfld calls (any that invole array arithmetic in the call). (b) is simpler, but retains the style that @gold2718 doesn't like. |
I am not sure how frequently these outfld calls are executed. If they are executed every time step (or often), won't the solution prescribed in (a) above fragment the memory with repeated "allocate" "deallocate" calls? |
I also vote for (b). for temp variables inside subroutines, it's much more efficient to create them on the stack than it is to allocate them. If the allocation is inside a threaded region (not sure if that is true here), allocate is especially slow. |
Only reason (a) requires allocate/deallocate is that ncols isn't available to be used for declarations. Probably not worth adding this as a parameter to the subroutine call just to enable this option. |
|
@gold2718 - thanks. My mistake. I didn't look at the code closely enough. |
So, (a) should be replaced by (c)
|
I started a branch |
@gold2718 or @worleyph, can you look at those changes in this branch to see if this is what you had in mind? Obv I will remove the comments. I made the "tmp_array fix" to several outfld calls, even if they weren't causing an immediate problem. Or I could make a PR and someone could review that instead. ? |
@ndkeen , I think that you want to use
if you are going to use ncol in
Otherwise you still have a pcol-dimensioned array but are telling outfld that this is an ncol-dimensioned array. |
and
in the other routine. |
got it -- changed both of those and retesting |
Update: I got caught up with other issues, but wanted to follow-through with this. I will try it again with master, make same changes, etc. |
Give users more info and control over key run commands Adds new tool 'preview_run' to give users a preview of what their batch submissions and mpirun will look like. Adds new override XML variables so users can exercise direct control over there batch and mpirun cmds if they want (advanced users only). Test suite: scripts_regression_tests --fast Test baseline: Test namelist changes: Test status: bit for bit Fixes #1263 User interface changes?: Signficant: new preview_run tool and new XML variables Code review: Many
And after module & PE layout adjustments, one of the tests ( |
For several outfld() calls that do array operations in the function call, instead create a temporary array, and only do the operation on ncol values. For example: !call outfld( 'RTP2_CLUBB', rtp2*1000._r8, pcols, lchnk ) tmp_array = rtp2(:ncol,:)*1000._r8 call outfld( 'RTP2_CLUBB', tmp_array, ncol, lchnk ) We hit issues using a couple of different PE layouts in these outfld calls. Additionally, for cori-knl, the SMS_D_Ld1.ne16_ne16.FC5ATMMOD test was failing, but now passes. Fixes #1263
For several outfld() calls that do array operations in the function call, instead create a temporary array, and only do the operation on ncol values. For example: !call outfld( 'RTP2_CLUBB', rtp2*1000._r8, pcols, lchnk ) tmp_array = rtp2(:ncol,:)*1000._r8 call outfld( 'RTP2_CLUBB', tmp_array, ncol, lchnk ) We hit issues using a couple of different PE layouts in these outfld calls. Additionally, for cori-knl, the SMS_D_Ld1.ne16_ne16.FC5ATMMOD test was failing, but now passes. Fixes #1263
I had in both places trying to see if that fixes issues we are now seeing |
I allocated the tmp_array to be too large and GNU compiler stopped with an array mismatch. Change this to be state%ncol in size, not pcol
Fixes tmp_array size to avoid array mismatch found by GNU This is a fix to a PR #1501 (github issue #1263). I allocated the tmp_array to be too large and GNU compiler stopped with an array mismatch. Change this to be state%ncol in size, not pcol Fixes #1263 [BFB] * ndk/cam/fix-add-tmp-arrays-use-ncol: This is a fix to a PR #1501 (github issue #1263). I allocated the tmp_array to be too large and GNU compiler stopped with an array mismatch. Change this to be state%ncol in size, not pcol Add a more reasonable pe layout for the a%ne30np4 grid for cori-knl. May not be optimal, but allows for testing.
Fixes tmp_array size to avoid array mismatch found by GNU This is a fix to a PR #1501 (github issue #1263). I allocated the tmp_array to be too large and GNU compiler stopped with an array mismatch. Change this to be state%ncol in size, not pcol Fixes #1263 [BFB] * ndk/cam/fix-add-tmp-arrays-use-ncol: This is a fix to a PR #1501 (github issue #1263). I allocated the tmp_array to be too large and GNU compiler stopped with an array mismatch. Change this to be state%ncol in size, not pcol Add a more reasonable pe layout for the a%ne30np4 grid for cori-knl. May not be optimal, but allows for testing.
Upstream merge to resolve conflicts. * master: (93 commits) Adds the --force-move option and implies --copy-only when --last-date is specified without --force-move added spunup A_WCYCL options (1850S and 2000S) for v0atm Updates land initial conditions for ne120_oRRS18v3 Fix jenkins_generic_job when user selects specific machines. Fix small problems with MPAS components from debug tests Adds a warning when using the --last-date option and to its help Implement the copy_only option for short term archiving. This copies files rather than moving them Implemented most of the machinery for testing with "incomplete" log files Fix code format issue - replace unused variable with _ Update template.st_archive Adds options to the st_archive to specify the last date (--last-date) to archive, and whether to disable archiving incomplete log files (--no-incomplete-logs) Turn off salinity restoring by default until it passes exact restart This is a fix to a PR #1501 (github issue #1263). I allocated the tmp_array to be too large and GNU compiler stopped with an array mismatch. Change this to be state%ncol in size, not pcol Fix a floating invalid that we found with certain PE's (and with intel v17.02) bug fix Revert mpas-o commit with new threaded vector reconstruction consistency changes Changing jobmin to 1 for batch q on blues Modifies SMS_D_Ln5_P8x4 to SMS_Ln5 to avoid cetus and melvin issues Point at corrected salinity restoring files for oEC60to30v3 and oRRS18to6v3 Make sure clean clm cleans up obj dir ...
Upstream merge to resolve conflicts. * master: (93 commits) Adds the --force-move option and implies --copy-only when --last-date is specified without --force-move added spunup A_WCYCL options (1850S and 2000S) for v0atm Updates land initial conditions for ne120_oRRS18v3 Fix jenkins_generic_job when user selects specific machines. Fix small problems with MPAS components from debug tests Adds a warning when using the --last-date option and to its help Implement the copy_only option for short term archiving. This copies files rather than moving them Implemented most of the machinery for testing with "incomplete" log files Fix code format issue - replace unused variable with _ Update template.st_archive Adds options to the st_archive to specify the last date (--last-date) to archive, and whether to disable archiving incomplete log files (--no-incomplete-logs) Turn off salinity restoring by default until it passes exact restart This is a fix to a PR #1501 (github issue #1263). I allocated the tmp_array to be too large and GNU compiler stopped with an array mismatch. Change this to be state%ncol in size, not pcol Fix a floating invalid that we found with certain PE's (and with intel v17.02) bug fix Revert mpas-o commit with new threaded vector reconstruction consistency changes Changing jobmin to 1 for batch q on blues Modifies SMS_D_Ln5_P8x4 to SMS_Ln5 to avoid cetus and melvin issues Point at corrected salinity restoring files for oEC60to30v3 and oRRS18to6v3 Make sure clean clm cleans up obj dir ...
…ext (PR #1637) Currently, unless you explicitly specify a custom PE layout on Edison, F-compset runs will default to 192 tasks + 4 threads (=32 total nodes), yielding <2 SYPD. This PR specifies a reasonable default for ne30 F compsets. Note that what is proposed here isn't an optimal layout. Hyperthreading can increase SYPD by 1 or more, but it isn't turned on by default any more. This layout was chosen simply because it works and isn't horrible. just allows newbies or people who forget to specify the PE layout to get something that isn't totally useless. Efforts to get a faster PE layout would be useful. Note that this is a second attempt to do this. #1241 also proposed a similar F PE layout but failed to pass tests for unrelated issues documented here: #1263 (measured 10.9 sypd after edison upgrade)
Upstream merge to resolve conflicts. * master: (93 commits) Adds the --force-move option and implies --copy-only when --last-date is specified without --force-move added spunup A_WCYCL options (1850S and 2000S) for v0atm Updates land initial conditions for ne120_oRRS18v3 Fix jenkins_generic_job when user selects specific machines. Fix small problems with MPAS components from debug tests Adds a warning when using the --last-date option and to its help Implement the copy_only option for short term archiving. This copies files rather than moving them Implemented most of the machinery for testing with "incomplete" log files Fix code format issue - replace unused variable with _ Update template.st_archive Adds options to the st_archive to specify the last date (--last-date) to archive, and whether to disable archiving incomplete log files (--no-incomplete-logs) Turn off salinity restoring by default until it passes exact restart This is a fix to a PR #1501 (github issue #1263). I allocated the tmp_array to be too large and GNU compiler stopped with an array mismatch. Change this to be state%ncol in size, not pcol Fix a floating invalid that we found with certain PE's (and with intel v17.02) bug fix Revert mpas-o commit with new threaded vector reconstruction consistency changes Changing jobmin to 1 for batch q on blues Modifies SMS_D_Ln5_P8x4 to SMS_Ln5 to avoid cetus and melvin issues Point at corrected salinity restoring files for oEC60to30v3 and oRRS18to6v3 Make sure clean clm cleans up obj dir ...
Upstream merge to resolve conflicts. * master: (93 commits) Adds the --force-move option and implies --copy-only when --last-date is specified without --force-move added spunup A_WCYCL options (1850S and 2000S) for v0atm Updates land initial conditions for ne120_oRRS18v3 Fix jenkins_generic_job when user selects specific machines. Fix small problems with MPAS components from debug tests Adds a warning when using the --last-date option and to its help Implement the copy_only option for short term archiving. This copies files rather than moving them Implemented most of the machinery for testing with "incomplete" log files Fix code format issue - replace unused variable with _ Update template.st_archive Adds options to the st_archive to specify the last date (--last-date) to archive, and whether to disable archiving incomplete log files (--no-incomplete-logs) Turn off salinity restoring by default until it passes exact restart This is a fix to a PR #1501 (github issue #1263). I allocated the tmp_array to be too large and GNU compiler stopped with an array mismatch. Change this to be state%ncol in size, not pcol Fix a floating invalid that we found with certain PE's (and with intel v17.02) bug fix Revert mpas-o commit with new threaded vector reconstruction consistency changes Changing jobmin to 1 for batch q on blues Modifies SMS_D_Ln5_P8x4 to SMS_Ln5 to avoid cetus and melvin issues Point at corrected salinity restoring files for oEC60to30v3 and oRRS18to6v3 Make sure clean clm cleans up obj dir ...
Upstream merge to resolve conflicts. * master: (93 commits) Adds the --force-move option and implies --copy-only when --last-date is specified without --force-move added spunup A_WCYCL options (1850S and 2000S) for v0atm Updates land initial conditions for ne120_oRRS18v3 Fix jenkins_generic_job when user selects specific machines. Fix small problems with MPAS components from debug tests Adds a warning when using the --last-date option and to its help Implement the copy_only option for short term archiving. This copies files rather than moving them Implemented most of the machinery for testing with "incomplete" log files Fix code format issue - replace unused variable with _ Update template.st_archive Adds options to the st_archive to specify the last date (--last-date) to archive, and whether to disable archiving incomplete log files (--no-incomplete-logs) Turn off salinity restoring by default until it passes exact restart This is a fix to a PR #1501 (github issue #1263). I allocated the tmp_array to be too large and GNU compiler stopped with an array mismatch. Change this to be state%ncol in size, not pcol Fix a floating invalid that we found with certain PE's (and with intel v17.02) bug fix Revert mpas-o commit with new threaded vector reconstruction consistency changes Changing jobmin to 1 for batch q on blues Modifies SMS_D_Ln5_P8x4 to SMS_Ln5 to avoid cetus and melvin issues Point at corrected salinity restoring files for oEC60to30v3 and oRRS18to6v3 Make sure clean clm cleans up obj dir ...
Upstream merge to resolve conflicts. * master: (93 commits) Adds the --force-move option and implies --copy-only when --last-date is specified without --force-move added spunup A_WCYCL options (1850S and 2000S) for v0atm Updates land initial conditions for ne120_oRRS18v3 Fix jenkins_generic_job when user selects specific machines. Fix small problems with MPAS components from debug tests Adds a warning when using the --last-date option and to its help Implement the copy_only option for short term archiving. This copies files rather than moving them Implemented most of the machinery for testing with "incomplete" log files Fix code format issue - replace unused variable with _ Update template.st_archive Adds options to the st_archive to specify the last date (--last-date) to archive, and whether to disable archiving incomplete log files (--no-incomplete-logs) Turn off salinity restoring by default until it passes exact restart This is a fix to a PR #1501 (github issue #1263). I allocated the tmp_array to be too large and GNU compiler stopped with an array mismatch. Change this to be state%ncol in size, not pcol Fix a floating invalid that we found with certain PE's (and with intel v17.02) bug fix Revert mpas-o commit with new threaded vector reconstruction consistency changes Changing jobmin to 1 for batch q on blues Modifies SMS_D_Ln5_P8x4 to SMS_Ln5 to avoid cetus and melvin issues Point at corrected salinity restoring files for oEC60to30v3 and oRRS18to6v3 Make sure clean clm cleans up obj dir ...
A test in acme_developer
SMS_D_Ln1.ne30_oEC.F1850C5AV1C-02
.While adjusting PE layouts, I hit the error below. I assumed it was something I did wrong, but then I happened to notice the code stopped in a similar call on edison with a different problem.
Note issue: #1241
This happens before any time stepping.
Below is my attempt to summarize the issue in the code
components/cam/src/physics/cam/hetfrz_classnuc_cam.F90
. @gold2718 suggested I try to change the temp array calculation to only work over ncol. As I'm not that familiar with this code, is it OK to change just those calls to outfld() to work over ncol and leave the others as pcol? Or are we hiding another issue?case dir:
The text was updated successfully, but these errors were encountered: