Update the automated max_blocks calculation #954
…uired on each MPI task and then sets that as max_blocks if max_blocks=-1 in namelist. This is done in ice_distribution and is a function of the decomposition among other things. Refactor the decomposition computation to defer usage of max_blocks and eliminate the blockIndex array. Update some indentation formatting in ice_distribution.F90.

Modify cice.setup and cice_decomp.csh to set max_blocks=-1 unless it's explicitly defined by the cice.setup -p setting.

Fix a bug in ice_gather_scatter related to zero-ing out of the halo with the field_loc_noupdate setting. This was zero-ing out the blocks extra times and there were no problems as long as max_blocks was the same value on all MPI tasks. With the new implementation of max_blocks=-1, max_blocks can be different values on different MPI tasks. An error was generated and then the implementation was fixed so each block on each task is now zeroed out exactly once.

Update diagnostics related to block numbers. Write out the min and max max_blocks values across MPI tasks.

Add extra allocation/deallocation checks in ice_distribution.F90 and add a function, ice_memusage_allocErr, to ice_memusage.F90 that checks the alloc/dealloc return code, writes an error message, and aborts. This function could be used in other parts of the code as well.

Fix a bug in the io_binary restart output where each task was writing some output when it should have just been the master task.

Add sectcart test case.
I am still testing, refining the test suite, and updating documentation. But this should represent the code changes I'm proposing. Things are running well. The max_blocks=-1 setting now computes the maximum required blocks on each task and sets the internal max_blocks variable to that value. That means that it uses exactly the amount of memory required and the max_blocks can vary per task. Users can still manually set max_blocks in namelist as before.
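To make the per-task behaviour concrete, here is a minimal sketch in Python (not the Fortran in ice_distribution.F90). It assumes a plain round-robin assignment of blocks to tasks, and the helper names `blocks_per_task` and `resolve_max_blocks` are hypothetical, chosen only for illustration. It shows a negative max_blocks being replaced by each task's own block count, and the min/max of the resulting values across tasks.

```python
def blocks_per_task(nblocks_total, ntasks):
    """Count the blocks each task owns under a simple round-robin assignment.

    Hypothetical helper: the real decompositions (cartesian, rake, ...) are
    more involved; round-robin is assumed here only to keep the sketch short.
    """
    counts = [0] * ntasks
    for block in range(nblocks_total):
        counts[block % ntasks] += 1
    return counts


def resolve_max_blocks(max_blocks, num_local_blocks):
    """Return the allocation size for one task.

    A negative max_blocks means "compute it", so the task's own block count
    is used; any other value is treated as the user's explicit setting.
    """
    if max_blocks < 0:
        return num_local_blocks
    return max_blocks


counts = blocks_per_task(nblocks_total=10, ntasks=4)
per_task = [resolve_max_blocks(-1, n) for n in counts]

# Diagnostics analogous to writing out the min and max across tasks.
print("blocks per task:", counts)
print("min/max max_blocks:", min(per_task), max(per_task))
```

With 10 blocks on 4 tasks the counts are uneven, so the computed max_blocks differs between tasks, which is the situation the gather/scatter fix in this PR has to handle.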
Testing results look good. https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks#7402dc7f04f98d840890f29f8f02a59f956a8fc2. This is ready for review and merge.
Could someone do a review on this PR? Would love to get this merged. Then I can start comprehensively testing in preparation for a release. Thanks!
There is a lot here, so I might have missed something. I'm not going to get a chance to test this out until later (after the workshop). I will approve, but just know I might find stuff later once I have tested.
@anton-seaice do you have time to look at this? It's probably after hours there...
This looks good Tony.
It looks like there is coverage in the tests for both setting max_blocks automatically and from the namelist, and the tests are passing.
It looks like the test cases might be using slightly less memory when max_blocks is set automatically, but the resolution is probably too low to be significant.
@@ -227,8 +228,7 @@ but the user can overwrite the defaults by manually changing the values in
information to the log file, and if the block size or max blocks is
inconsistent with the task and thread size, the model will abort. The
inconsistent with the task and thread size, the model will abort. The
inconsistent with the task and thread size, the model will abort. If ``max_blocks``=-1, the model will calculate the number of blocks needed for each task. ``max_blocks`` can also be set by the user, although this may use extra memory. The
I have updated the documentation here.
and chooses a block size, ``block_size_x`` :math:`\times`\ ``block_size_y``,
and decomposition information ``distribution_type``, ``processor_shape``,
and ``distribution_type`` in **ice_in**. ``max_blocks`` is computed
automatically if set to a value of -1, but it can also be set by the user.
automatically if set to a value of -1, but it can also be set by the user.
I think this sentence should be at the end of the paragraph. If I understand correctly, max_blocks does not impact how the blocks are distributed; that depends on all the other parameters.
I still think this paragraph is out of order:
The user sets the NTASKS and NTHRDS settings in cice.settings and chooses a block size, block_size_x × block_size_y, and decomposition information distribution_type, processor_shape, and distribution_wgt in ice_in. If max_blocks=-1, the model will calculate the number of blocks needed for each task. max_blocks can also be set by the user, although this may use extra memory and the model will abort if max_blocks is set too small for the decomposition. This information is used to determine how the blocks are distributed across the processors, and how the processors are distributed across the grid domain.
"This information" should refer to the information in the first sentence, but at the moment it reads like "This information" refers to "max_blocks".
You're right, thanks for checking again. I have refactored that paragraph a bit. I think it's better now. Let me know if you have any concerns.
Looks good. Thank you.
character(*),parameter :: subname = '(ice_memusage_allocErr)'

ice_memusage_allocErr = .false.
if (istat > 0) then
Does the value of istat have a meaning? Should we return istat or interpret the error in some way?
I had a look at some documentation. I am changing this to istat /= 0 to be more correct. Couldn't find any info about return codes other than 0 is success.
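The difference between the two tests can be sketched as follows. This is a Python illustration, not the Fortran function itself, and the name `alloc_err` is hypothetical. It assumes only what is stated above, that 0 means success and every other status value is a failure. Whether the abort happens inside the function or in the caller is not visible in the excerpt, so the sketch leaves it to the caller.

```python
def alloc_err(istat, errstr):
    """Return True when an alloc/dealloc status code signals failure.

    Only 0 is treated as success, so a negative status is reported as an
    error too, which a test of the form `istat > 0` would miss.
    """
    if istat != 0:
        print(f"(alloc_err) ERROR: {errstr}, istat = {istat}")
        return True
    return False


# Caller-side pattern: check the status right after the allocation attempt.
status = 0  # stands in for the status returned by an allocate call
if alloc_err(status, "alloc blockCnt"):
    raise RuntimeError("allocation failed")  # stands in for a model abort
```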
!===============================================================================
! check memory alloc/dealloc return code

logical function ice_memusage_allocErr(istat, errstr)
Thanks for adding this ... unhandled errors spook me :)
! set/check max_blocks
if (lmax_blocks_calc) then
   if (max_blocks < 0) then
      max_blocks = newDistrb%numLocalBlocks
Is calling this variable max_blocks confusing? Isn't it just the number of blocks for this task, where max_blocks means the maximum number for all the tasks?
max_blocks is used throughout the code as the number of blocks to allocate on each task. It used to be the same value on all tasks, set by the user. Now it's computed internally and set uniquely for each task. Setting max_blocks in line 683 is doing exactly that. Because the rake decomp calls the cartesian decomp, I needed to add an option where I could turn off the setting of max_blocks in cart because I calculate it in rake. But if cart is called directly, then max_blocks is set. Every decomp option sets the max_blocks variable.
I am confused here. I thought nblocks was meant to be the number of blocks on each task and this could be different. The parameter max_blocks should be the maximum number of blocks across all tasks, no?
@dabail10, you are correct. In general, nblocks is the number of active blocks on each task. max_blocks is the number of blocks that are allocated on each task. Historically, max_blocks was used to allocate memory, was the same value on all tasks, and was set by a CPP at build time, while nblocks was used to loop over only active blocks at run time. The code still uses those two variables as they've always been defined.
But in the last few years, we moved to dynamic allocation of memory (moving max_blocks to namelist) and with the current PR, we are able to compute max_blocks internally on each task BEFORE we need to use it to allocate memory. So in that case, nblocks and max_blocks can overlap in function.
However, we still support user defined max_blocks (although we probably don't need to) which means we still want to differentiate max_blocks and nblocks.
Maybe the next step is to ignore the max_blocks namelist setting and always set it internally. Once we do that, we could, in theory, unify nblocks and max_blocks in the code to a single variable. But we're not quite there yet, and I think we could debate whether all that refactoring would be worth the effort. max_blocks and nblocks still play different roles (memory allocation vs active block count), and we're all pretty comfortable with that scheme.
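The two roles can be illustrated with a short Python sketch (the model code is Fortran, and the function names here are hypothetical). Storage is sized by max_blocks, loops run over only the nblocks active blocks, and a user-supplied max_blocks smaller than nblocks is rejected, mirroring the abort described in the documentation.

```python
def allocate_field(max_blocks, nblocks):
    """Allocate one value per block slot on a task.

    max_blocks controls memory; it must be at least the number of active
    blocks, otherwise the setup is rejected (the model aborts in that case).
    """
    if max_blocks < nblocks:
        raise ValueError(
            f"max_blocks={max_blocks} is too small for nblocks={nblocks}"
        )
    return [0.0] * max_blocks


def update_active_blocks(field, nblocks):
    """Touch only the active blocks; extra allocated slots stay unused."""
    for iblk in range(nblocks):
        field[iblk] += 1.0
    return field


# User-set max_blocks larger than needed: works, but wastes two slots.
field = update_active_blocks(allocate_field(max_blocks=5, nblocks=3), nblocks=3)
print(field)

# Computed case (max_blocks == nblocks): no unused memory.
print(len(allocate_field(max_blocks=3, nblocks=3)))
```

When max_blocks is computed internally the two values coincide on each task, which is why unifying them is possible in principle but only once the user-defined setting is no longer supported.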
Having a "max_blocks" in the namelist and a "max_blocks" in the code with different meanings is confusing. I see that it may not be worth it to unify max_blocks and nblocks (there are ~1000 uses of max_blocks), so I guess we just run with it as is.
This comment in ice_domain_size is now wrong:
max_blocks , & ! max number of blocks per processor
Maybe it could be "number of blocks memory is allocated for" or similar?
cicecore/shared/ice_distribution.F90
Outdated
call abort_ice(subname//'ERROR: processors left with no blocks')
newDistrb%numLocalBlocks = newDistrb%blockCnt(my_task+1)
if (newDistrb%numLocalBlocks < 0) then
   call abort_ice(subname//'ERROR: processors left with no blocks', &
Why do we abort here, but not, say, if there are processors left with no blocks for a cartesian distribution?
I left the implementation as it was. Looks like rake is the only one that checks that there has to be a block on each task. I think I'll remove that constraint now. There is already a test that verifies the model runs fine with zero blocks on a task, so we've got that covered in the test suite. Good catch.
cicecore/shared/ice_distribution.F90
Outdated
call abort_ice(subname//'ERROR: processors left with no blocks')
newDistrb%numLocalBlocks = newDistrb%blockCnt(my_task+1)
if (newDistrb%numLocalBlocks < 0) then
   call abort_ice(subname//'ERROR: processors left with no blocks', &
call abort_ice(subname//'ERROR: processors left with no blocks', &
call abort_ice(subname//'ERROR: tasks left with no blocks', &
I am not totally on top of threading, but should this be tasks?
This code has been removed.
enddo
! set/check max_blocks
if (max_blocks < 0) then
   max_blocks = newDistrb%numLocalBlocks
It's kind of silly for roundrobin, but should we check that the processor has work (i.e. numLocalBlocks /= 0)?
Having no work on a task is allowed. In my opinion, it's up to the user to properly tune the number of processors and decomposition.
I have updated the PR based on feedback from @anton-seaice and am running a set of tests just to make sure nothing is broken. Will report results when the testing is done. Thanks @anton-seaice for the comments.
I reran a portion of the test suite with the latest code changes and I think everything is OK. I'll merge once github actions passes and @anton-seaice is happy with the current implementation. Please let me know if anything else needs to be fixed. Thanks!
There are just a couple of lines in
Could be updated
Can probably be removed because it's covered in the docs?
Good catch, fixed these.
I think you still need to push the commit.
Thanks Tony!
PR checklist
Short (1 sentence) summary of your PR:
Update the automated max_blocks calculation
Developer(s):
apcraig, anton
Suggest PR reviewers from list in the column to the right.
Please copy the PR test results link or provide a summary of testing completed below.
Testing still underway. Expect full test suite on derecho with intel, gnu, cray to be bit-for-bit.
How much do the PR code changes differ from the unmodified code?
Does this PR create or have dependencies on Icepack or any other models?
Does this PR update the Icepack submodule? If so, the Icepack submodule must point to a hash on Icepack's main branch.
Does this PR add any new test cases?
Is the documentation being updated? ("Documentation" includes information on the wiki or in the .rst files from doc/source/, which are used to create the online technical docs at https://readthedocs.org/projects/cice-consortium-cice/. A test build of the technical docs will be performed as part of the PR testing.)
Please document the changes in detail, including why the changes are made. This will become part of the PR commit log.
Update support for max_blocks=-1. This update computes the blocks required on
each MPI task and then sets that as max_blocks if max_blocks=-1 in namelist.
This is done in ice_distribution and is a function of the decomposition among
other things. Refactor the decomposition computation to defer usage of max_blocks
and eliminate the blockIndex array. Update some indentation formatting in
ice_distribution.F90.
Modify cice.setup and cice_decomp.csh to set max_blocks=-1 unless it's explicitly
defined by the cice.setup -p setting.
Fix a bug in ice_gather_scatter related to zero-ing out of the halo with the
field_loc_noupdate setting. This was zero-ing out the blocks extra times and
there were no problems as long as max_blocks was the same value on all MPI tasks.
With the new implementation of max_blocks=-1, max_blocks can be different values
on different MPI tasks. An error was generated and then the implementation
was fixed so each block on each task is now zeroed out exactly once.
Update diagnostics related to max_block information. Write out the min and max
max_blocks values across MPI tasks.
Add extra allocation/deallocation checks in ice_distribution.F90 and add
a function, ice_memusage_allocErr, to ice_memusage.F90 that checks the
alloc/dealloc return code, writes an error message, and aborts. This
function could be used in other parts of the code as well.
Fix a bug in the io_binary restart output where each task was writing some
output when it should have just been the master task.
Update test cases
Update documentation