implement the ability to automatically calculate PE bounds and tasking for component models on different platforms and resolutions #320
Comments
@binli2337 How many tasks does each node have on jet? Since the TPN_cpl_thrd in your PR #545 is 18, it seems jet has >36 tasks/node?
@DusanJovic-NOAA I think you started to work on this issue a couple of months ago after a discussion on a morning tag-up. Did you ever get it working? During the P7 update, I had to once again implement the same PE/tasking changes for all the platforms and it reminded me of why I created this issue in the first place.
I tested this function (off-line) to compute PET bounds for each component, but I didn't have time to test it in rt.sh:
#!/bin/bash
set -eu
function compute_petbounds () {
  # Running offset: first PET available to the next component
  local n=0
  # ATM
  if [[ $((ATM_compute_tasks + ATM_io_tasks)) -gt 0 ]]; then
    ATM_petlist_bounds="${n} $((n + ATM_compute_tasks + ATM_io_tasks - 1))"
    n=$((n + ATM_compute_tasks + ATM_io_tasks))
  fi
  # CHM
  if [[ ${CHM_tasks:-0} -gt 0 ]]; then
    CHM_petlist_bounds="${n} $((n + CHM_tasks - 1))"
    n=$((n + CHM_tasks))
  fi
  # OCN
  if [[ ${OCN_tasks:-0} -gt 0 ]]; then
    OCN_petlist_bounds="${n} $((n + OCN_tasks - 1))"
    n=$((n + OCN_tasks))
  fi
  # ICE
  if [[ ${ICE_tasks:-0} -gt 0 ]]; then
    ICE_petlist_bounds="${n} $((n + ICE_tasks - 1))"
    n=$((n + ICE_tasks))
  fi
  # WAV
  if [[ ${WAV_tasks:-0} -gt 0 ]]; then
    WAV_petlist_bounds="${n} $((n + WAV_tasks - 1))"
    n=$((n + WAV_tasks))
  fi
  # MED
  MED_petlist_bounds="0 $((ATM_compute_tasks - 1))"
  UFS_tasks=${n}
}
# Each test MUST define the ${COMPONENT}_tasks variable for every component it uses.
# For components it does not use, it MUST either leave the variable undefined or set it to 0.
# ATM is a special case since it runs on the sum of compute and io tasks, while the mediator
# runs only on the compute tasks.
ATM_compute_tasks=$((3 * 8 * 6))
ATM_io_tasks=$((1 * 6))
#CHM_tasks=0
OCN_tasks=30
ICE_tasks=12
WAV_tasks=208
compute_petbounds
echo "ATM_petlist_bounds: ${ATM_petlist_bounds:-}"
echo "OCN_petlist_bounds: ${OCN_petlist_bounds:-}"
echo "ICE_petlist_bounds: ${ICE_petlist_bounds:-}"
echo "WAV_petlist_bounds: ${WAV_petlist_bounds:-}"
echo "CHM_petlist_bounds: ${CHM_petlist_bounds:-}"
echo "MED_petlist_bounds: ${MED_petlist_bounds:-}"
echo "UFS_tasks : ${UFS_tasks:-}"
This function assumes that the mediator runs on the same tasks as ATM (compute tasks, not I/O), but this is not the case in all tests. For example, in https://github.com/ufs-community/ufs-weather-model/blob/develop/tests/tests/hafs_regional_atm_ocn
the mediator runs on the same tasks as the ocean, so the above function will not work for that case.
You're right. That test doesn't actually use default_vars to set the petlist. Maybe there would have to be a logical flag that controls whether we "auto-set" the petlist. You were also talking about having a test where the atm doesn't always get the first PEs, which makes a function more complicated too. I do see the trade-off between brute-force setting of the bounds, as we do now, and computing them with a function.
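The two complications raised above (a mediator that shares tasks with a component other than ATM, and ATM not always getting the first PEs) could be handled along these lines. This is only a sketch, not something tested in rt.sh: the names `AUTO_PETLIST`, `PETLIST_ORDER`, `MED_SHARES_WITH` and `compute_petbounds_ordered` are hypothetical, the task counts are made up for illustration, and it assumes the ATM compute tasks occupy the first part of the ATM PET range, as the function earlier in this thread does.

```shell
#!/bin/bash
set -eu

# Sketch only: AUTO_PETLIST, PETLIST_ORDER and MED_SHARES_WITH are hypothetical
# names, not existing rt.sh or default_vars variables.
compute_petbounds_ordered() {
  local n=0 comp tasks var
  # Walk the components in the requested order and hand out contiguous PET ranges.
  for comp in ${PETLIST_ORDER}; do
    if [[ ${comp} == ATM ]]; then
      # ATM runs on the sum of its compute and io tasks
      tasks=$(( ${ATM_compute_tasks:-0} + ${ATM_io_tasks:-0} ))
    else
      var="${comp}_tasks"
      tasks=${!var:-0}
    fi
    if (( tasks > 0 )); then
      printf -v "${comp}_petlist_bounds" '%d %d' "${n}" "$(( n + tasks - 1 ))"
      n=$(( n + tasks ))
    fi
  done
  UFS_tasks=${n}

  # The mediator shares PETs with whichever component the test names.
  var="${MED_SHARES_WITH}_petlist_bounds"
  local host_bounds=${!var:-}
  if [[ -z ${host_bounds} ]]; then
    echo "mediator host ${MED_SHARES_WITH} has no tasks" >&2
    return 1
  fi
  if [[ ${MED_SHARES_WITH} == ATM ]]; then
    # Assumption: compute tasks come first within the ATM range, so the
    # mediator gets only that leading part (no io tasks).
    local start=${host_bounds%% *}
    MED_petlist_bounds="${start} $(( start + ATM_compute_tasks - 1 ))"
  else
    MED_petlist_bounds=${host_bounds}
  fi
}

# Illustrative values only, not taken from any real test.
AUTO_PETLIST=true
PETLIST_ORDER="OCN ATM"
MED_SHARES_WITH=OCN
ATM_compute_tasks=144
ATM_io_tasks=6
OCN_tasks=30

# Tests that set their bounds by hand would leave AUTO_PETLIST=false.
if [[ ${AUTO_PETLIST:-true} == true ]]; then
  compute_petbounds_ordered
fi

echo "OCN_petlist_bounds: ${OCN_petlist_bounds:-}"
echo "ATM_petlist_bounds: ${ATM_petlist_bounds:-}"
echo "MED_petlist_bounds: ${MED_petlist_bounds:-}"
echo "UFS_tasks : ${UFS_tasks:-}"
```

With these values the ocean gets PETs 0 to 29, ATM gets 30 to 179, and the mediator shares the ocean range. Keeping the order and the mediator host as per-test inputs leaves the hand-set path available for tests that do not fit the pattern.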
@arunchawla-NOAA @junwang-noaa Of the outstanding issues for UFS, I still believe addressing this one would be very productive. Currently we have to manually set values across multiple platforms and multiple tests. Each of these represents a potential failure point. I'm also not sure how or whether this might intersect with the implementation of ESMF threading?
bump
Description
Currently when a new platform is added, additions need to be made to default vars to specify the tasking for all the components on the new platform. For the coupled model this requires hand-editing of the various tasking variables (e.g. PET bounds) for each component and resolution. This is prone to error.
Solution
Implement a system where the required variables could be automatically calculated and set. Each machine would have variables which define that platform (e.g. TPN) and each component+resolution would have the required tasking defined. For example, MOM6-mx025 = 120 tasks, CICE6-mx100 = 12 tasks, etc.
Using this information for all components (ufsATM, CMEPS, MOM6, CICE6, WW3), the required PET bounds would be calculated automatically and used to set the required variables in nems.configure.
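As a rough illustration of the proposal, the per-component tasking could live in a lookup table keyed by component and resolution, with the platform contributing only its tasks per node. The sketch below is hypothetical throughout: `TASKS_BY_CONFIG`, `lookup_tasks` and `nodes_needed` are invented names, the `TPN=40` value is an assumed placeholder rather than a real platform setting, and only the two task counts come from the examples in this issue. It needs bash 4 or newer for associative arrays.

```shell
#!/bin/bash
set -eu

# Hypothetical table: tasks required per component+resolution.
# The two entries are the examples given in this issue.
declare -A TASKS_BY_CONFIG=(
  [MOM6-mx025]=120
  [CICE6-mx100]=12
)

# Print the task count for a component and resolution, or fail loudly.
lookup_tasks() {
  local key="$1-$2"
  if [[ -z ${TASKS_BY_CONFIG[${key}]:-} ]]; then
    echo "no tasking defined for ${key}" >&2
    return 1
  fi
  echo "${TASKS_BY_CONFIG[${key}]}"
}

# Number of nodes needed for a task total at a given tasks-per-node (ceiling division).
nodes_needed() {
  local total=$1 tpn=$2
  echo $(( (total + tpn - 1) / tpn ))
}

TPN=40  # assumed value for illustration, not a real platform setting

OCN_tasks=$(lookup_tasks MOM6 mx025)
ICE_tasks=$(lookup_tasks CICE6 mx100)
total=$(( OCN_tasks + ICE_tasks ))
NODES=$(nodes_needed "${total}" "${TPN}")

echo "OCN_tasks=${OCN_tasks} ICE_tasks=${ICE_tasks} total=${total} NODES=${NODES}"
```

The resulting `${COMPONENT}_tasks` values would then feed a bounds calculation, so adding a platform would mean setting its TPN rather than hand-editing PET bounds for each test.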