implement the ability to automatically calculate PE bounds and tasking for component models on different platforms and resolutions #320

DeniseWorthen · 2020-12-07T16:03:19Z

Description

Currently, when a new platform is added, default_vars must be extended to specify the tasking for every component on that platform. For the coupled model this means hand-editing the various tasking variables (e.g. PET bounds) for each component and resolution, which is error-prone.

Solution

Implement a system in which the required variables are calculated and set automatically. Each machine would have variables that define the platform (e.g. TPN, tasks per node), and each component+resolution combination would have its required tasking defined, for example MOM6-mx025=120 tasks, CICE6-mx100=12 tasks, etc.

Using this information for all components (ufsATM, CMEPS, MOM6, CICE6, WW3), the required PET bounds would be calculated automatically and used to set the required variables in nems.configure.
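As a rough illustration of the machine-level part of this idea, the sketch below derives the total task count and the number of nodes from a per-platform TPN and per-component task counts. Apart from TPN and the MOM6-mx025=120 figure, every variable name and value here is a hypothetical placeholder, not an existing rt.sh or default_vars setting.

```bash
#!/bin/bash
set -eu

# Illustrative values only: TPN is the per-machine tasks-per-node setting,
# and each *_tasks value stands for a component+resolution task count.
TPN=40            # hypothetical platform value
ATM_tasks=150     # hypothetical (compute + i/o)
OCN_tasks=120     # MOM6-mx025 = 120 tasks, the example given above
ICE_tasks=48      # hypothetical
WAV_tasks=0       # component not used in this configuration

# Total tasks requested by all active components.
UFS_tasks=$(( ATM_tasks + OCN_tasks + ICE_tasks + WAV_tasks ))

# Nodes needed, rounding up so a partially filled node is still counted.
NODES=$(( (UFS_tasks + TPN - 1) / TPN ))

echo "UFS_tasks=${UFS_tasks} NODES=${NODES}"
```

With these placeholder numbers the script prints `UFS_tasks=318 NODES=8`; only TPN would change when the same configuration moves to another platform.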

junwang-noaa · 2021-05-03T16:47:55Z

@binli2337 How many tasks does each node have on jet? Since the TPN_cpl_thrd in your PR #545 is 18, it seems jet has >36 tasks/node?

DeniseWorthen · 2021-10-07T12:07:12Z

@DusanJovic-NOAA I think you started to work on this issue a couple of months ago after a discussion on a morning tag-up. Did you ever get it working? During the P7 update, I had to once again implement the same PE/tasking changes for all the platforms and it reminded me of why I created this issue in the first place.

DusanJovic-NOAA · 2021-10-07T17:28:52Z

I tested this function for computing the PET bounds of each component offline, but I didn't have time to test it in rt.sh:

#!/bin/bash
set -eu

function compute_petbounds () {

  local n=0

  # ATM (compute + i/o tasks); default to 0 so that set -u does not abort when ATM is unused
  local atm_tasks=$(( ${ATM_compute_tasks:-0} + ${ATM_io_tasks:-0} ))
  if [[ ${atm_tasks} -gt 0 ]]; then
     ATM_petlist_bounds="${n} $((n + atm_tasks - 1))"
     n=$((n + atm_tasks))
  fi

  # CHM
  if [[ ${CHM_tasks:-0} -gt 0 ]]; then
     CHM_petlist_bounds="${n} $((n + CHM_tasks - 1))"
     n=$((n + CHM_tasks))
  fi

  # OCN
  if [[ ${OCN_tasks:-0} -gt 0 ]]; then
     OCN_petlist_bounds="${n} $((n + OCN_tasks - 1))"
     n=$((n + OCN_tasks))
  fi

  # ICE
  if [[ ${ICE_tasks:-0} -gt 0 ]]; then
     ICE_petlist_bounds="${n} $((n + ICE_tasks - 1))"
     n=$((n + ICE_tasks))
  fi

  # WAV
  if [[ ${WAV_tasks:-0} -gt 0 ]]; then
     WAV_petlist_bounds="${n} $((n + WAV_tasks - 1))"
     n=$((n + WAV_tasks))
  fi

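  # MED: assumes the mediator shares the ATM compute tasks (ATM starts at PET 0)
  if [[ ${ATM_compute_tasks:-0} -gt 0 ]]; then
     MED_petlist_bounds="0 $((ATM_compute_tasks - 1))"
  fi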

  UFS_tasks=${n}
}


# Each test MUST define ${COMPONENT}_tasks for every component it uses, and
# MUST either leave undefined, or set to 0, the variables for components it does not use.

# ATM is a special case: it runs on the sum of its compute and i/o tasks, while the
# mediator runs only on the ATM compute tasks.
ATM_compute_tasks=$((3 * 8 * 6))
ATM_io_tasks=$((1 * 6))
#CHM_tasks=0
OCN_tasks=30
ICE_tasks=12
WAV_tasks=208

compute_petbounds

echo "ATM_petlist_bounds: ${ATM_petlist_bounds:-}"
echo "OCN_petlist_bounds: ${OCN_petlist_bounds:-}"
echo "ICE_petlist_bounds: ${ICE_petlist_bounds:-}"
echo "WAV_petlist_bounds: ${WAV_petlist_bounds:-}"
echo "CHM_petlist_bounds: ${CHM_petlist_bounds:-}"
echo "MED_petlist_bounds: ${MED_petlist_bounds:-}"
echo "UFS_tasks         : ${UFS_tasks:-}"
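The CHM/OCN/ICE/WAV blocks above all follow one pattern, so they could be driven by a loop. The sketch below is an illustrative rewrite, not a tested replacement: the function name compute_petbounds_loop is hypothetical, and it assumes the same <COMP>_tasks naming convention and the same ATM/MED special cases. It uses bash indirect expansion (`${!var}`) to read each count by name and `printf -v` to set each `<COMP>_petlist_bounds` by name.

```bash
#!/bin/bash
set -eu

# Illustrative loop-based variant of compute_petbounds (hypothetical name).
# Reads <COMP>_tasks via indirect expansion and sets <COMP>_petlist_bounds
# via printf -v. Unused components may be unset or 0.
function compute_petbounds_loop () {
  local n=0 comp var ntasks
  # ATM runs on compute + i/o tasks and is placed first, as in the original.
  local atm_tasks=$(( ${ATM_compute_tasks:-0} + ${ATM_io_tasks:-0} ))

  if (( atm_tasks > 0 )); then
    ATM_petlist_bounds="${n} $(( n + atm_tasks - 1 ))"
    n=$(( n + atm_tasks ))
  fi

  # Remaining components get contiguous ranges in a fixed order.
  for comp in CHM OCN ICE WAV; do
    var="${comp}_tasks"
    ntasks=${!var:-0}
    if (( ntasks > 0 )); then
      printf -v "${comp}_petlist_bounds" '%d %d' "${n}" $(( n + ntasks - 1 ))
      n=$(( n + ntasks ))
    fi
  done

  # Mediator shares the ATM compute tasks only (same assumption as the original).
  if (( ${ATM_compute_tasks:-0} > 0 )); then
    MED_petlist_bounds="0 $(( ATM_compute_tasks - 1 ))"
  fi

  UFS_tasks=${n}
}

# Same example configuration as in the script above.
ATM_compute_tasks=$((3 * 8 * 6))
ATM_io_tasks=$((1 * 6))
OCN_tasks=30
ICE_tasks=12
WAV_tasks=208

compute_petbounds_loop
echo "ATM ${ATM_petlist_bounds} | OCN ${OCN_petlist_bounds} | ICE ${ICE_petlist_bounds}"
echo "WAV ${WAV_petlist_bounds} | MED ${MED_petlist_bounds} | UFS_tasks ${UFS_tasks}"
```

For this configuration it prints `ATM 0 149 | OCN 150 179 | ICE 180 191` and `WAV 192 399 | MED 0 143 | UFS_tasks 400`, matching the arithmetic of the hand-written blocks; CHM_petlist_bounds stays unset because CHM is not used.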

DusanJovic-NOAA · 2021-10-07T21:40:14Z

This function assumes that the mediator runs on the same tasks as ATM (the compute tasks, not the i/o tasks), but that is not the case in all tests. For example, in https://github.com/ufs-community/ufs-weather-model/blob/develop/tests/tests/hafs_regional_atm_ocn:

export atm_petlist_bounds="0000 0299"
export ocn_petlist_bounds="0300 0359"
export med_petlist_bounds="0300 0359"

Here the mediator runs on the same tasks as the ocean, so the function above will not work for that case.
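One possible way to cover cases like hafs_regional_atm_ocn, sketched under assumptions: let each test name the component whose PETs the mediator shares. The variable MED_on and the function set_med_petbounds are hypothetical, not part of rt.sh; the default keeps the behaviour of compute_petbounds (mediator on the ATM compute tasks, with ATM starting at PET 0).

```bash
#!/bin/bash
set -eu

# Hypothetical sketch: the test names the component whose PETs the mediator
# shares via MED_on (an illustrative variable, not an existing rt.sh setting).
function set_med_petbounds () {
  local target=${MED_on:-ATM}
  if [[ ${target} == ATM ]]; then
    # Default mirrors compute_petbounds: ATM compute tasks, ATM starting at PET 0.
    MED_petlist_bounds="0 $(( ATM_compute_tasks - 1 ))"
  else
    local var="${target}_petlist_bounds"
    if [[ -z ${!var:-} ]]; then
      echo "set_med_petbounds: ${var} is not set" >&2
      return 1
    fi
    MED_petlist_bounds=${!var}
  fi
}

# hafs_regional_atm_ocn-style layout: mediator on the ocean PETs.
ATM_petlist_bounds="0 299"
OCN_petlist_bounds="300 359"
MED_on=OCN
set_med_petbounds
echo "MED_petlist_bounds: ${MED_petlist_bounds}"
```

This prints `MED_petlist_bounds: 300 359`, the same range as the hand-set values in that test (without the zero padding). The component bounds still have to be computed first; this only decouples the mediator placement from ATM.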

DeniseWorthen · 2021-10-07T21:49:12Z

You're right. That test doesn't actually use default_vars to set the petlist. Maybe there would need to be a logical flag controlling whether we "auto-set" the petlist.

You were also talking about having a test where the ATM doesn't get the first PEs, which would also make the function more complicated.

I do see the trade-off between brute-force setting of the bounds, as we do now, and computing them with a function.
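A sketch combining both ideas, under stated assumptions: a per-test flag to opt out of auto-setting (so a test like hafs_regional_atm_ocn could keep its explicit bounds), and a per-test component order so that ATM need not get the first PEs. AUTO_PETLIST, COMP_ORDER and assign_petbounds_in_order are hypothetical names. For simplicity ATM is treated here as a single ATM_tasks count rather than separate compute and i/o counts, and mediator placement is not handled.

```bash
#!/bin/bash
set -eu

# Hypothetical sketch: contiguous PET ranges assigned in a per-test order.
# AUTO_PETLIST (opt-out flag) and COMP_ORDER are illustrative names only.
function assign_petbounds_in_order () {
  # Opt-out: leave any hand-set *_petlist_bounds untouched.
  if [[ ${AUTO_PETLIST:-true} != true ]]; then
    return 0
  fi

  local n=0 comp var ntasks
  # Word splitting of COMP_ORDER is intentional: it is a space-separated list.
  for comp in ${COMP_ORDER:?COMP_ORDER must list the components in PET order}; do
    var="${comp}_tasks"
    ntasks=${!var:-0}
    if (( ntasks > 0 )); then
      printf -v "${comp}_petlist_bounds" '%d %d' "${n}" $(( n + ntasks - 1 ))
      n=$(( n + ntasks ))
    fi
  done
  UFS_tasks=${n}
}

# Example: ocean takes the first PETs, atmosphere (compute + i/o) follows.
COMP_ORDER="OCN ATM"
OCN_tasks=60
ATM_tasks=300
assign_petbounds_in_order
echo "OCN ${OCN_petlist_bounds} | ATM ${ATM_petlist_bounds} | UFS_tasks ${UFS_tasks}"
```

This prints `OCN 0 59 | ATM 60 359 | UFS_tasks 360`. Setting AUTO_PETLIST=false makes the function a no-op, which keeps the brute-force path available for tests that need it.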

DeniseWorthen · 2022-03-05T13:33:31Z

@arunchawla-NOAA @junwang-noaa Of the outstanding issues for UFS, I still believe addressing this one would be very productive.

Currently we have to set values manually across multiple platforms and multiple tests, and each of these represents a potential failure point. I'm also not sure how, or whether, this might intersect with the implementation of ESMF threading.

DeniseWorthen · 2022-04-27T10:57:46Z

bump

DeniseWorthen added the enhancement (New feature or request) label Dec 7, 2020

This was referenced Apr 29, 2021

Add datm and coupled tests to Jet. #545

Closed

Combined PR: NEMS Driver cleanup (#533), Wave update (#542), add CPLD&DATM tests on Jet (#545) #533

Merged

DeniseWorthen mentioned this issue Nov 15, 2021

Update WW3 for fix for MPI reproducibility #911

Merged


DusanJovic-NOAA mentioned this issue May 2, 2022

Compute petlist bounds for each subcomponent from number of tasks. Update CICE #1200

Merged


DusanJovic-NOAA closed this as completed in #1200 May 10, 2022

DavidHuber-NOAA mentioned this issue May 18, 2022

Enable Intel 2022 and update test support on S4 #1223

Closed
