Change # of processors in OCN when WW3 is active #146

Open
wants to merge 1 commit into
base: main

gustavo-marques
Collaborator

Changes ntasks_ocn from 300 to 864.

@mnlevy1981
Collaborator

I have a few questions for @alperaltuntas about this PE layout that may lead to a few other updates...

  1. WW3 is getting 300 cores, but 300 isn't divisible by 36, so WW3 is using 8 full nodes and then 12 out of 36 cores on a 9th node. MOM6 is getting 864 cores, which would fit on exactly 24 nodes, but instead it is using the 24 remaining cores on the node WW3 only partially fills, then 23 entire nodes, and then 12 more cores on a 25th node. Should WW3 be dropped down to 288 cores or bumped up to 324 cores to avoid unused cores in the reservation? Or even just setting ROOTPE_OCN=324 while keeping NTASKS_WAV=300 would let MOM6 have 24 nodes to itself instead of sharing one node with WW3.
  2. On a related note, CICE is sharing tasks 0-107 with the coupler and data models. Is it okay to dump WW3 on those tasks as well, or would it make more sense to set ROOTPE_WAV=108 and then set ROOTPE_OCN to an appropriate value to stay off the WW3 nodes?
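The node arithmetic in the first question can be checked with a short sketch. It assumes 36 cores per node (as implied above), tasks packed contiguously onto nodes in rank order, WW3 starting at task 0, and MOM6 currently starting at task 300; the helper names `node_usage` and `summarize` are made up for this illustration and are not part of CIME.

```python
from collections import Counter

CORES_PER_NODE = 36  # node size implied by the comment above


def node_usage(rootpe, ntasks, cores_per_node=CORES_PER_NODE):
    """Map node index -> cores used by a component that occupies
    global tasks rootpe .. rootpe + ntasks - 1, packed contiguously."""
    tasks = range(rootpe, rootpe + ntasks)
    return dict(Counter(task // cores_per_node for task in tasks))


def summarize(usage, cores_per_node=CORES_PER_NODE):
    """Return (number of fully used nodes, {node: cores} for partial nodes)."""
    full = sum(1 for n in usage.values() if n == cores_per_node)
    partial = {node: n for node, n in usage.items() if n < cores_per_node}
    return full, partial


wav = node_usage(rootpe=0, ntasks=300)            # WW3 on tasks 0-299
ocn_shared = node_usage(rootpe=300, ntasks=864)   # assumed current ROOTPE_OCN
ocn_aligned = node_usage(rootpe=324, ntasks=864)  # proposed ROOTPE_OCN=324

print(summarize(wav))          # (8, {8: 12})
print(summarize(ocn_shared))   # (23, {8: 24, 32: 12})
print(summarize(ocn_aligned))  # (24, {})
```

With ROOTPE_OCN=324, MOM6 lands on 24 whole nodes, at the cost of leaving 24 cores idle on the node WW3 only partially fills.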

I think a handful of load balancing tests might lead to a more efficient layout.

@gustavo-marques when your run with 864 tasks for MOM6 finishes, can you post the timing summary?

@alperaltuntas
Member

We have two main wave grids we are working with: ww3a (3-degree grid) and wt0.66v1 (0.66-degree grid). With the 3-degree grid, WW3 cost is insignificant, and we can probably set NTASKS_WAV to 108 in that case. As for the wt0.66v1 grid, below are my comments.

NTASKS_WAV=300 is dictated by the way parallelism works in WW3, which is based on decomposing the spectral domain (24 by 25 = 600). Changing NTASKS_WAV from 300 to 288 or 324 would probably slow down WW3 enough to justify 12 idle cores (though this should be confirmed).
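As a rough illustration of why 300 fits the 600-component spectral domain better than 288 or 324, here is a toy load model. It assumes the 600 spectral components are simply dealt out as evenly as possible across tasks and that step time follows the most loaded task; this is a simplification for illustration, not a description of WW3's actual decomposition, and `spectral_load` is a hypothetical helper.

```python
def spectral_load(n_components, ntasks):
    """Return (min load, max load, efficiency) when n_components are
    dealt out as evenly as possible over ntasks tasks."""
    base, extra = divmod(n_components, ntasks)
    min_load = base
    max_load = base + (1 if extra else 0)
    # Fraction of task-slots doing useful work if every task waits
    # for the most heavily loaded one.
    efficiency = n_components / (max_load * ntasks)
    return min_load, max_load, efficiency


N_SPECTRAL = 24 * 25  # 600 spectral components, as stated above

for ntasks in (288, 300, 324):
    lo, hi, eff = spectral_load(N_SPECTRAL, ntasks)
    print(f"NTASKS_WAV={ntasks}: {lo}-{hi} components per task, "
          f"efficiency {eff:.2f}")
```

Under this toy model only 300 divides the work evenly (2 components per task). With 288 tasks some tasks carry 3 components, and with 324 some carry only 1 while the maximum stays at 2, so neither is expected to improve on 300. Actual timings would still need to be measured.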

> CICE is sharing tasks 0-107 with the coupler and data models. Is it okay to dump WW3 on those tasks as well, or would it make more sense to set ROOTPE_WAV=108 and then set ROOTPE_OCN to an appropriate value to stay off the WW3 nodes?

I strongly agree.
