Update pelayout on pm-cpu for ne4 scream tests #6676

Merged

ndkeen
Contributor

@ndkeen ndkeen commented Oct 10, 2024

For ne4 cases, use only 96 MPI tasks, since SCREAM requires that the number of MPI tasks not exceed the number of elements (ne4 has 6 x 4 x 4 = 96 elements).
SMS.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.pm-cpu_intel

Unrelated: rename a machine file to match the machine name for gcp12 builds with SCREAM.
This change fixes E3SM-Project/scream#3036 (at least the build issue).

[bfb]

Rename a machinefile to reflect machine name
@ndkeen ndkeen added the Machine Files, EAMxx (PRs focused on capabilities for EAMxx), GCP (google cloud platform), and pm-cpu (Perlmutter at NERSC, CPU-only nodes) labels Oct 10, 2024
@ndkeen ndkeen requested a review from mahf708 October 10, 2024 22:36
Contributor

@mahf708 mahf708 left a comment

Thank you!

PR Preview Action v1.4.8
🚀 Deployed preview to https://E3SM-Project.github.io/E3SM/pr-preview/pr-6676/
on branch gh-pages at 2024-10-10 22:37 UTC

@ndkeen ndkeen added the BFB (PR leaves answers BFB) label Oct 10, 2024
@rljacob
Member

rljacob commented Oct 11, 2024

Why would EAMxx complain about this and not EAM?

@ndkeen
Contributor Author

ndkeen commented Oct 11, 2024

I have asked that as well. They must take different code paths, but both repos have the same check:

homme/src/share/prim_driver_base.F90

    ! we want to exit elegantly when we are using too many processors.
    if (nelem < par%nprocs) then
       call abortmp('Error: too many MPI tasks. set dyn_npes <= nelem')
    end if

It was so elegant. The most elegant.
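
As a rough illustration of that check, here is a minimal Python sketch (not HOMME or CIME code). It assumes the standard cubed-sphere element count of 6 * ne * ne, which gives 96 for ne4. The function names are hypothetical and exist only for this example.

```python
def cubed_sphere_nelem(ne: int) -> int:
    """Element count for a cubed-sphere grid with ne elements per cube edge."""
    return 6 * ne * ne


def check_dyn_tasks(ne: int, nprocs: int) -> int:
    """Mirror the quoted Fortran guard: fail when nelem < nprocs.

    Hypothetical helper; raises ValueError where the Fortran calls abortmp.
    """
    nelem = cubed_sphere_nelem(ne)
    if nelem < nprocs:
        raise ValueError(
            f"too many MPI tasks: {nprocs} > nelem={nelem}; set dyn_npes <= nelem"
        )
    return nelem


if __name__ == "__main__":
    # 96 tasks on ne4 pass the guard.
    print(check_dyn_tasks(4, 96))
    # 128 tasks (a full pm-cpu node) trip it.
    try:
        check_dyn_tasks(4, 128)
    except ValueError as err:
        print(err)
```

With one task per element as the upper bound, a full 128-task node cannot be used for ne4 by any component that goes through this guard.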

ndkeen added a commit that referenced this pull request Oct 14, 2024
…ts' into next (PR #6676)
@rljacob
Member

rljacob commented Oct 15, 2024

This changed the layout for several e3sm_integration tests:
ERP_Ln9.ne4pg2_oQU480.WCYCL20TRNS-MMF1.pm-cpu_intel.allactive-mmf_fixed_subcycle
ERS.ne4pg2_oQU480.WCYCL1850NS.pm-cpu_intel
ERS_Vmoab.ne4pg2_oQU480.WCYCL1850NS.pm-cpu_intel
NCK.ne4pg2_oQU480.WCYCL1850NS.pm-cpu_intel

If you want to keep this, the Vmoab test diffs need to be blessed, and all of these tests have namelist diffs to bless.

@ndkeen
Contributor Author

ndkeen commented Oct 15, 2024

Makes sense. I could try to add another entry to the pelayout XML that matches ne4 only with SCREAM, but it might be best to just run all ne4 cases with 96 MPI tasks on pm-cpu. What do we think? Before this change, ne4 tests would have used a full node, which is 128 MPI tasks on pm-cpu.
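
To make the trade-off concrete, here is a small Python sketch of the arithmetic. The helper is hypothetical and is not part of the pelayout XML or CIME. It assumes the standard cubed-sphere element count of 6 * ne * ne and takes the 128 tasks per pm-cpu node mentioned above as input.

```python
def capped_layout(ne: int, cores_per_node: int, nodes: int = 1) -> tuple[int, int]:
    """Return (ntasks, idle_cores) when tasks are capped at the element count.

    Hypothetical helper for illustration only.
    """
    nelem = 6 * ne * ne                 # cubed-sphere element count (assumed formula)
    requested = cores_per_node * nodes  # one MPI task per core on the full allocation
    ntasks = min(requested, nelem)      # never more tasks than elements
    return ntasks, requested - ntasks


if __name__ == "__main__":
    ntasks, idle = capped_layout(ne=4, cores_per_node=128)
    print(f"ntasks={ntasks}, idle cores on the node={idle}")
```

For ne4 on one node this gives 96 tasks with 32 cores left idle, which is the cost of applying the cap to every ne4 case instead of only to SCREAM.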

@rljacob
Member

rljacob commented Oct 15, 2024

It's fine to just make the change for all cases.

@ndkeen ndkeen merged commit a1c3cb0 into master Oct 15, 2024
9 checks passed
@ndkeen ndkeen deleted the ndk/machinefiles/pm-cpu-pelayout-updates-for-scream-tests branch October 15, 2024 23:49
Labels
BFB (PR leaves answers BFB), EAMxx (PRs focused on capabilities for EAMxx), GCP (google cloud platform), Machine Files, pm-cpu (Perlmutter at NERSC, CPU-only nodes)
Development

Successfully merging this pull request may close these issues.

Build error with new test SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu on gcp12
3 participants