Fix slurm handling of allocatable cores #181

Merged

xylar (Collaborator) commented Jan 31, 2024

Some systems like Frontier have cores that aren't allocatable. These need to be excluded from the core count that Polaris determines from slurm.
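As an illustration only, the per-node adjustment could look like the sketch below. It assumes the node record is available as `scontrol show node`-style `Key=Value` text with fields named `Sockets`, `CoresPerSocket` and `CoreSpecCount` (cores reserved for system use). The helper names are hypothetical, and this is not Polaris's actual implementation.

```python
import re


def parse_node_record(text):
    """Parse whitespace-separated Key=Value pairs into a dict of strings.

    Sketch only: values that contain spaces are truncated at the first space.
    """
    return dict(re.findall(r"(\w+)=(\S*)", text))


def allocatable_cores_per_node(node_record):
    """Physical cores per node minus cores reserved by core specialization.

    Hypothetical helper: assumes Sockets and CoresPerSocket are present and
    treats a missing or empty CoreSpecCount as zero reserved cores.
    """
    fields = parse_node_record(node_record)
    total = int(fields["Sockets"]) * int(fields["CoresPerSocket"])
    reserved = int(fields.get("CoreSpecCount") or 0)
    return total - reserved


# Hypothetical node record: 64 cores in total, 8 reserved for the system.
record = "NodeName=node001 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 CoreSpecCount=8"
print(allocatable_cores_per_node(record))  # 56
```

Counting cores (rather than `CPUTot`, which may count hardware threads) keeps the result in the same unit as the per-node core count discussed here.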

Checklist

  • Testing comment in the PR documents testing used to verify the changes

xylar added the bug (Something isn't working) and framework (Changes relating to the polaris framework as opposed to individual tests or analysis) labels Jan 31, 2024
xylar self-assigned this Jan 31, 2024
xylar mentioned this pull request Jan 31, 2024
xylar (Collaborator, Author) commented Jan 31, 2024

Testing

With this fix, Polaris tasks that use more than one Frontier node run successfully. Without it, they fail because they try to run on 64 cores per node (the total number of cores rather than the allocatable number).
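A small worked sketch of this failure mode, under stated assumptions: one core per task, whole nodes, and a hypothetical node with 64 cores of which 56 are allocatable. The helpers are hypothetical and do not reproduce Polaris's scheduling logic; they only show how planning with the total per-node count can request a layout that the allocatable cores cannot hold.

```python
import math


def nodes_needed(ntasks, cores_per_node):
    """Whole nodes needed to place ntasks single-core tasks (hypothetical helper)."""
    return math.ceil(ntasks / cores_per_node)


def layout_fits(ntasks, nodes, allocatable_per_node):
    """True if ntasks fit on `nodes` nodes using only allocatable cores."""
    return ntasks <= nodes * allocatable_per_node


# Hypothetical node: 64 cores in total, 8 reserved, so 56 allocatable.
TOTAL, ALLOCATABLE = 64, 56
ntasks = 128

bad = nodes_needed(ntasks, TOTAL)         # planned with the total count
good = nodes_needed(ntasks, ALLOCATABLE)  # planned with the allocatable count
print(bad, layout_fits(ntasks, bad, ALLOCATABLE))    # 2 False
print(good, layout_fits(ntasks, good, ALLOCATABLE))  # 3 True
```

Planning with the total count packs 64 tasks onto each node, but only 56 cores per node can actually be allocated, so the job asks for more cores than it gets.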

xylar (Collaborator, Author) commented Jan 31, 2024

This needs to be tested on Chrysalis and Perlmutter to make sure it doesn't break anything there.

xylar merged commit e3b4dd8 into E3SM-Project:update-to-0.3.0-alpha.1 Jan 31, 2024
xylar deleted the fix-slurm-allocatable-cores branch January 31, 2024 21:38
xylar (Collaborator, Author) commented Feb 1, 2024

This approach didn't work on Compy.
