btl/uct using wrong topology #7693

Open
rhc54 opened this issue May 4, 2020 · 1 comment

Comments

@rhc54
Contributor

rhc54 commented May 4, 2020

In mca_btl_uct_component_open, the number of contexts to use is decided from the number of cores on the node. However, HWLOC reports that value based on the cores available to the PRRTE daemon, which is not necessarily the same as the cores available to the application. For example, the user may have specified a cpuset (a.k.a. a "soft" cgroup) that applies only to the application's procs.

Correct computation of the number of available cores requires that you get the PMIX_JOB_CPUSET, mask the complete_cpuset against it to find the actually available cpus, and then obtain the count of cores from within that cpuset.
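
To illustrate the mask-and-count step described above, here is a minimal, library-free sketch in C. It is only a model, not the btl/uct code: `cpuset_t`, `core_t`, and `count_available_cores` are hypothetical names, a cpuset is reduced to a 64-bit mask with one bit per hardware thread, and a core is counted as available if at least one of its hardware threads survives the mask (that counting rule is an assumption; requiring full containment is the other reasonable choice). In a real fix the job cpuset would come from PMIx and the bitmaps from hwloc, where `hwloc_bitmap_and` and `hwloc_get_nbobjs_inside_cpuset_by_type` look like the natural building blocks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one bit per hardware thread (PU), up to 64 PUs. */
typedef uint64_t cpuset_t;

/* Hypothetical model of a core: the set of PUs that belong to it. */
typedef struct {
    cpuset_t pus;
} core_t;

/*
 * Count the cores usable by the application.
 *
 * complete: every PU the topology knows about (the "complete cpuset")
 * job:      the PUs the job is allowed to use (the job-level cpuset)
 *
 * Assumption: a core counts if at least one of its PUs is in the
 * intersection of the two sets.
 */
static unsigned count_available_cores(const core_t *cores, size_t ncores,
                                      cpuset_t complete, cpuset_t job)
{
    cpuset_t avail = complete & job;   /* mask complete cpuset against job cpuset */
    unsigned n = 0;

    for (size_t i = 0; i < ncores; i++) {
        if (cores[i].pus & avail) {    /* core overlaps the available PUs */
            n++;
        }
    }
    return n;
}
```

For example, with four two-thread cores (`0x03`, `0x0C`, `0x30`, `0xC0`), a complete cpuset of `0xFF`, and a job cpuset of `0x0F`, the function returns 2, whereas counting cores from the daemon's view alone would give 4.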

@gpaulsen
Member

Removing blocker label as btl/uct is low priority.


4 participants