btl/uct using wrong topology #7693

Open
@rhc54

Description

In mca_btl_uct_component_open, the number of contexts to use is decided from the number of cores on the node. However, HWLOC returns that value based on the cores available to the PRRTE daemon, which is not necessarily the same as the cores available to the application. For example, the user may have specified a cpuset (a.k.a. a "soft" cgroup) that applies only to the application's procs.

Correct computation of the number of available cores requires that you get the PMIX_JOB_CPUSET, mask the complete_cpuset against it to find the actually available cpus, and then obtain the count of cores from within that cpuset.
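
To make the masking step concrete, here is a minimal sketch in plain C. It is a toy model, not the btl/uct code: cpusets are 64-bit masks rather than hwloc bitmaps, and the names toy_cpumask, available_cpus, and count_available_cores are hypothetical. In real code the intersection would presumably be done with hwloc bitmap calls such as hwloc_bitmap_and, with the job cpuset obtained from PMIX_JOB_CPUSET. The sketch also assumes that a core counts as available if at least one of its PUs survives the mask; whether a partially covered core should count is a policy choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a cpuset: bit i represents logical CPU (PU) i.
 * Hypothetical type; real code would use hwloc bitmaps. */
typedef uint64_t toy_cpumask;

/* CPUs the application may actually use: the complete cpuset
 * masked against the job's cpuset. */
toy_cpumask available_cpus(toy_cpumask complete, toy_cpumask job)
{
    return complete & job;
}

/* Count the cores that have at least one PU inside the available set.
 * core_pus[i] holds the PUs belonging to core i.
 * Assumption: a partially covered core still counts as available. */
size_t count_available_cores(const toy_cpumask *core_pus, size_t ncores,
                             toy_cpumask complete, toy_cpumask job)
{
    toy_cpumask avail = available_cpus(complete, job);
    size_t count = 0;

    for (size_t i = 0; i < ncores; i++) {
        if (core_pus[i] & avail) {
            count++;
        }
    }
    return count;
}
```

With four two-PU cores (masks 0x03, 0x0C, 0x30, 0xC0), a complete cpuset of 0xFF, and a job cpuset of 0x0F, this counts two cores, whereas counting against the daemon's full view would give four.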
