LSF provided affinity is not supported #791
Comments
You should just need to use HWLOC to convert the physical IDs to their logical equivalents. You might look at the old ORTE/OPAL code, as that is what we used to do.
This turns out to be trivial:
    obj = hwloc_get_pu_obj_by_os_index(topo, physical_id);
    logical_id = obj->logical_index;
Checked and that works all the way back to HWLOC 1.11, so it should be okay to use.
I think that'll work fine in a homogeneous configuration. If we detect a heterogeneous configuration then we might have issues if we do the translation on the node with the HNP. In the short-term, that's an ok restriction. In the longer-term, we may want to handle this on the backend, but that would require re-introducing physical IDs more broadly which I don't know if we want to do. I'll see if I can get to the short-term fix next week.
Fair point. I'd still do the translation on the HNP for simplicity, but you could do it in the plm/base where we receive the hetero topology from the remote node. You'd have to do it that way in the case (which I believe is common for LSF) where the HNP is on a login node and the compute node (due to cgroup or whatever) is different, even if the physical architecture is the same.
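To make the login-node versus compute-node concern concrete, here is a small self-contained model in C. It does not use hwloc: `node_topo_t`, `node_physical_to_logical`, and the two tables are hypothetical stand-ins for what each node's topology would report, assuming a cgroup on the compute node exposes only hardware threads 4-7 and that logical indexes are numbered over the visible threads only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a node topology: os_index[i] is the physical
 * (OS) ID of the hardware thread whose logical index is i.
 */
typedef struct {
    const unsigned *os_index;
    size_t          num_pus;
} node_topo_t;

/* Return the logical index of physical_id on this node, or -1 if absent. */
static int node_physical_to_logical(const node_topo_t *topo, unsigned physical_id)
{
    assert(NULL != topo);
    for (size_t i = 0; i < topo->num_pus; i++) {
        if (topo->os_index[i] == physical_id) {
            return (int) i;
        }
    }
    return -1;
}

/* Login node (assumed): every hardware thread is visible, numbered in order. */
static const unsigned login_ids[] = {0, 1, 2, 3, 4, 5, 6, 7};
static const node_topo_t login_node = {login_ids, 8};

/* Compute node (assumed): same hardware, but only threads 4-7 are visible. */
static const unsigned compute_ids[] = {4, 5, 6, 7};
static const node_topo_t compute_node = {compute_ids, 4};
```

With these tables, physical ID 5 translates to logical index 5 against the login-node table but to logical index 1 against the compute-node table, and physical ID 0 has no translation at all on the compute node. That is why the translation has to use the topology received from the remote node rather than the HNP's own.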
I'm working on this now, and think I have a fix in progress.
LSF allows the user to specify process affinity at bsub time. This results in a non-empty file pointed to by $LSB_AFFINITY_HOSTFILE. This file lists the hardware threads that the process should be bound to, using physical IDs. The hardware-thread binding is already addressed by setting the PRTE_JOB_HWT_CPUS attribute. However, the physical hardware thread IDs are the problem, as PRTE no longer supports physical IDs.

In PR #597 we now throw an error when we detect this scenario. We need to work on a solution to restore this functionality.