Unable to make prterun use round-robin mapping behavior #674
There are at least two use cases I can think of where the layout of tasks matters.
It seems to me anything more complicated than that is a matter for using :pe_list or rankfiles.
I'm not sure what you mean by "round-robin" - could you explain what result you were trying to achieve? It looks to me like it is indeed performing a "round-robin" mapping, so I suspect our definitions of that term are different.
For round-robin, if I have 8 tasks to run on 4 nodes, then I expect task i to land on node i mod 4 (node 0 gets tasks 0 and 4, node 1 gets tasks 1 and 5, and so on). What I am seeing is tasks mapped sequentially on a node until the node is fully allocated, then the next node is used.
Also, if I am mapping or binding by something smaller than a node, like core, I still expect tasks to be mapped similarly, so there might be only one core used on each node (see the sketch below).
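For concreteness, a minimal sketch (hypothetical Python, not PRRTE code) of the two layouts being contrasted here, assuming 8 tasks on 4 nodes with slots=8 as in the original report:

```python
tasks, nodes, slots = 8, 4, 8

# Expected (round-robin by node): task i lands on node i mod 4.
round_robin = {t: t % nodes for t in range(tasks)}

# Observed (sequential fill): a node is filled to its slot count before
# the next node is used, so all 8 tasks land on node 0 when slots=8.
fill_first = {t: min(t // slots, nodes - 1) for t in range(tasks)}

print(round_robin)  # {0: 0, 1: 1, 2: 2, 3: 3, 4: 0, 5: 1, 6: 2, 7: 3}
print(fill_first)   # {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0}
```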
That would be round-robin by node, not package. Round-robin by package would result in mapping tasks evenly across the two packages on node 0 until that node was full, and then moving on to the next node - which looks like exactly what PRRTE is doing. Round-robin by package:span would map tasks evenly across all packages across all nodes - which again looks like what it did.
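To make the package vs. package:span distinction concrete, here is a hedged Python sketch for the reporter's layout (4 nodes, 2 packages each, slots=8); the intra-node package order is an assumption, only the per-node counts matter:

```python
def by_package(nprocs, nnodes=4, slots=8):
    """Round-robin by package (no :SPAN): balance across the packages of
    one node until its slots are used, then move to the next node."""
    placement = []  # (node, package) per proc
    for node in range(nnodes):
        take = min(slots, nprocs - len(placement))
        for i in range(take):
            placement.append((node, i % 2))
    return placement

def by_package_span(nprocs, nnodes=4, pkgs=2):
    """package:SPAN: round-robin across all packages of all nodes."""
    all_pkgs = [(n, p) for n in range(nnodes) for p in range(pkgs)]
    return [all_pkgs[i % len(all_pkgs)] for i in range(nprocs)]

# First 3 nodes full (8 procs each), node 3 empty:
print([sum(1 for n, _ in by_package(24) if n == node) for node in range(4)])
# [8, 8, 8, 0]
# 6 procs on every node:
print([sum(1 for n, _ in by_package_span(24) if n == node) for node in range(4)])
# [6, 6, 6, 6]
```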
If I specify --map-by node and --bind-to core then I might be getting the round-robin behavior I was expecting, because with that I see all 4 nodes allocated with 6 tasks per node, where tasks 0-5 use cores 0-5 on node 1, tasks 6-11 use cores 0-5 on node 2, etc. If mapping and binding are working the way they were intended, that's fine; I just wanted to be sure nothing was broken.

@jjhursey asked me to do some testing of mapping, binding, and ranking. In reading the documentation I understood that the default behavior was to map round-robin, and that if --map-by options were specified then the allocations would change; it wasn't clear to me that I was getting correct round-robin behavior. Maybe I'm also too worried about where specific task ranks are assigned, and maybe I'm being further confused by the ranking step.
Round-robin by package is indeed the default - however, it is not package:span. From what you show, it is working as expected.
Double-check my understanding here. Suppose we have the hostfile from the original report: 4 nodes, each with slots=8.

Let np be the number of processes requested, and the rule: the mapper assigns processes to a node until that node's slot count is reached, then moves to the next node.

Or more generally: without :SPAN, each node in order receives min(slots, remaining processes) processes.

Example 1: the original -n 24 --map-by package run.

Will result in 8 processes on each of the first 3 nodes (since 3 nodes x 8 slots = 24), leaving the 4th node empty.

Example 2:

If you changed the hostfile to have a different slots value, the point at which the mapper moves to the next node would shift accordingly.

Example 3: the same run with --map-by package:SPAN.

Would result in 6 processes per node across all 4 nodes. Slots are only used to determine if we oversubscribe.

Example 4:

This should result in 8 processes across 3 nodes.

Example 5:

This should result in 8 processes across 4 nodes.
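A quick, hypothetical Python check of the arithmetic behind Examples 1 and 3 (assuming 4 nodes with slots=8; not PRRTE code):

```python
def fill_nodes(nprocs, slots_per_node=8, nnodes=4):
    """Without :SPAN - fill each node to its slot count, in order."""
    layout = []
    for _ in range(nnodes):
        take = min(slots_per_node, nprocs)
        layout.append(take)
        nprocs -= take
    return layout

print(fill_nodes(24))  # [8, 8, 8, 0]  (Example 1)
print([24 // 4] * 4)   # [6, 6, 6, 6]  (Example 3: :SPAN balances evenly)
```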
@drwootton In your original report, this matches the behavior you observed.

Commentary

Note 1: We need to document that, if :SPAN is not given, the mapper fills each node's slots before moving to the next node.

Note 2: For the :SPAN case, slots are only consulted to decide whether we are oversubscribing; the balancing itself is done across objects.

Note 3: The misunderstanding that I had with this was that, with the given definition of round-robin (which is fine), the round-robin doesn't hop to the next node when it hits the hardware limit for the OBJ, but instead when it hits the slot limit.
So if I called the same mapping with -n 28 on that hostfile: if we hopped to the next node when we hit the number of packages (2 in this case), then we would iterate in the mapping four times, with the last iteration placing 2 processes each on node 1 and node 2.

Rather, what is happening is that we ignore the package hardware limit and use the slots listing, thus filling the first three nodes and placing the extra 4 processes on the last node.

Is my understanding here correct? If so, then I'll write up this example (in a better form) for the documentation. (See the sketch below for the two interpretations.)
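A minimal sketch of the two interpretations in Note 3, assuming -n 28 against 4 nodes with 2 packages per node and slots=8 (illustrative Python, not the PRRTE mapper; the printed counts match the two outcomes described above):

```python
def hop_at_package_limit(nprocs, nnodes=4, pkgs_per_node=2):
    """Interpretation A: hop to the next node after placing one proc per
    package, sweeping over the nodes repeatedly."""
    counts = [0] * nnodes
    placed = 0
    while placed < nprocs:
        for node in range(nnodes):
            take = min(pkgs_per_node, nprocs - placed)
            counts[node] += take
            placed += take
            if placed == nprocs:
                break
    return counts

def fill_to_slot_limit(nprocs, nnodes=4, slots=8):
    """Interpretation B (what actually happens): fill each node to its
    slot count, then move on."""
    counts = []
    for _ in range(nnodes):
        take = min(slots, nprocs)
        counts.append(take)
        nprocs -= take
    return counts

print(hop_at_package_limit(28))  # [8, 8, 6, 6] - 4th sweep stops after node 1
print(fill_to_slot_limit(28))    # [8, 8, 8, 4] - extra 4 procs on the last node
```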
I'm afraid that is not correct. The divisor is the total number of OBJs of that type across all allocated nodes. In this case, you have two packages on each of 4 nodes, so that means the divisor is 8, and the mapper will assign 3 procs to each package of every node.
As stated in the other issue, it does.
No - as stated above, SPAN applies to the object type, not the nodes. What actually happens here is a little more complicated. First, we check the number of requested procs against the total number of available slots (summed across all allocated nodes) to see if we can even run the job. If there aren't enough slots, then we check if oversubscribe is allowed. If not, we immediately error out. If it is, then we continue. Next, we compute the average number of procs/object by dividing the number of procs by the total number of objects across all allocated nodes. We then begin assigning that number to each object, constrained by the number of slots on the node. At the end of that pass, we see that we have leftover procs. Given that oversubscription was allowed (or else we would have errored out right away), we go back and add one proc at a time in a round-robin fashion across the objects until we have all the procs.
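Putting that description into a short sketch (illustrative Python under the assumptions above, not the actual PRRTE mapper):

```python
def map_span(nprocs, slots_per_node, objs_per_node, oversubscribe=False):
    """slots_per_node: list of slot counts, one entry per allocated node."""
    total_slots = sum(slots_per_node)
    if nprocs > total_slots and not oversubscribe:
        raise RuntimeError("not enough slots; oversubscription not allowed")

    # Average procs per object, across ALL objects on ALL allocated nodes.
    total_objs = objs_per_node * len(slots_per_node)
    per_obj = nprocs // total_objs

    # First pass: give each object per_obj procs, constrained by the
    # node's slot count.
    assigned = []
    remaining = nprocs
    for slots in slots_per_node:
        take = min(per_obj * objs_per_node, slots, remaining)
        assigned.append(take)
        remaining -= take

    # Leftovers: add one proc at a time, round-robin, honoring slots
    # unless oversubscription was allowed.
    i = 0
    while remaining > 0:
        j = i % len(assigned)
        if oversubscribe or assigned[j] < slots_per_node[j]:
            assigned[j] += 1
            remaining -= 1
        i += 1
    return assigned

# 24 procs, 4 nodes x 8 slots, 2 packages per node -> 6 procs per node:
print(map_span(24, [8, 8, 8, 8], 2))  # [6, 6, 6, 6]
```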
Yes - because you didn't say :SPAN, we fill each node before moving on to the next. So you will indeed get the layout in your last figure.
Ok. That makes sense to me then. I've flagged this as a documentation item. I'll try to summarize this in the documentation around the --map-by options.
Background information
What version of the PMIx Reference Server are you using? (e.g., v1.0, v2.1, git master @ hash, etc.)
Master branch
What version of PMIx are you using? (e.g., v1.2.5, v2.0.3, v2.1.0, git branch name and hash, etc.)
Master branch
Please describe the system on which you are running
Operating system/version:
RHEL 7.7
Computer hardware:
4 POWER8 nodes, 2 sockets each with 10 cores (20 total) and 8 hwthreads per core (160 total)
Hostfile specifies 4 nodes, each with the slots=8 keyword
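For reference, a hostfile of that form would look something like this (hostnames are placeholders):

```
node01 slots=8
node02 slots=8
node03 slots=8
node04 slots=8
```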
Details of the problem
I tried to make prterun use round-robin mapping and have been unable to make that work.
I started with the command prterun -n 24 --hostfile hostfile8 --map-by package --bind-to package:REPORT where hostfile8 lists 4 nodes with 8 slots each.
The binding report showed that the 24 tasks were allocated as 8 tasks on each of the first 3 nodes, with the 4th node left empty.
Then I changed the --map-by option to --map-by package:SPAN and the allocation changed to 6 tasks on each of the 4 nodes, so SPAN did balance the allocation, but tasks are still not allocated round-robin.
In the current documentation (#557, prte-mp.1.md) there's a statement that by default tasks are scheduled round-robin. I thought that was explaining default mapping behavior, so I tried prterun -n 32 --hostfile hostfile8 --bind-to package:REPORT taskinfo and got the same bind report as the first.