Split out the map/rank/bind into a separate man #557
I'm working on a separate man page dedicated to the map/rank/bind functionality. This PR is definitely not ready, but I wanted to post it so the community can track progress towards this goal for the v2 release. Anyone wanting to help should feel free to reach out. |
Items that I removed from the
The discussion about when to use I noticed that
But this does not work
This
|
I noticed in this example (testing --np 6):
Before (from mpirun) the --map-by node line was as follows (round-robin):
But now it is (sequential):
I was running with:
Is this a change in behavior or a typo in the original? |
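To make the two orderings in this question concrete, here is a minimal Python sketch of round-robin versus sequential placement by node. It assumes "sequential" means filling one node before moving to the next. The node names, the slot count, and both function names are hypothetical, and this only illustrates the two patterns; it is not PRRTE's mapper.

```python
def map_round_robin(nodes, nprocs):
    """Deal ranks out one node at a time, cycling through the node list."""
    return {rank: nodes[rank % len(nodes)] for rank in range(nprocs)}


def map_sequential(nodes, slots_per_node, nprocs):
    """Fill each node's slots before moving on to the next node."""
    placement = {}
    for rank in range(nprocs):
        node_index = rank // slots_per_node
        if node_index >= len(nodes):
            raise ValueError("not enough slots for the requested process count")
        placement[rank] = nodes[node_index]
    return placement


nodes = ["node0", "node1", "node2"]  # hypothetical hostnames
round_robin = map_round_robin(nodes, 6)   # 6 processes, as in the --np 6 test
sequential = map_sequential(nodes, 2, 6)  # assumes 2 slots per node

for rank in range(6):
    print(f"rank {rank}: round-robin={round_robin[rank]} sequential={sequential[rank]}")
```

With round-robin placement, consecutive ranks land on different nodes; with sequential placement, consecutive ranks share a node until its slots are used up.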
If the nodes are oversubscribed the binding report is empty. Should we print out something like "MCW rank 0 is unbound" or "MCW rank 0 bound to nothing"? |
The It says the following, but I'm seeing that "max_slots" is being ignored and the extra processes are on Limits to oversubscription can also be specified in the hostfile itself:
The
: causes the first 12 processes to be launched as before, but the |
The sequential mapper threw an unexpected error without the
|
Left to do:
I'm off next week (back June 1) - the community can feel free to push updates to my branch if they want to help with the pages. Otherwise I'll keep working on this when I get back. |
I think we are going to hit a lot of confusion if we aren't careful here. First, you cannot execute the cmds as you are showing them here:
$ prte --hostfile hostfile.txt --prtemca rmaps seq /bin/true
The So I have to assume that the errors you are reporting are from you actually running those cmds using something like |
I fixed the sequential mapper, and I also added a new |
Actually I was running For the PRRTE man pages I want them to reflect the PRRTE behavior without any personality. Then we can have separate sections for various personalities or something. |
I tested a number of variations of the --map-by, --bind-to and --rank-by options with prterun and found the following problems where it appears the documentation (prte-map.1.md) should be updated. The cluster I tested this with had 4 Power 8 nodes with 2 packages (sockets) per node, each with 10 cores and each core with 8 hwthreads (160 hwthreads per node). The launch/local node was a Power 9 node. The naming convention for hostfiles is hostfile where is the number of slots specified with the slots= keyword, and where the hostfile lists the 4 nodes in the cluster.
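The topology figures above can be checked with a few lines of arithmetic. This is a small sketch that assumes the 160 hwthreads figure is per node (2 packages x 10 cores x 8 hwthreads); the variable names are illustrative only.

```python
# Topology figures taken from the test cluster description.
nodes = 4
packages_per_node = 2
cores_per_package = 10
hwthreads_per_core = 8

# Per-node counts.
cores_per_node = packages_per_node * cores_per_package
hwthreads_per_node = cores_per_node * hwthreads_per_core

# Cluster-wide counts across the 4 compute nodes.
total_cores = nodes * cores_per_node
total_hwthreads = nodes * hwthreads_per_node

print(f"cores per node:     {cores_per_node}")
print(f"hwthreads per node: {hwthreads_per_node}")
print(f"cores in cluster:     {total_cores}")
print(f"hwthreads in cluster: {total_hwthreads}")
```

These counts are the upper bounds on how many processes can be bound one-per-core or one-per-hwthread before oversubscription becomes necessary.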
In all the above cases, if you want PRTE to default to the number I added the --use-hwthread-cpus option and got an error message stating that was an unknown option.
The help text told me that numa was one of the allowed choices. When I replaced slot with numa I got a different error message telling me that numa was invalid.
The same error occurs for **prun --host c712f6n01,c712f6n02 --np 8 ./a.out**, where the text says the additional 6 tasks should be allocated to these two nodes as well. As expected, this does not work with prterun unless an oversubscribe option is specified.
|
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
* Remove
  - `pmixam` since it wasn't being processed
* Add
  - `gmca`
  - `gprtemca`

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
* Still lots to cleanup and verify here. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
I'm not sure if this is a documentation problem or a code problem, but if I run the command prterun -n 24 --hostfile8 --bind-to slot taskinfo I get a message that the binding policy slot is not recognized. |
Correct - "slot" has no physical meaning, so we cannot bind you to it. |
Per #696 clarify that |
I don't think that sentence makes sense, nor do I think that is what is happening. The |
Replaced by PR #773 |