Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to specify order of assigment with prterun --map-by :pe-list option #696

Closed
drwootton opened this issue Dec 2, 2020 · 7 comments
Closed
Milestone

Comments

@drwootton
Copy link
Contributor

Thank you for taking the time to submit an issue!

Background information

What version of the PMIx Reference Server are you using? (e.g., v1.0, v2.1, git master @ hash, etc.)

Master branch

What version of PMIx are you using? (e.g., v1.2.5, v2.0.3, v2.1.0, git branch name and hash, etc.)

Master branch

Please describe the system on which you are running

  • Operating system/version:
    RHEL 7.7

  • Computer hardware:
    4 Power 8 nodes, 2 sockets each with 10 cores (20 total) and 8 hwthreads per core (160 total)

  • Network type:
    Hostfile specifies 4 nodes, each with slots=8 keyword


Details of the problem

If I run the command prterun -n 4 --hostfile hostfile8 --map-by core:PE-LIST=0,1,2,3 --bind-to core:report echo then I get output I expect

[c712f6n01:18908] MCW rank 0 bound to package[0][core:0]
[c712f6n01:18908] MCW rank 1 bound to package[0][core:1]
[c712f6n01:18908] MCW rank 2 bound to package[0][core:2]
[c712f6n01:18908] MCW rank 3 bound to package[0][core:3]

If I run the command prterun -n 4 --hostfile hostfile8 --map-by core:PE-LIST=3,2,1,0 --bind-to core:report echo I get the same output, where I expect something like

[c712f6n01:18908] MCW rank 0 bound to package[0][core:3]
[c712f6n01:18908] MCW rank 1 bound to package[0][core:2]
[c712f6n01:18908] MCW rank 2 bound to package[0][core:1]
[c712f6n01:18908] MCW rank 3 bound to package[0][core:0]

I think the ability to specify an arbitrary mapping is useful, such as when trying to place tasks for memory or cache affinity.

The problem is that when the set of resources specified in the PE-LIST option is handled as a set instead of a list, which means that it is impossible to retain any concept of ordering in the PE-LIST resources.

@rhc54
Copy link
Contributor

rhc54 commented Dec 2, 2020

it is impossible to retain any concept of ordering in the PE-LIST resources.

Correct - because as the help output indicates, there is no concept of ordering in this option. If someone wants to do a different pattern, such as you suggest, then they have other ways of doing it - e.g., rankfile mapping. Changing this option to support what you "expected" would be disruptive.

I think we need to step back a bit and ask ourselves what we are trying to accomplish here. Are we trying to check that things work as intended? In that case, we shouldn't file an issue for this case - it is performing as intended. Your "expectation" was simply at odds with the documented behavior. It might indicate we need to clarify the docs (meaning we should file a documentation issue), but not merit changing the behavior.

Trying to modify/expand the cmd line options to meet someone's expectations would be a rather intensive task - I would have to question the value and reality of it. We could easily wind up chasing our tails as one person's expectations are unlikely to match those of another.

Just to be clear: I'm not questioning the value of testing all these options. I appreciate someone doing so. I'm only trying to understand the motivation for filing issues on things that are working as documented/intended.

@drwootton
Copy link
Contributor Author

drwootton commented Dec 2, 2020

I started working on PRRTE with @jjhursey last summer and am in the process of learning the code. He asked me to do some testing with various combinations of map, bind, and rank options, which is the origin of the issues I have written. I'm not trying to push PRRTE in any specific direction, just trying to shake out bugs in the code.

I found cases which did not seem to be working right while I was testing. I discussed these with Josh and wrote issues only when he also thought the behavior might be wrong.

If what I think is a bug is something that PRRTE was clearly not intended to do, I'll accept that, or if it's a documentation cleanup problem I'll accept that too.

In this case, the only place I found any mention of the PE-LIST option was in the prterun --help output and in the draft documents in pull request #557. All that text says is that PE-LIST is a comma-delimited list of CPUs, it doesn't say anything about ordered or unordered lists. If there's other documentation that clarifies it, I don't know where that is.

If this is a documentation cleanup, that's fine. I realize changing PRRTE to pass around a list of CPUs with specific ordering is not a simple fix, but also don't know where it lands in terms of what's most important.

@rhc54
Copy link
Contributor

rhc54 commented Dec 2, 2020

I'm aware of the history - like I said, I'm not questioning the work. I'm just still trying to figure out how to respond to this "issue flood". It seems like these fall into a few categories:

  • things are working correctly, but then you gave it some incorrect input and didn't realize it was incorrect. Question here is why you didn't realize it was incorrect - did you read the doc and it either lacked clarity or is incorrect itself? Or did you not read the doc and blindly tried something? If the former, we can fix the doc. If the latter, then perhaps a better "show_help" message in case of the error might steer someone to the right doc.

  • you gave correct input but the system did the wrong thing - clearly a bug that should be addressed.

  • you gave correct input and the system behaved as intended, but not as you thought it should. The question here is: did the doc mislead you? In that case, we clearly should fix the doc. Or is the doc clear, but you thought it should do something different? This gets a little grayer - we've spent years dealing with conflicting ideas on what something should do. We can always review things, but it can become rather time consuming.

  • you gave incorrect input and the system either did something it shouldn't have done or simply barfed. Probably need better input checking in that case - either way, it is a bug.

Probably more categories in there, but hopefully this provides some food for thought. What I'm ultimately looking for is a way to label these issues so we can more easily deal with them. If we can come up with a reasonably simple way of categorizing them, then we can create a label for each category and appropriately mark the issues - and it is okay to put multiple labels on the issue if it somewhat spans categories, though it would be preferred if we can contain that.

@jjhursey
Copy link
Member

jjhursey commented Dec 3, 2020

Dave and I have been testing based on the documentation we have - in the man pages, FAQs, and help output. Where it is ambiguous we have tried to test what we think it should do. Our goal is to test existing functionality, not necessarily invent new functionality or interpretations. We file tickets for any mismatch with the hope that we can dialog with the community on if it is a documentation issue, bug, invalid usage, or possibly a future feature.

I think in the case of this Issue we hit on a documentation clarification issue, and possibly a future feature. I added a note to fixe the documentation issue in this comment. I think we should re-classify this ticket as a future, possible enhancement request for a PR-ORDERED-LIST option that preserves the order specified. I agree that would take a fair amount of work, and the priority of the request would need to be based on demand. Right now we don't have a demand for it from what I can tell.

@abouteiller
Copy link
Contributor

I was about to make my own issue about this, but I have also experienced 'surprising' mapping with combinations of --map-by --rank-by --bind-to options. In particular, I could not find a combination that replicates the good-old --bynode option from prior Open MPI versions.

For example, the following option set yielded a crash, that looks like a bug

salloc -N 4 -Ccauchy bin/mpiexec --rank-by node  --map-by :display --bind-to core     ~/ompi/benchmarks/imb/mpi-benchmarks/IMB-MPI1.u5 -npmin 32 Barrier
salloc: Granted job allocation 391603
salloc: Waiting for resource configuration
salloc: Nodes c[00-03] are ready for job

========================   JOB MAP   ========================
Data for JOB [3650,1] offset 0 Total slots allocated 32
    Mapping policy: BYPACKAGE:NOOVERSUBSCRIBE  Ranking policy: NODE Binding policy: CORE
    Cpu set: N/A  PPR: N/A  Cpus-per-rank: N/A  Cpu Type: CORE


    Data for node: c00  Num slots: 8    Max slots: 0    Num procs: 8
        Process jobid: [3650,1] App: 0 Process rank: 0 Bound: package[0][core:0]
        Process jobid: [3650,1] App: 0 Process rank: 4 Bound: package[1][core:4]
        Process jobid: [3650,1] App: 0 Process rank: 8 Bound: package[0][core:1]
        Process jobid: [3650,1] App: 0 Process rank: 12 Bound: package[1][core:5]
        Process jobid: [3650,1] App: 0 Process rank: 16 Bound: package[0][core:2]
        Process jobid: [3650,1] App: 0 Process rank: 20 Bound: package[1][core:6]
        Process jobid: [3650,1] App: 0 Process rank: 24 Bound: package[0][core:3]
        Process jobid: [3650,1] App: 0 Process rank: 28 Bound: package[1][core:7]

    Data for node: c01  Num slots: 8    Max slots: 0    Num procs: 8
        Process jobid: [3650,1] App: 0 Process rank: 1 Bound: package[0][core:0]
        Process jobid: [3650,1] App: 0 Process rank: 5 Bound: package[1][core:4]
        Process jobid: [3650,1] App: 0 Process rank: 9 Bound: package[0][core:1]
        Process jobid: [3650,1] App: 0 Process rank: 13 Bound: package[1][core:5]
        Process jobid: [3650,1] App: 0 Process rank: 17 Bound: package[0][core:2]
        Process jobid: [3650,1] App: 0 Process rank: 21 Bound: package[1][core:6]
        Process jobid: [3650,1] App: 0 Process rank: 25 Bound: package[0][core:3]
        Process jobid: [3650,1] App: 0 Process rank: 29 Bound: package[1][core:7]

    Data for node: c02  Num slots: 8    Max slots: 0    Num procs: 8
        Process jobid: [3650,1] App: 0 Process rank: 2 Bound: package[0][core:0]
        Process jobid: [3650,1] App: 0 Process rank: 6 Bound: package[1][core:4]
        Process jobid: [3650,1] App: 0 Process rank: 10 Bound: package[0][core:1]
        Process jobid: [3650,1] App: 0 Process rank: 14 Bound: package[1][core:5]
        Process jobid: [3650,1] App: 0 Process rank: 18 Bound: package[0][core:2]
        Process jobid: [3650,1] App: 0 Process rank: 22 Bound: package[1][core:6]
        Process jobid: [3650,1] App: 0 Process rank: 26 Bound: package[0][core:3]
        Process jobid: [3650,1] App: 0 Process rank: 30 Bound: package[1][core:7]

    Data for node: c03  Num slots: 8    Max slots: 0    Num procs: 8
        Process jobid: [3650,1] App: 0 Process rank: 3 Bound: package[0][core:0]
        Process jobid: [3650,1] App: 0 Process rank: 7 Bound: package[1][core:4]
        Process jobid: [3650,1] App: 0 Process rank: 11 Bound: package[0][core:1]
        Process jobid: [3650,1] App: 0 Process rank: 15 Bound: package[1][core:5]
        Process jobid: [3650,1] App: 0 Process rank: 19 Bound: package[0][core:2]
        Process jobid: [3650,1] App: 0 Process rank: 23 Bound: package[1][core:6]
        Process jobid: [3650,1] App: 0 Process rank: 27 Bound: package[0][core:3]
        Process jobid: [3650,1] App: 0 Process rank: 31 Bound: package[1][core:7]

=============================================================

[saturn:32036] PRTE ERROR: Fatal in file ../../../../../../master/3rd-party/prrte/src/mca/rmaps/base/rmaps_base_ranking.c at line 690
[saturn:32036] PRTE ERROR: Fatal in file ../../../../../../master/3rd-party/prrte/src/mca/odls/base/odls_base_default_fns.c at line 276
[saturn:32036] PRTE ERROR: Fatal in file ../../../../../../master/3rd-party/prrte/src/mca/plm/base/plm_base_launch_support.c at line 512
[saturn:32036] FORCE TERMINATE ORDERED AT ../../../../../../master/3rd-party/prrte/src/mca/plm/base/plm_base_launch_support.c:513 - error Unknown error(1)

But other combinations would do unexpected things silently (like ranking by package when requesting rank-by node, and other bizarre things).

@rhc54
Copy link
Contributor

rhc54 commented Feb 19, 2021

I could not find a combination that replicates the good-old --bynode option from prior Open MPI versions.

I'm going to look at the crash, but the equivalent of bynode is simply --map-by node

@jjhursey
Copy link
Member

This issue is an enhancement request to add a PR-ORDERED-LIST option to --map-by similar to the PR-LIST option, but honors the ordering of the options. Note below from earlier comment:

I think we should re-classify this ticket as a future, possible enhancement request for a PR-ORDERED-LIST option that preserves the order specified. I agree that would take a fair amount of work, and the priority of the request would need to be based on demand. Right now we don't have a demand for it from what I can tell.

@jjhursey jjhursey added this to the Future milestone Mar 25, 2021
@rhc54 rhc54 closed this as completed May 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants