Skip to content

Conversation

@rhc54
Copy link
Contributor

@rhc54 rhc54 commented Feb 10, 2016

Per the discussion on the telecon, change the -host behavior so we only run one instance if no slots were provided and the user didn't specify #procs to run. However, if no slots are given and the user does specify #procs, then let the number of slots default to the #found processing elements

Ensure the returned exit status is non-zero if we fail to map

If no -np is given, but either -host and/or -hostfile was given, then error out with a message telling the user that this combination is not supported.

If -np is given, and -host is given with only one instance of each host, then default the #slots to the detected #pe's and enforce oversubscription rules.

If -np is given, and -host is given with more than one instance of a given host, then set the #slots for that host to the number of times it was given and enforce oversubscription rules. Alternatively, the #slots can be specified via "-host foo:N". I therefore believe that row #7 on Jeff's spreadsheet is incorrect.

With that one correction, this now passes all the given use-cases on that spreadsheet.

Make things behave under unmanaged allocations more like their managed cousins - if the #slots is given, then no-np shall fill things up.

Fixes #1344

@jsquyres
Copy link
Member

@rhc54 I'm not sure the right things are happening.

I have 2 hosts, mpi009,mpi010 -- with 16 and 24 cores, respectively.

  1. GOOD If I mpirun --host mpi009 hostname, I get a single hostname output.
  2. GOOD If I mpirun --host mpi009 -np 10 hostname, I get 10 hostname outputs, and no complaints about oversubscription.
  3. BAD If I mpirun --host mpi009 -np 50 hostname, I get 50 hostname outputs, and no complaints about oversubscription.
    • Shouldn't this form set the num_slots value to be the num_cores hwloc detects on mpi009?
  4. ??? If I mpirun --host mpi009,mpi010 hostname, I get 40 hostname outputs.
    • I guess we didn't talk about what happens if you specify multiple hosts to --host...?
    • I think I was under the impression that if --host foo,bar,baz was specified (with no -np or --map-by args), we would get one process each on foo, bar, and baz...?
    • FWIW, I think I see other users making the same assumption about 1ppn (Cisco QA just turned in a bug to me for this very case, actually). But I don't know if it's a universal assumption, nor do I remember offhand what we did earlier in v1.10.x.
    • FWIW 2: This would be different behavior than a hostfile, though (e.g., with the same mpi009, mpi010 hosts in it). I just confirmed with a hostfile.txtx containing mpi009 and mpi010 (no slots specifications):
      • mpirun --hostfile hostfile.txt hostname runs 40 copies of hostname.
      • mpirun --hostfile hostfile.txt -np 2 hostname runs 2 copies of hostname on the first host (i.e., the default mapping is by-core)

@jsquyres
Copy link
Member

The more I think about the ??? case from my prior comment, the more I think it should be GOOD. Specifically: it makes the --host behavior be like --hostfile behavior, and I think that's a Good Thing(tm).

That being said, it does make it weird that --host foo and --host foo,bar behavior is different (the former will launch 1 process, the latter will launch 2*num_cores processes). Is this what we intend?

@rhc54
Copy link
Contributor Author

rhc54 commented Feb 11, 2016

@jsquyres and I discussed this on the phone and agreed that fixing the oversubscribed issue is the only change that is required.

@rhc54 rhc54 added bug and removed pushed-back labels Feb 12, 2016
@rhc54
Copy link
Contributor Author

rhc54 commented Feb 12, 2016

@jsquyres Okay, I believe this now fits the agreed-upon behavior. Please verify and commit

@jsquyres
Copy link
Member

Hmm. It's mostly what I expected:

$ mpirun -np --host mpi009 hostname
mpi009

# This is a 16 core machine
$ mpirun -np 16 --host mpi009 hostname
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009

# There are 16 cores, so this should error
$ mpirun -np 17 --host mpi009 hostname
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 17 slots
that were requested by the application:
  hostname

Either request fewer slots for your application, or make more slots available
for use.
$ echo $status
1

However, I thought we agreed that mpirun --host a,b a.out would run 2*num_cores procs, not 2 procs:

$ mpirun --host mpi009,mpi010 hostname
mpi009
mpi010
$ mpirun --host mpi009,mpi010,mpi011 hostname
mpi009
mpi010
mpi011

That's what it did in my test above, at least (run N*num_cores copies, not N copies). It's subjective, I suppose -- either way could be "correct". What did we do in previous versions?

@rhc54
Copy link
Contributor Author

rhc54 commented Feb 12, 2016

I honestly don't remember - how about we declare it "good enough"? All these corner case conditions are already making the code pretty complex and I'd hate to keep adding to it.

@rhc54
Copy link
Contributor Author

rhc54 commented Feb 13, 2016

@jsquyres okay, I surrender - updated to address your last comment.

@jsquyres
Copy link
Member

Be aware that I made that last comment based on our last phone conversation where we discussed what OMPI did in prior versions. ...but I think both our memories of prior OMPI versions were faulty (which should probably be no surprise!).

I ran tests this morning going back to v1.6.0 to see what we've been doing with --host and --hostfile over the years. Here's a spreadsheet showing the behaviors.

It turns out that this PR is currently changing a lot of existing behavior -- probably much more than we want it to.

@rhc54 I know you're totally sick of this 😦 but would it be ok if we talk through the spreadsheet and come up with some minimal change compared to v1.10.2? Sorry!!

@jsquyres
Copy link
Member

Oops -- the google spreadsheet is now publicly accessible.

@rhc54
Copy link
Contributor Author

rhc54 commented Feb 13, 2016

thx - just to be clear: I am in no way proposing that this come over to the 1.10 series. This is a PR for master only. For 1.10, I'll just backport the change that set the exit status. I see no reason to change the behavior of -host in that series.

@rhc54
Copy link
Contributor Author

rhc54 commented Feb 13, 2016

@jsquyres okay, i looked at the spreadsheet. I honestly don't have time to address all those proposed changes yet again. The logic is getting impenetrable to keep handling all the edge cases, and I'm out of time. So this isn't happening in the near future.

@jsquyres
Copy link
Member

@rhc54 Ok, fair enough. Just to be clear, though, I was trying to say that it might be easier to ditch the PR, because it seems to have gone too far. The actual desired changes (compared to master and v2.x) are actually much smaller.

I added 2 columns at the end to show current git master and current v2.x behavior. These hopefully make it clearer that we're actually quite close to the desired behavior.

That being said:

  • Current master behaves like v1.10.2
  • Current v2.x behaves like v1.10.1

I'm mostly concerned about the fact that we're basically introducing new flip-flopping --host behavior from the user's perspective:

  1. v1.10.1 behaves in manner A
  2. v1.10.2 behaves in manner B
  3. v2.0.0 behaves in manner A (like 1.10.1)
  4. v3.0.0 behaves in manner B (like v1.10.2)

@rhc54
Copy link
Contributor Author

rhc54 commented Feb 13, 2016

which is why I was sighing at the very beginning of this mess...frankly, this entire -host/-hostfile behavior thing has been a nightmare from the beginning as the community cannot seem to consistently agree on what it should do

@jsquyres
Copy link
Member

We're going to discuss this at the Dallas developer meeting next week.

@jsquyres
Copy link
Member

Per discussion in Dallas, for 2.0.0:

  • If mpirun CLI options require deeper node topology info (e.g., --npersocket), then ask PMIX for the node topology info. If PMIX cannot provide that info, launch the DVM to go fetch that info manually. Otherwise, no need to launch the DVM (i.e., no need to launch orteds on nodes where we will not be launching).
  • If user does not provide the number of processes (e.g., via -np, -npernode, ...etc.), mpirun errors. Exception: if sequential mapper is being used
  • New mpirun option (specific name TBD): --fillmeup that will automatically launch one process per slot.

@rhc54 rhc54 added this to the v2.0.0 milestone Feb 25, 2016
@rhc54
Copy link
Contributor Author

rhc54 commented Feb 25, 2016

@jsquyres I think this follows what we decided - please check.

@rhc54
Copy link
Contributor Author

rhc54 commented Feb 25, 2016

@jsquyres NOTE: a custom patch will be required for v2.x - this change will not cleanly apply

@jsquyres
Copy link
Member

jsquyres commented Mar 7, 2016

It's very, very close. I only found one discrepancy:

  • mpirun --hostfile myhostfile -np 1000 hostname
  • The myhostfile lists N servers, without a slots clause on each.
  • In my test, -np 1000 is definitely an oversubscription. However, mpirun invokes 1000 copies of hostname -- it doesn't complain about oversubscription.

@rhc54
Copy link
Contributor Author

rhc54 commented Mar 7, 2016

So I assume you have less than 1000 entries in that hostfile, none of which have specified slots, yes? And the totality of the number of cores across all entries is less than 1000?

@jsquyres
Copy link
Member

jsquyres commented Mar 7, 2016

Correct: I have 2 nodes listed in my hostfile, and slots is not specified for either of them. I expected that the total number of auto-detected slots (i.e., number of cores) should be 32 in this case. Hence, if I np of anything more than 32, I expected to get an oversubscription.

@jsquyres
Copy link
Member

jsquyres commented Mar 8, 2016

bot:retest

@jsquyres
Copy link
Member

jsquyres commented Mar 8, 2016

@rhc54 I'm sorry, but the behavior with these latest commits changed quite a bit compared to my tests yesterday afternoon. 😦

  1. mpirun --host foo hostname: does not complain if you don't specify -np. It launches a single process.
  2. mpirun --host foo,bar hostname: also does not complain if you don't specify -np. It launches 2 copies of hostname.

...I didn't test other combinations, because I think these changes mean that some unintended changes crept in...?

FYI: I'm going off column J in the google spreadsheet as what we decided in Dallas. Please correct me if that's wrong...

@rhc54
Copy link
Contributor Author

rhc54 commented Mar 8, 2016

I quit - this will have to wait. I would suggest unblocking 2.0 for it.

@hppritcha
Copy link
Member

Okay. We should document differences from 1.10. I'll open an issue to track.

@rhc54
Copy link
Contributor Author

rhc54 commented Mar 26, 2016

I revised the PR description to more accurately reflect where we wound up:

Per the discussion on the telecon, change the -host behavior so we only run one instance if no slots were provided and the user didn't specify #procs to run. However, if no slots are given and the user does specify #procs, then let the number of slots default to the #found processing elements

Ensure the returned exit status is non-zero if we fail to map

If no -np is given, but either -host and/or -hostfile was given, then error out with a message telling the user that this combination is not supported.

If -np is given, and -host is given with only one instance of each host, then default the #slots to the detected #pe's and enforce oversubscription rules.

If -np is given, and -host is given with more than one instance of a given host, then set the #slots for that host to the number of times it was given and enforce oversubscription rules. Alternatively, the #slots can be specified via "-host foo:N". I therefore believe that row 7 on Jeff's spreadsheet is incorrect.

With that one correction, this now passes all the given use-cases on that spreadsheet.

@jsquyres
Copy link
Member

Per conversation on the call today (29 Mar 2016), we updated column J a bit in the spreadsheet. The goal is to make --host and --hostfile behave exactly the same way as you would under a resource manager. Meaning:

  • If the number of slots are specified (via --host foo:N or --host foo,foo[,foo...] or in a hostfile via slots=N), then if -np is not specified, run exactly num_slots processes.
  • If -host foo is specified, or host is specified in a hostfile (with no slots=N qualifier), then we assume we don't know how many slots are intended, and -np is required

…ly run one instance if no slots were provided and the user didn't specify #procs to run. However, if no slots are given and the user does specify #procs, then let the number of slots default to the #found processing elements

Ensure the returned exit status is non-zero if we fail to map

If no -np is given, but either -host and/or -hostfile was given, then error out with a message telling the user that this combination is not supported.

If -np is given, and -host is given with only one instance of each host, then default the #slots to the detected #pe's and enforce oversubscription rules.

If -np is given, and -host is given with more than one instance of a given host, then set the #slots for that host to the number of times it was given and enforce oversubscription rules. Alternatively, the #slots can be specified via "-host foo:N". I therefore believe that row #7 on Jeff's spreadsheet is incorrect.

With that one correction, this now passes all the given use-cases on that spreadsheet.

Make things behave under unmanaged allocations more like their managed cousins - if the #slots is given, then no-np shall fill things up.

Fixes #1344
@rhc54
Copy link
Contributor Author

rhc54 commented Apr 4, 2016

Let's bring this in so folks can decide if they like it or not.

@rhc54 rhc54 merged commit a95de6e into open-mpi:master Apr 4, 2016
@rhc54 rhc54 deleted the topic/host branch April 4, 2016 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants