MPICH support enhancements #398

Closed
trws opened this issue Sep 10, 2015 · 35 comments

@trws
Member

trws commented Sep 10, 2015

There have been a number of tickets on this topic, including #221 and others on the actual PMI API support. To try to get an MPI perspective, I snagged Pavan today for a rundown of how MPICH expects to interact with PMIs and resource managers, so hopefully we can make this as painless as possible for users (and hopefully for us too).

Our current solution of providing a PMI library at the API level certainly works, but is not and will not be binary compatible with MPICH/MVAPICH/Intel MPI/etc. because they build with "simple" PMI by default (see source here). Using LD_PRELOAD or the like and building MPICH to suit are fine for testing, but being able to support the wire protocol would be a nice addition, and would make it completely native to just use flux run mpich-based-thing without any extra user work.
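
For testing along those lines, something like this minimal sketch works; the library path and program name below are placeholders, not actual Flux install locations:

# Run an MPICH-linked binary against an alternate PMI library for testing.
# LD_PRELOAD forces the dynamic linker to use the preloaded PMI implementation.
import os
import subprocess

env = dict(os.environ)
env["LD_PRELOAD"] = "/path/to/alternate/libpmi.so"            # placeholder path
subprocess.run(["./mpich-based-thing"], env=env, check=True)  # placeholder program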

This would mean providing a socket that offers that wire protocol, probably in a module or launcher plugin, which maps pretty much one-to-one onto the PMI API.

The other thing that came up was the option of supporting flux as a hydra target, which they appear to be interested in. Requests for this:

  • A command to run jobs on arbitrary nodes/cores in the allocation or to request a set of resources
  • A nodelist, preferably as a file rather than a condensed node range-list in the environment
@garlick
Member

garlick commented Sep 10, 2015

I had run across this design document on the wire protocol. If there are other specific code/document references it would be good to have them here.

I wonder whether any of the MPI versions that we support in production have this capability, or whether it is possible to graft it onto the older versions.

@trws
Member Author

trws commented Sep 11, 2015

I had seen that, but didn't link it because I'm not sure it's entirely in line with the actual implementation in use. The 2.0 interface wasn't stabilized when that went up; I'll ask Pavan about docs for it.

As to older versions, I'm not sure how far back it goes precisely, but this was the default mechanism for all mpich derivatives as of at least 7 years ago, and it should be available in all production builds unless explicitly removed at configuration time. It's the mechanism that Hydra, and MPD before it, use to bootstrap a job, so it's carefully supported. MPICH1 probably won't take it though, unfortunately; I think PMI came in after development stopped on that branch.

@trws
Member Author

trws commented Sep 11, 2015

Hmm... it looks like we build mvapich with pmgr and no pm option for some reason.

$ mpich2version
MPICH2 Version:     1.7
MPICH2 Release date:    Thu Oct 13 17:31:44 EDT 2011
MPICH2 Device:      ch3:psm
MPICH2 configure:   --prefix=/usr/local/tools/mvapich2-gnu-1.7 --enable-f77 --enable-fc --enable-cxx --enable-fast=O2 --enable-g=dbg --enable-error-checking=runtime --enable-error-messages=all --enable-nmpi-as-mpi --enable-shared --enable-sharedlibs=gcc --enable-debuginfo --with-pm=no --with-pmi=pmgr --with-pmgr=/usr/local/tools/pmgr --with-device=ch3:psm --disable-registration-cache --enable-romio --with-file-system=lustre+nfs+ufs --disable-mpe --without-mpe
MPICH2 CC:  gcc   -g -O2  -g -O2
MPICH2 CXX:     g++   -g -O2 -g -O2
MPICH2 F77:     gfortran  -g -O2 -fno-second-underscore -g -O2
MPICH2 FC:  gfortran  -g -O2 -fno-second-underscore -g -O2

In this configuration it assumes pmgr bootstrap rather than simple PMI, which I suppose ties back to #221. In principle, we could actually support all of these off of the same backend. Normally that's not possible, because there's no way to negotiate, but since we control the daemon and they all take a socket or specified connector, we can disambiguate based on which protocol the job decides to use. The one mpich2 we have in dotkit is configured to use simple PMI; I didn't check the others, but Intel MPI should also be using one of the two.
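
To illustrate the disambiguation idea, a rough sketch; the handlers are stubs, only the "cmd=..." greeting of the simple PMI wire protocol is assumed here, and the pmgr wire format is not covered in this issue:

import socket

def handle_simple_pmi_v1(conn):
    ...  # speak the PMI "simple" v1 wire protocol (see the listing later in this issue)

def handle_other(conn):
    ...  # e.g. pmgr bootstrap; its wire format is not covered here

def route_client(conn: socket.socket):
    # Peek at the first bytes without consuming them, then dispatch by protocol.
    first = conn.recv(4096, socket.MSG_PEEK).decode("ascii", errors="replace")
    if first.startswith("cmd="):
        handle_simple_pmi_v1(conn)
    else:
        handle_other(conn)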

@trws
Member Author

trws commented Sep 16, 2015

Funny meta-thought on this. At some point we'll need a bootstrap system for running an initial flux instance across systems that don't have a resource manager set up, and it occurs to me that the rsh-tree hydra backend may be a convenient way to do that until we have something of our own.

@garlick
Member

garlick commented Sep 18, 2015

@trws I could not decipher the Argonne end of the call this morning due to the bad audio. Would you mind summarizing the next steps here? (Thanks)

@trws
Member Author

trws commented Sep 18, 2015

Thanks for the reminder @garlick. I meant to do this earlier, but I had a meeting with Robin about using flux/capacitor for some of the high-throughput users they've been seeing on Catalyst, and it slipped out of cache. One or two feature issues will be popping up shortly from that.

Next steps:

  • Argonne:
    • will produce some documentation for the PMI "simple" v1 wire protocol
    • may keep us in the loop on the development of the PMI v3 protocol (pre-production)
  • LLNL:
    • will produce either:
      • a file per job with PBS nodefile semantics OR
      • a command hydra can run to produce said file from the data in the KVS (this is my preference, flag on topo or lstopo perhaps?)

Further, as discussed earlier in this issue, the official recommendation from Pavan et al. on supporting PMI in mpich derivatives is to support the PMI "simple" v1 wire protocol. Secondarily, the PMI API approach we're using now also works, but it depends heavily on library path and configuration control.

@grondo
Contributor

grondo commented Sep 18, 2015

a command hydra can run to produce said file

@trws, For now flux exec hostname > file might work.
Is there documentation on "PBS nodefile semantics"?

I was thinking on the call that if we want to optionally produce a nodefile, this is something that could be done in an initial program...

@trws
Member Author

trws commented Sep 18, 2015

It's a bit more than just a list of hostnames. The general format is either the hostname replicated once per core to be used on that host, or hostname:num-cores iirc. SchedMD apparently offers a generate_pbs_nodefile script for slurm now, which you can see here, that does it the first way. It can also carry some extra information to hydra, such as which precise cores to use for binding purposes, but that's the gist of it.

Anyway, I'm not quite sure I understand what you mean by using an initial program in place of a command. It is something that could be done entirely in post-processing (a basic version could just use your lua-hostlist to do an expand and replace spaces with newlines, and it would be almost there), but since we may want to provide specific information based on the way the job is allocated, it seemed like it might be useful to have it in our control.
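
As a rough sketch of the post-processing idea, assuming the (hostname, cores-per-host) data is already in hand from the KVS or elsewhere (the values below are placeholders):

# Write a PBS-style nodefile: each hostname repeated once per core to be used there.
def write_pbs_nodefile(path, hosts):
    with open(path, "w") as f:
        for hostname, ncores in hosts:
            f.write((hostname + "\n") * ncores)

write_pbs_nodefile("nodefile", [("node1", 16), ("node2", 16)])  # placeholder data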

@grondo
Contributor

grondo commented Sep 18, 2015

Ok, I was thinking it was just a list of hostnames (e.g. SLURM_NODELIST).

I was thinking a per-instance hostlist could be generated optionally via an rc-like scriptlet once we had an initial program structure that could accomplish that. Sounds like you'd possibly need to generate a different one per mpiexec invocation, so that won't really work here.

@garlick
Member

garlick commented Sep 18, 2015

Are we expecting Flux to launch a hydra-bootstrapped MPI program directly, or through a sub instance?

@trws
Member Author

trws commented Sep 18, 2015

Sub instance. It's just so someone could run mpiexec inside a flux instance and have it work as expected. The user would do something like:

flux run -N 4
#inside flux now
mpiexec --hydra-binding-option-or-something mpi_job

mpiexec would then use flux under the covers to bootstrap itself, then bootstrap the job. It's just an extra level of portability: a lot of people have batch scripts that use hydra options for rank distribution and binding, which we would then suddenly support without much extra effort.

The nodelist would be the same for the lifetime of an instance, so it could be done with the initial program infrastructure as @grondo mentioned. I had just been thinking that we might not want to have to do that for every job, and having hydra request it from us seemed like a reasonable way to only pay for it when they actually need it without forcing the user to request it.

@grondo
Contributor

grondo commented Sep 18, 2015

@trws, I'd definitely prefer it if hydra requests the nodelist only when it needs it. I wonder if they've thought of making the code for getting the nodelist and launching their daemons dynamically loadable? Then RMs could provide their own hooks for this stuff instead of adding it directly to the hydra codebase (maybe they already do this, I don't know).

On the initial program thing I was getting ahead of myself there. I was thinking eventually we'd have an initial program framework with unit or rc files that were activated by options somehow passed to the script itself, and to get hydra working you'd have to enable the option. Obviously hydra directly requesting the file is much better since it will work without an extra option.

@trws
Member Author

trws commented Sep 18, 2015

@grondo that did actually come up in the conversation at Cluster, though I had forgotten it until you mentioned it. Hydra uses dynamic loading to pick up libraries for this kind of thing from some versions of PBS descendants, but frequently runs into problems with installations using the wrong system because it doesn't find the library the user expected it to find. It's the same issue we're having with getting PMI to behave sanely. Having a way to generically hook in would certainly be good, but as it is, hydra is set up to work by specializing itself based on what it finds on the target.

In our case it would probably look for FLUX_* environment variables and/or an executable flux command in $PATH. That's what made me think of using a command: if it can see flux to pick it up, then we know for sure it can run a command, whereas it may not find a library in $(dirname $(which flux))/lib, as that may not be in LD_LIBRARY_PATH, may be too late in the path, -rpath may be set wrong, the .so might have a different version number, etc.
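
A minimal sketch of that detection logic, assuming nothing beyond the environment and $PATH:

import os
import shutil

def flux_detected():
    # True if any FLUX_* environment variable is set or a flux executable is on PATH.
    has_env = any(name.startswith("FLUX_") for name in os.environ)
    has_cmd = shutil.which("flux") is not None
    return has_env or has_cmd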

@trws
Member Author

trws commented Sep 18, 2015

I should say, I suggested offering a library call, and was explicitly asked for something else in that conversation. Badly configured library setups apparently account for a non-trivial percentage of their support volume.

@garlick
Member

garlick commented Sep 18, 2015

So is the idea that mpiexec will use our scalable program launch to start the hydra daemons? (sorry, I know you asked that this morning but I either forgot or didn't hear the answer)

@trws
Member Author

trws commented Sep 18, 2015

Only for the case where hydra is being used as the bootstrapper, which should only be for users who have pre-existing hydra/mpiexec-based batch scripts they want to keep using. It's just the easier way for them to set up hydra support for flux in the short term, since it already has its own overlay etc. As I think I mentioned earlier though, the idea is really to support two models so that a user can do either of these and expect them to work:

#in a context with a flux instance serving as resource manager
mpiexec-hydra mpich_based_program
flux run mpich_based_program

The hydra daemons would get launched in the first case, so they can provide support for any and all options that version of mpiexec accepts, making porting a code from, say, Argonne to Livermore a bit easier. The second would use our launch to run the job directly and provide flux-based PMI support (or other) for bootstrapping, with no hydra involvement of any kind.

@garlick
Member

garlick commented Sep 18, 2015

Understood - I meant to ask if in the first case, mpiexec is using rsh or equivalent to start its daemons on the hosts in the nodelist, or if flux would be launching them as a parallel program.

@trws
Member Author

trws commented Sep 18, 2015

Sorry, I wasn't sure.

It can technically do either, but normally it uses whatever the native mechanism is, since that tends to be rather faster than doing an rsh-tree on the raw nodes. In our case, the biggest part of supporting flux in hydra is having it use our job-launch facilities, like it uses srun in slurm currently. That's just step two, and something we didn't talk about on the call today, but we should decide on what interface we want to give them for that. Since hydra expects raw execution, flux exec might actually be the way to go; I don't really think it needs any more than that, as long as exec can run on a specific target node.

@grondo
Contributor

grondo commented Sep 24, 2015

Just a follow-up note here on Intel MPI. A discussion on the slurm-dev mailing list revealed this reference, which implies that you can switch the Intel MPI PMI library at runtime by setting I_MPI_PMI_LIBRARY=/path/to/libpmi.so in your environment.
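
A minimal usage sketch; only the I_MPI_PMI_LIBRARY variable name comes from that reference, and the library path and program name are placeholders:

import os
import subprocess

env = dict(os.environ)
env["I_MPI_PMI_LIBRARY"] = "/path/to/libpmi.so"               # placeholder path
subprocess.run(["./intel-mpi-program"], env=env, check=True)  # placeholder program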

@grondo
Contributor

grondo commented Sep 25, 2015

FYI -- more info about OpenMPI PMIx and mvapich PMIX from a recent discussion on the slurm mailing list:

https://groups.google.com/forum/#!topic/slurm-devel/eCO9gBmTsTg

@trws
Member Author

trws commented Sep 25, 2015

Ugh... mvapich pmix... why...

Well, the list to support everyone "natively" now looks like:

  • mvapich1: pmgr
  • mpich2 derivatives: pmi simple wire protocol, or api override at build time, eventually pmi3 probably
  • openmpi: PMIx; their PMI1/2 interface actually builds on top of this, so... yeah. Re-compile with non-PMIx PMI to support the PMI1/2 API library

fun...

@grondo
Contributor

grondo commented Sep 25, 2015

This might be useful later, so I'll paste it here: the beta SLURM PMIx module:

https://github.com/artpol84/slurm/tree/pmix-step2/src/plugins/mpi/pmix

@grondo
Contributor

grondo commented Sep 25, 2015

Oops, should my last two comments go in #365?

@garlick
Member

garlick commented Sep 26, 2015

It looks like there are Ubuntu packages for mpich. On my 14.04 LTS system I was able to install

mpich-3.0.4-6ubuntu1
libmpich-dev-3.0.4-6ubuntu1

it appears to have been compiled without pmi options:

MPICH Version:      3.0.4
MPICH Release date: Wed Apr 24 10:08:10 CDT 2013
MPICH Device:       ch3:nemesis
MPICH configure:    --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --libdir=${prefix}/lib/x86_64-linux-gnu --libexecdir=${prefix}/lib/x86_64-linux-gnu --disable-maintainer-mode --disable-dependency-tracking --enable-shared --prefix=/usr --enable-fc --disable-rpath --disable-wrapper-rpath --sysconfdir=/etc/mpich --libdir=/usr/lib/x86_64-linux-gnu --includedir=/usr/include/mpich --docdir=/usr/share/doc/mpich --with-hwloc-prefix=system --enable-checkpointing --with-hydra-ckpointlib=blcr
MPICH CC:   cc -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security  -O2
MPICH CXX:  c++ -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -O2
MPICH F77:  gfortran -g -O2 -g -O2 -O2
MPICH FC:   gfortran  -g -O2 -O2

I confirmed that this version does something when I set PMI_FD in the environment of mpi_hello compiled with it, so based on simple_pmi.c I conclude it must be capable of using the simple v1 wire protocol?

$ PMI_FD=42 ./mpi_hello
[unset]: write_line error; fd=42 buf=:cmd=init pmi_version=1 pmi_subversion=1
:
system msg for write_line failure : Bad file descriptor
[unset]: Unable to write to PMI_fd
[unset]: write_line error; fd=42 buf=:cmd=get_appnum
:
system msg for write_line failure : Bad file descriptor
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(433): 
MPID_Init(139).......: channel initialization failed
MPID_Init(421).......: PMI_Get_appnum returned -1

If we're going to implement the simple wire protocol, at least there is a straightforward way to build MPI programs that use it for test.
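
For example, a minimal test-harness sketch along those lines: mpi_hello is the locally built test program from above, and the socketpair stands in for whatever the process manager would really provide.

import os
import socket
import subprocess

parent, child = socket.socketpair()
env = dict(os.environ)
env["PMI_FD"] = str(child.fileno())
env["PMI_RANK"] = "0"
env["PMI_SIZE"] = "1"

# Hand the child end of the socketpair to the MPI program and read its first request.
proc = subprocess.Popen(["./mpi_hello"], env=env, pass_fds=(child.fileno(),))
print(parent.recv(1024).decode())  # expect: cmd=init pmi_version=1 pmi_subversion=1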

A side note for travis-ci: on Ubuntu 12.04, the packages are mpich2, libmpich2-dev

@trws
Member Author

trws commented Sep 26, 2015

Yeah. If you give no PMI option that's what you get. You can also give mpich derivatives multiple PMIs then select the one you want at runtime by name with another environment variable. The default one will implement both the v1 and v2 wire protocols iirc, but apparently the v2 protocol is out of favor at the moment.

OpenMPI used to be able to do something similar, but I haven't tried since this whole PMIx thing happened.


garlick added a commit to garlick/flux-core that referenced this issue Sep 26, 2015
As discussed in flux-framework#398, as a precursor to implementing the PMI
simple v1 wire protocol, pull the mpich2 package into the travis-ci
environment in place of OpenMPI.
@garlick
Member

garlick commented Sep 26, 2015

Spelunking src/pmi/simple, the wire protocol appears to consist of the following:

The server sets the PMI_FD, PMI_SIZE, PMI_RANK, and optionally PMI_DEBUG and PMI_SPAWNED env vars to integer values for the client. The client initiates the protocol on the PMI_FD file descriptor when PMI_Init() is called:

PMI_Init

C: cmd=init pmi_version=1 pmi_subversion=1\n
S: cmd=response_to_init rc=0 pmi_version=1 pmi_subversion=1\n
C: cmd=get_maxes\n
S: cmd=maxes rc=0 kvsname_max=256 keylen_max=256 vallen_max=256\n

PMI_Get_universe_size

C: cmd=get_universe_size\n
S: cmd=universe_size rc=0 size=<integer>\n

PMI_Get_appnum

C: cmd=get_appnum\n
S: cmd=appnum rc=0 appnum=<integer>\n

PMI_Barrier

C: cmd=barrier_in\n
S: cmd=barrier_out rc=0\n

PMI_Finalize

C: cmd=finalize\n
S: cmd=finalize_ack rc=0\n

PMI_KVS_Get_my_name

C: cmd=get_my_kvsname\n
S: cmd=my_kvsname rc=0 kvsname=<string>\n

PMI_KVS_Put

C: cmd=put kvsname=<string> key=<string> value=<string>\n
S: cmd=put_result rc=1\n

PMI_KVS_Get

C: cmd=get kvsname=<string> key=<string>\n
S: cmd=get_result rc=0 value=<string>\n

PMI_Publish_name

C: cmd=publish_name service=<string> port=<string>\n
S: cmd=publish_result rc=0 info=ok\n

PMI_Unpublish_name

C: cmd=unpublish_name service=<string>\n
S: cmd=unpublish_result rc=0 info=ok\n

PMI_Lookup_name

C: cmd=lookup_name service=<string>\n
S: cmd=lookup_result rc=0 info=ok port=<string>\n

PMI_Spawn_multiple
This is massively more complicated than the rest. Save for another day.

That's all there is. PMI functions not listed here mostly return data from the environment variables or data obtained during PMI_Init().
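
For concreteness, here is a minimal sketch of the server side of that exchange, handling one client on a line-delimited socket. The replies are copied from the listing above, the hard-coded values are placeholders, and the stateful commands (barrier, put/get, publish, spawn) are left out since they need real job and KVS state behind them.

REPLIES = {
    "init": "cmd=response_to_init rc=0 pmi_version=1 pmi_subversion=1",
    "get_maxes": "cmd=maxes rc=0 kvsname_max=256 keylen_max=256 vallen_max=256",
    "get_universe_size": "cmd=universe_size rc=0 size=4",      # size hard-coded for the sketch
    "get_appnum": "cmd=appnum rc=0 appnum=0",
    "get_my_kvsname": "cmd=my_kvsname rc=0 kvsname=kvs_0",     # name hard-coded for the sketch
    "finalize": "cmd=finalize_ack rc=0",
}

def serve_one(conn):
    buf = b""
    while True:
        data = conn.recv(4096)
        if not data:
            return
        buf += data
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            # Each request looks like "cmd=<name> key=value ..."; pick out the cmd field.
            fields = dict(f.split("=", 1) for f in line.decode().split())
            cmd = fields.get("cmd", "")
            if cmd in REPLIES:
                conn.sendall((REPLIES[cmd] + "\n").encode())
            if cmd == "finalize":
                return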

@garlick
Member

garlick commented Sep 26, 2015

I also verified that the ubuntu 14.04 mpich supports an alternative to file descriptor passing. Set PMI_ID and PMI_PORT=<hostname>:<port>. The client then connects to the server on PMI_PORT and runs through the following handshake:

C: cmd=initack pmiid=<integer>\n
S: cmd=intack\n
S: cmd=set size=<integer>\n
S: cmd=set rank=<integer>\n
S: cmd=set debug=<integer>\n

After which the cmd=init handshake begins as above.
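
A sketch of the server side of that variant; the ack and set lines are copied verbatim from the handshake above, and the size/rank/debug values are placeholders:

import socket

srv = socket.socket()
srv.bind(("", 0))   # any free port
srv.listen(1)
host, port = socket.gethostname(), srv.getsockname()[1]
# The launcher would set PMI_PORT=<host>:<port> and PMI_ID=<rank> for the client.

conn, _ = srv.accept()
greeting = conn.recv(1024)  # expect: cmd=initack pmiid=<integer>
for line in ("cmd=intack", "cmd=set size=1", "cmd=set rank=0", "cmd=set debug=0"):
    conn.sendall((line + "\n").encode())
# ...then continue with the cmd=init exchange as in the fd-based protocol.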

@trws
Member Author

trws commented Sep 26, 2015

I think the host:port option was the original, but it is no longer used by the standard launchers. It's a relic from before MPD, if I recall correctly. Not to say it shouldn't be considered if it's more convenient for some reason, but that code path may be less frequently tested these days.


@garlick
Member

garlick commented Sep 27, 2015

travis hasn't whitelisted mpich yet - see travis-ci/apt-package-safelist#406

@garlick
Member

garlick commented Sep 27, 2015

I thought it might be good anyway to have mpich built to the side in travis, for more control over the pm options we want to support and therefore should test against. Got this working for gcc, but there is a known bug that causes clang to segfault while building mpich.

I can't reproduce this on my Ubuntu 14.04 system with clang-3.4-1ubuntu3.

@grondo
Contributor

grondo commented Sep 27, 2015

GCC is installed even in the clang builder. You can force mpich to build with GCC by exporting CC=gcc before the build. (If you added mpich to travis-dep-builder.sh maybe try CC=gcc src/test/travis-dep-builder.sh ... or export directly in the script. We don't care what compiler the dependencies are built with anyway...)


@garlick
Member

garlick commented Sep 27, 2015

Thanks @grondo, that did the trick!

@garlick changed the title from MPI support enhancements to MPICH support enhancements Sep 29, 2015
garlick added a commit to garlick/flux-core that referenced this issue Oct 5, 2015
Drop the somewhat contrived boot_pmi.c class from the broker,
and rewrite the PMI bootstrap code using pmi-client.h interfaces
directly.  I think this clarifies the code even though it is quite
verbose.

If PMI doesn't implement pmi_get_id(), derive the session-id from
the "appnum" (numerical jobid).

Don't attempt to call pmi_get_clique_ranks() unless epgm is enabled.

Neither pmi_get_id() nor pmi_get_clique_ranks() are implemented
in the "simple v1" PMI wire protocol, so allowing these functions
to be unimplemented enables Flux to be launched by mpiexec.hydra,
which addresses one goal of flux-framework#398.
@garlick
Member

garlick commented Dec 30, 2016

We now have both process and process manager support for the PMI-1 simple wire protocol and can launch MPICH programs directly. We have a test case of hydra launching Flux. I think all that's left is to support mpirun-hydra under Flux, and I'm not sure that we really need that. Closing this issue. Let's open a new one focused on mpirun-hydra if we need it.

@garlick closed this as completed Dec 30, 2016
@trws
Member Author

trws commented Jan 1, 2017 via email

@garlick
Member

garlick commented Jan 1, 2017

Thanks :-)
