Remove ORTE project #7202
@jsquyres @bwbarrett @jjhursey @naughtont3 @ggouaillardet This is the first step in the move to replace ORTE with PRRTE. It removes the ORTE layer completely, ensures all interaction with the RTE is via PMIx, and makes the opal/pmix framework build as a static framework. See what you think. The next step will be to introduce PRRTE as a submodule and reintegrate it into this branch. |
@rhc54 I pushed a new commit that fixes misc stuff. |
it looks like some extra work for |
@ggouaillardet Can you see if this now works when running a simple "hello" program? I believe you need to configure both PRRTE and OMPI as "static" to avoid library confusion, but it should then work. |
@rhc54 I pushed some more misc orte->prrte related fixes. All my jobs failed in |
@rhc54 I started to do a quick test of this PR. It looks like this is using updated version of
|
I believe that change was ported back to the v3.1 branch, but v3.1.5 has not yet been released. It shouldn't be flagged as an error, however, but rather just as a warning - I gather you have "treat warnings as errors" set and that is the issue. |
@jsquyres @ggouaillardet I got autogen.pl to traverse prrte, but I am not sure how to get OMPI to execute configure in that subdirectory. We also need to figure out how to direct PRRTE to use the correct libevent, hwloc, and pmix. Can you guys take a look? |
bot:retest |
@bwbarrett @wckzhang One of the Build Checker tests failed to pull the git submodule - can you see why? |
bot:mellanox:retest |
@artemry-mlnx Same here - can we build with --enable-debug? |
@jjhursey @artemry-mlnx I tracked it down - fix coming shortly |
Our CI doesn't check it, but I have ensured that singleton operations continue to work. However, singleton Comm_spawn does not currently work unless there is a PMIx-enabled RTE (e.g., PRRTE) operating on the node. |
If we could get an eventual feature request to fail gracefully -- i.e., with a helpful / show_help-style error message telling the user that COMM_SPAWN failed because they need to run a PMIx-enabled RTE yadda yadda yadda -- that would be awesome. 👍 |
I hope to eventually get it to work again - we need to detect that we are not yet connected and spawn PRRTE. Tricky part will be getting the client to "reconnect" to the newly spawned PRRTE. We have thought of this already, but haven't implemented it yet (though the hook is present). |
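The control flow discussed in the last three comments (detect that the singleton is not connected, start a runtime, reconnect, and otherwise fail with a helpful message) might look roughly like the sketch below. This is only an illustration in Python, not Open MPI or PMIx code: `ensure_runtime`, `SpawnUnavailable`, `HELP_TEXT`, and the three callables are hypothetical names invented here, and the real hook mentioned above is not shown in this thread.

```python
# Hypothetical sketch only: none of these names exist in Open MPI, PMIx, or PRRTE.

HELP_TEXT = (
    "COMM_SPAWN failed: this process is running as a singleton and is not "
    "connected to a PMIx-enabled runtime (for example, PRRTE). "
    "Start such a runtime on the node and try again."
)


class SpawnUnavailable(RuntimeError):
    """Raised when a spawn is requested but no runtime can be reached."""


def ensure_runtime(is_connected, start_runtime=None, reconnect=None):
    """Return True once a runtime connection exists, else raise SpawnUnavailable.

    is_connected, start_runtime, and reconnect are stand-in callables for
    whatever hooks a real client would use (an assumption of this sketch).
    """
    if is_connected():
        return True
    # Without a way to start a runtime and reconnect, fail with a clear message.
    if start_runtime is None or reconnect is None:
        raise SpawnUnavailable(HELP_TEXT)
    start_runtime()
    # The reconnect step is the "tricky part": it may still fail.
    if not reconnect():
        raise SpawnUnavailable(HELP_TEXT)
    return True
```

The first branch corresponds to the behaviour requested as a feature (a clear error instead of an obscure failure); the second corresponds to the not-yet-implemented "spawn PRRTE and reconnect" path.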
@artemry-mlnx Could you please tell me what is in your "test-amca.conf" file? We seem to be failing your test of the "--tune" option, but I can't see what is in the file and don't know how it is formatted. |
bot:ibm:retest |
@rhc54
|
@artemry-mlnx I'm running out of patience playing "whackamole" with the Mellanox CI. I need to know what is in this file:
If you look at your log, it is at the end where it fails. I have tried multiple times with different file contents, and everything works. I don't know what is in that one. Can you please help? |
@artemry-mlnx Perhaps it would help if you shared your test script - you have things in it that are no longer being supported. For example:
If you provide the script, I'll be happy to update it and return it to you. In the meantime, I really do need to know the contents of these "tune" files you are testing. Alternatively, if you folks don't have time right now, I'm happy to go ahead and commit, and we can address these rather unusual options later. |
@rhc54 |
Will be replaced by PRRTE. Ensure that OMPI and OPAL layers build without reference to ORTE. Setup opal/pmix framework to be static. Remove support for all PMI-1 and PMI-2 libraries. Add support for "external" pmix component as well as internal v4 one.

remove orte: misc fixes
- UCX fixes
- VPATH issue
- oshmem fixes
- remove useless definition
- Add PRRTE submodule
- Get autogen.pl to traverse PRRTE submodule
- Remove stale orcm reference
- Configure embedded PRRTE
- Correctly pass the prefix to PRRTE
- Correctly set the OMPI_WANT_PRRTE am_conditional
- Move prrte configuration to the end of OMPI's configure.ac
- Make mpirun a symlink to prun, when available
- Fix makedist with --no-orte/--no-prrte option
- Add a `--no-prrte` option which is the same as the legacy `--no-orte` option.
- Remove embedded PMIx tarball. Replace it with new submodule pointing to OpenPMIx master repo's master branch
- Some cleanup in PRRTE integration and add config summary entry
- Correctly set the hostname
- Fix locality
- Fix singleton operations
- Fix support for "tune" and "am" options

Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
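As an illustration of the option aliasing described in the commit message (a `--no-prrte` option that behaves the same as the legacy `--no-orte`), here is a minimal sketch. It is written in Python with `argparse` purely for illustration; the real `autogen.pl` is a Perl script and its actual option handling is not shown in this thread. The program name `autogen-sketch` and the `no_prrte` destination are assumptions made for this example.

```python
import argparse


def build_parser():
    """Build a parser where the new and legacy flags set the same value."""
    parser = argparse.ArgumentParser(prog="autogen-sketch")  # hypothetical name
    # Both spellings map to one destination, so later code checks one flag only.
    parser.add_argument(
        "--no-prrte",
        "--no-orte",
        dest="no_prrte",
        action="store_true",
        help="skip the PRRTE submodule (--no-orte is kept as a legacy alias)",
    )
    return parser


def wants_no_prrte(argv):
    """Return True if either spelling of the option was given."""
    return build_parser().parse_args(argv).no_prrte
```

Registering both spellings on a single argument keeps the legacy flag working without duplicating any downstream logic.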
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
@rhc54 will you merge it into |
No - strictly for OMPI 5 and above. |
The test has been updated for open-mpi/ompi#7202 in scope of mellanox-hpc#92. PR 7202 is not ported to Open MPI 4.0.x.

Signed-off-by: Artem Ryabov <artemry@mellanox.com>