Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework how Open MPI integrates with 3rd party packages #8069

Merged
merged 7 commits into from
Oct 1, 2020

Conversation

bwbarrett
Copy link
Member

With Open MPI 5.0, the decision was made to stop building 3rd-party packages, such as Libevent, HWLOC, PMIx, and PRRTE as MCA components and instead 1) start relying on external libraries whenever possible and 2) Open MPI builds the 3rd party libraries (if needed) as independent libraries, rather than linked into libopen-pal. This patch series implements this behavior.

@bwbarrett bwbarrett added this to the v5.0.0 milestone Sep 29, 2020
@bwbarrett bwbarrett requested review from jsquyres and rhc54 September 29, 2020 03:04
@bwbarrett
Copy link
Member Author

This final patch, "libevent: Remove opal prefixes", is completely optional. It does clean up the code a little bit, but could also be done later as a separate patch series.

@ibm-ompi
Copy link

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/cadc75c5056b606d2c202d2a52726e1d

@ibm-ompi
Copy link

The IBM CI (XL) build failed! Please review the log, linked below.

Gist: https://gist.github.com/2f1d0c31e87e35436221c743753866fd

@ibm-ompi
Copy link

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/6a1ca3272fce404f150268568dedba47

@ibm-ompi
Copy link

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/51a040d9a45c163ef66c5e803ea04b55

@bwbarrett bwbarrett force-pushed the feature/3rdparty-packaging branch from db16c2d to 2f01acf Compare September 30, 2020 23:33
At the end of "make install", a tool is run to search for common
symbols in the built artifacts, to work around issues on MacOS.
This tool requires an exclude list for symbols that must be
in the common section (such as in executables instead of libraries
and because Fortran).

This commit adds the ability to exclude certain directories from
the search, such as directories that are 3rd party packages or
only contain tests/executables, which will not run into problems
on MacOS.

To simplify that change, the file search in find_common_syms was
also rewritten to use the Perl-standard File::Find package instead
of calling the find executable.  Theoretically, this should be
mildly faster, but is also significantly easier to modify.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
@bwbarrett bwbarrett force-pushed the feature/3rdparty-packaging branch from 2f01acf to a29077f Compare September 30, 2020 23:34
@bwbarrett
Copy link
Member Author

Rebase on top of master, fixed a VPATH design issue, fixed an issue with "make check" that was causing "make distcheck" to fail, and removed the libevent rename removal patch. @jsquyres pointed out that my cleanup didn't work right in the usnic code, so I want to treat that with more care.

@rhc54
Copy link
Contributor

rhc54 commented Sep 30, 2020

FWIW: I just sync'd my local clone of your branch with PMIx and PRRTE and things built/work fine. I'm going to run a local MTT on it, though I'm limited on resources.

@ibm-ompi
Copy link

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/2e0f4d55a804580acc64399374784f2b

@rhc54
Copy link
Contributor

rhc54 commented Sep 30, 2020

Looks like you are still hitting the platform file propagation problem - PMIx doesn't have the "mellanox" platform file

@rhc54
Copy link
Contributor

rhc54 commented Sep 30, 2020

@bwbarrett You know, there is zero value in the platform option for PMIx or PRRTE any more. Would it help if we simply removed them?

@rhc54
Copy link
Contributor

rhc54 commented Oct 1, 2020

FWIW: libevent spews hundreds of warning messages with these latest changes, all due to poor matching of data types with format options in print statements. 😦

@rhc54
Copy link
Contributor

rhc54 commented Oct 1, 2020

Local run of the IBM test suite looks good (except for one-sided, which totally failed on my Mac). Let me know if you agree about removing the platform options, and if you want me to push the PMIx/PRRTE updates to your branch.

@bwbarrett
Copy link
Member Author

@rhc54 why do you say that there's no value in the platform file for PMIx/PRRTE?

@rhc54
Copy link
Contributor

rhc54 commented Oct 1, 2020

Perhaps I should say "not that much value". It is true that we are adding more things that have dependencies (fabric, storage), so maybe it will change enough over time. If you can support it, then that would be best - if not, we can find a way to live with it. If nothing else, we could change the name of the option so it doesn't pickup the top-level one!

@bwbarrett bwbarrett force-pushed the feature/3rdparty-packaging branch from a29077f to d51d771 Compare October 1, 2020 03:06
@bwbarrett
Copy link
Member Author

@rhc54 For now, I just prevent the --with-platform argument from being passed to PMIx and PRRTE. We can revisit that later, if we decide it is important.

Also fixed a PRRTE linking issue on recent GNU linkers (like that found on Ubuntu 20.04).

Still need to figure out what is going on with the build flags and PGI compilers.

@ibm-ompi
Copy link

ibm-ompi commented Oct 1, 2020

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/86efd7bff75cc755ce36da2a5592cf2b

@awlauria
Copy link
Contributor

awlauria commented Oct 1, 2020

Does this resolve #7560 ?

@bwbarrett bwbarrett force-pushed the feature/3rdparty-packaging branch 2 times, most recently from 78f5890 to 3eff915 Compare October 1, 2020 15:17
@bwbarrett
Copy link
Member Author

I think I found the PGI issue. PGI fakes being GCC, so the Autoconf test for GCC returns true. Libevent sets a ton of warning flags by default if the compiler is a GCC variant, many of which PGI does not recognize. The simple solution (since we are not developing Libevent itself) was to pass the Libevent configure flag that disables adding all the warning flags.

@rhc54
Copy link
Contributor

rhc54 commented Oct 1, 2020

I changed the platform options on both PMIx and PRRTE, so no worries there any more.

@rhc54
Copy link
Contributor

rhc54 commented Oct 1, 2020

Both are in a good place right now if you want to update your submodules

@bwbarrett
Copy link
Member Author

bot:aws:retest

It looks like Jenkins crashed on one of the builders.

@bwbarrett
Copy link
Member Author

@rhc54 Let's get this commit in with the current submodule pointers, and then we can follow up with another bump. Trying to get this over the line :).

Copy link
Contributor

@rhc54 rhc54 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nit and can wait, but I would suggest moving the "base" code of the hwloc and pmix frameworks into the "opal/util" directory like you did for libevent to avoid confusion. Can be done later if you prefer - I just think it will confuse people down the line to see a "defunct" framework for those two projects.

Otherwise, it looked good. Minor changes to the "platform" file names for PRRTE and PMIx.

[AC_HELP_STRING([--with-prrte-platform],
[Platform file to use when building the internal PRRTE runtime support])])
# TODO: BWB: Check with Ralph if we need seperate files for OMPI and PRTE (which seems annoying)
dnl AC_ARG_WITH([prrte-platform],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this option with only one 'r' - i.e., "prte"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the project PRRTE or PRTE? I'd like to be consistent through the whole system. In terms of the actual argument, Open MPI will pass all arguments not explicitly called out down to PRTE. So we don't actually have to do anything here; customers can do ./configure --with-platform=<foo> --with-prte-platform=<bar> and the right things will happen.

But I should remove this code, since it isn't needed at this point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The project is "PRRTE", but the commands and variables are all with a single "r". I don't know how we want to expose the PRRTE and PMIx configure options at the user level for OMPI - I guess that is an open question? I grok that the right thing will happen - just not sure how the user can see what options exist.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure there's a great answer here, other than eventually getting ./configure --help=recursive to work properly. I should figure out why that doesn't. Otherwise, we end up exposing way too many options (like why is --with-platform special compared to all the others?).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree - I think we can set it aside for now as it may become too complex and confusing to the user. Perhaps all we need do at some point is add a comment to configure help indicating that these subprojects are included and direct them on how to find configure options specific to each subproject.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the commented out platform argument from the top level. Also figured out how to make ./configure --help=recursive work correctly, which I think is the answer to your question about how customers find arguments for sub-projects.

PAC_CONFIG_SUBDIR_ARGS([3rd-party/openpmix], [$internal_pmix_args],
[[--with-libevent=internal], [--with-hwloc=internal],
[--with-libevent=external], [--with-hwloc=external],
[--with-pmix=.*], [--with-platform=.*]],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PMIx is now looking for --with-pmix-platform

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup; we don't actually have to do anything here. The 3rd argument is an array of options that PAC_CONFIG_SUBDIR_ARGS will strip out from the top-level configure options when passing down to the sub-configure. This will avoid passing the --with-platform argument to PMIx, as belt and suspenders type thing.

[AC_MSG_CHECKING([if external PMIx version is 3.0.0 or greater])
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[#include <pmix_version.h>]],
[[
#if PMIX_VERSION_MAJOR < 3L
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure OMPI (setting aside PRRTE) can support v3.0.0 - it may well require at least v3.1.5. I wouldn't hold this PR over it, but is something we should check prior to release.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call out; I think testing is the right path. 3.0.0 was what the external component had been checking for, which is why I left it.

@bwbarrett
Copy link
Member Author

This is a nit and can wait, but I would suggest moving the "base" code of the hwloc and pmix frameworks into the "opal/util" directory like you did for libevent to avoid confusion. Can be done later if you prefer - I just think it will confuse people down the line to see a "defunct" framework for those two projects.

Otherwise, it looked good. Minor changes to the "platform" file names for PRRTE and PMIx.

I tried to pull out the HWLOC and PMIx code into util. It was a big project and I gave up. We can always do it later. I left some notes in the mca// directories as a result.

With Open MPI 5.0, the decision was made to stop building 3rd-party
packages, such as Libevent, HWLOC, PMIx, and PRRTE as MCA components
and instead 1) start relying on external libraries whenever possible
and 2) Open MPI builds the 3rd party libraries (if needed) as
independent libraries, rather than linked into libopen-pal.

This patch is the first step in that process, providing foundational
changes required for supporting 3rd-party packages, such as changes
to autogen.pl, the top-level Makefile.am, and introducing two
Autoconf macros to support running sub-configure scripts; one
supporting source in tarball form and the other supporting
source in a sub-tree.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
With Open MPI 5.0, the decision was made to stop building
3rd-party packages, such as Libevent, HWLOC, PMIx, and PRRTE as
MCA components and instead 1) start relying on external libraries
whenever possible and 2) Open MPI builds the 3rd party
libraries (if needed) as independent libraries, rather than
linked into libopen-pal.

This patch moves libevent from an MCA framework to a stand-alone
library built outside of OPAL.  A wrapper in opal/util is provided
to minimize the unnecessary changes in the rest of the code.  When
using the internal Libevent, it will be installed as a stand-alone
libevent.a, instead of bundled in OPAL.  Any pre-installed version
of Libevent at or after 2.0.21 is preferred over the internal
version.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
With Open MPI 5.0, the decision was made to stop building
3rd-party packages, such as Libevent, HWLOC, PMIx, and PRRTE as
MCA components and instead 1) start relying on external libraries
whenever possible and 2) Open MPI builds the 3rd party
libraries (if needed) as independent libraries, rather than
linked into libopen-pal.

This patch moves the hwloc library bundled with Open MPI from a
MCA framework to a stand-alone library built outside of OPAL.  Due
to the amount of code in the MCA base (and its assumptions about
being part of an MCA framework), the framework is left with no
active components.  Any pre-installed version of HWLOC 1.6 or
newer is preferred over the internal version.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
With Open MPI 5.0, the decision was made to stop building
3rd-party packages, such as Libevent, HWLOC, PMIx, and PRRTE as
MCA components and instead 1) start relying on external libraries
whenever possible and 2) Open MPI builds the 3rd party
libraries (if needed) as independent libraries, rather than
linked into libopen-pal.

This patch moves the PMIx library bundled with Open MPI from a
MCA framework to a stand-alone library built outside of OPAL.  Due
to the amount of code in the MCA base (and its assumptions about
being part of an MCA framework), the framework is left with no
active components.  Any pre-installed version of PMIx 3.0.0 or
newer is preferred over the internal version.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
With Open MPI 5.0, the decision was made to stop building
3rd-party packages, such as Libevent, HWLOC, PMIx, and PRRTE as
MCA components and instead 1) start relying on external libraries
whenever possible and 2) Open MPI builds the 3rd party
libraries (if needed) as independent libraries, rather than
linked into libopen-pal.

This patch moves the prrte submodule from the top-level to the
3rd-party directory, to match the behavior of other 3rd-party
packages like Libevent and PMIx.  Since Open MPI does not
support building with an external PRRTE, that functionality
is skipped in this patch.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
The refactoring patches move Libevent from a framework integration
to a 3rd-party package, but did not change the Libevent version
that Open MPI ships.  During that swap, we stopped running the
Autotools on Libevent and relied on the tools the Libevent authors
used when building the 2.0.22 release tarball.  The config.guess
in this release tarball did not work on the IBM systems.

This patch updates the release version of Libevent to 2.1.12-stable,
which will suck in a bunch of upstream bug fixes and updates
the config.guess so that the 3rd-party refactoring actually
compiles on the IBM Power systems.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
@bwbarrett bwbarrett force-pushed the feature/3rdparty-packaging branch from 3eff915 to a30eb69 Compare October 1, 2020 16:56
@bwbarrett
Copy link
Member Author

Removed an un-needed todo that Ralph found and fixed ./configure --help=recursive so that sub-configure options could be found.

Copy link
Contributor

@rhc54 rhc54 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks go9od - thx!

@bwbarrett bwbarrett merged commit 7592eb3 into open-mpi:master Oct 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants