-
Notifications
You must be signed in to change notification settings - Fork 877
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework how Open MPI integrates with 3rd party packages #8069
Rework how Open MPI integrates with 3rd party packages #8069
Conversation
This final patch, "libevent: Remove opal prefixes", is completely optional. It does clean up the code a little bit, but could also be done later as a separate patch series. |
The IBM CI (GNU/Scale) build failed! Please review the log, linked below. Gist: https://gist.github.com/cadc75c5056b606d2c202d2a52726e1d |
The IBM CI (XL) build failed! Please review the log, linked below. Gist: https://gist.github.com/2f1d0c31e87e35436221c743753866fd |
The IBM CI (PGI) build failed! Please review the log, linked below. Gist: https://gist.github.com/6a1ca3272fce404f150268568dedba47 |
The IBM CI (PGI) build failed! Please review the log, linked below. Gist: https://gist.github.com/51a040d9a45c163ef66c5e803ea04b55 |
db16c2d
to
2f01acf
Compare
At the end of "make install", a tool is run to search for common symbols in the built artifacts, to work around issues on MacOS. This tool requires an exclude list for symbols that must be in the common section (such as in executables instead of libraries and because Fortran). This commit adds the ability to exclude certain directories from the search, such as directories that are 3rd party packages or only contain tests/executables, which will not run into problems on MacOS. To simplify that change, the file search in find_common_syms was also rewritten to use the Perl-standard File::Find package instead of calling the find executable. Theoretically, this should be mildly faster, but is also significantly easier to modify. Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2f01acf
to
a29077f
Compare
Rebase on top of master, fixed a VPATH design issue, fixed an issue with "make check" that was causing "make distcheck" to fail, and removed the libevent rename removal patch. @jsquyres pointed out that my cleanup didn't work right in the usnic code, so I want to treat that with more care. |
FWIW: I just sync'd my local clone of your branch with PMIx and PRRTE and things built/work fine. I'm going to run a local MTT on it, though I'm limited on resources. |
The IBM CI (PGI) build failed! Please review the log, linked below. Gist: https://gist.github.com/2e0f4d55a804580acc64399374784f2b |
Looks like you are still hitting the platform file propagation problem - PMIx doesn't have the "mellanox" platform file |
@bwbarrett You know, there is zero value in the |
FWIW: libevent spews hundreds of warning messages with these latest changes, all due to poor matching of data types with format options in print statements. 😦 |
Local run of the IBM test suite looks good (except for one-sided, which totally failed on my Mac). Let me know if you agree about removing the platform options, and if you want me to push the PMIx/PRRTE updates to your branch. |
@rhc54 why do you say that there's no value in the platform file for PMIx/PRRTE? |
Perhaps I should say "not that much value". It is true that we are adding more things that have dependencies (fabric, storage), so maybe it will change enough over time. If you can support it, then that would be best - if not, we can find a way to live with it. If nothing else, we could change the name of the option so it doesn't pickup the top-level one! |
a29077f
to
d51d771
Compare
@rhc54 For now, I just prevent the --with-platform argument from being passed to PMIx and PRRTE. We can revisit that later, if we decide it is important. Also fixed a PRRTE linking issue on recent GNU linkers (like that found on Ubuntu 20.04). Still need to figure out what is going on with the build flags and PGI compilers. |
The IBM CI (PGI) build failed! Please review the log, linked below. Gist: https://gist.github.com/86efd7bff75cc755ce36da2a5592cf2b |
Does this resolve #7560 ? |
78f5890
to
3eff915
Compare
I think I found the PGI issue. PGI fakes being GCC, so the Autoconf test for GCC returns true. Libevent sets a ton of warning flags by default if the compiler is a GCC variant, many of which PGI does not recognize. The simple solution (since we are not developing Libevent itself) was to pass the Libevent configure flag that disables adding all the warning flags. |
I changed the platform options on both PMIx and PRRTE, so no worries there any more. |
Both are in a good place right now if you want to update your submodules |
bot:aws:retest It looks like Jenkins crashed on one of the builders. |
@rhc54 Let's get this commit in with the current submodule pointers, and then we can follow up with another bump. Trying to get this over the line :). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nit and can wait, but I would suggest moving the "base" code of the hwloc and pmix frameworks into the "opal/util" directory like you did for libevent to avoid confusion. Can be done later if you prefer - I just think it will confuse people down the line to see a "defunct" framework for those two projects.
Otherwise, it looked good. Minor changes to the "platform" file names for PRRTE and PMIx.
config/ompi_setup_prrte.m4
Outdated
[AC_HELP_STRING([--with-prrte-platform], | ||
[Platform file to use when building the internal PRRTE runtime support])]) | ||
# TODO: BWB: Check with Ralph if we need seperate files for OMPI and PRTE (which seems annoying) | ||
dnl AC_ARG_WITH([prrte-platform], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this option with only one 'r' - i.e., "prte"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the project PRRTE or PRTE? I'd like to be consistent through the whole system. In terms of the actual argument, Open MPI will pass all arguments not explicitly called out down to PRTE. So we don't actually have to do anything here; customers can do ./configure --with-platform=<foo> --with-prte-platform=<bar>
and the right things will happen.
But I should remove this code, since it isn't needed at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The project is "PRRTE", but the commands and variables are all with a single "r". I don't know how we want to expose the PRRTE and PMIx configure options at the user level for OMPI - I guess that is an open question? I grok that the right thing will happen - just not sure how the user can see what options exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure there's a great answer here, other than eventually getting ./configure --help=recursive
to work properly. I should figure out why that doesn't. Otherwise, we end up exposing way too many options (like why is --with-platform
special compared to all the others?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree - I think we can set it aside for now as it may become too complex and confusing to the user. Perhaps all we need do at some point is add a comment to configure help indicating that these subprojects are included and direct them on how to find configure options specific to each subproject.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the commented out platform argument from the top level. Also figured out how to make ./configure --help=recursive
work correctly, which I think is the answer to your question about how customers find arguments for sub-projects.
PAC_CONFIG_SUBDIR_ARGS([3rd-party/openpmix], [$internal_pmix_args], | ||
[[--with-libevent=internal], [--with-hwloc=internal], | ||
[--with-libevent=external], [--with-hwloc=external], | ||
[--with-pmix=.*], [--with-platform=.*]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PMIx is now looking for --with-pmix-platform
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup; we don't actually have to do anything here. The 3rd argument is an array of options that PAC_CONFIG_SUBDIR_ARGS will strip out from the top-level configure options when passing down to the sub-configure. This will avoid passing the --with-platform
argument to PMIx, as belt and suspenders type thing.
[AC_MSG_CHECKING([if external PMIx version is 3.0.0 or greater]) | ||
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[#include <pmix_version.h>]], | ||
[[ | ||
#if PMIX_VERSION_MAJOR < 3L |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure OMPI (setting aside PRRTE) can support v3.0.0 - it may well require at least v3.1.5. I wouldn't hold this PR over it, but is something we should check prior to release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call out; I think testing is the right path. 3.0.0 was what the external component had been checking for, which is why I left it.
I tried to pull out the HWLOC and PMIx code into util. It was a big project and I gave up. We can always do it later. I left some notes in the mca// directories as a result. |
With Open MPI 5.0, the decision was made to stop building 3rd-party packages, such as Libevent, HWLOC, PMIx, and PRRTE as MCA components and instead 1) start relying on external libraries whenever possible and 2) Open MPI builds the 3rd party libraries (if needed) as independent libraries, rather than linked into libopen-pal. This patch is the first step in that process, providing foundational changes required for supporting 3rd-party packages, such as changes to autogen.pl, the top-level Makefile.am, and introducing two Autoconf macros to support running sub-configure scripts; one supporting source in tarball form and the other supporting source in a sub-tree. Signed-off-by: Brian Barrett <bbarrett@amazon.com>
With Open MPI 5.0, the decision was made to stop building 3rd-party packages, such as Libevent, HWLOC, PMIx, and PRRTE as MCA components and instead 1) start relying on external libraries whenever possible and 2) Open MPI builds the 3rd party libraries (if needed) as independent libraries, rather than linked into libopen-pal. This patch moves libevent from an MCA framework to a stand-alone library built outside of OPAL. A wrapper in opal/util is provided to minimize the unnecessary changes in the rest of the code. When using the internal Libevent, it will be installed as a stand-alone libevent.a, instead of bundled in OPAL. Any pre-installed version of Libevent at or after 2.0.21 is preferred over the internal version. Signed-off-by: Brian Barrett <bbarrett@amazon.com>
With Open MPI 5.0, the decision was made to stop building 3rd-party packages, such as Libevent, HWLOC, PMIx, and PRRTE as MCA components and instead 1) start relying on external libraries whenever possible and 2) Open MPI builds the 3rd party libraries (if needed) as independent libraries, rather than linked into libopen-pal. This patch moves the hwloc library bundled with Open MPI from a MCA framework to a stand-alone library built outside of OPAL. Due to the amount of code in the MCA base (and its assumptions about being part of an MCA framework), the framework is left with no active components. Any pre-installed version of HWLOC 1.6 or newer is preferred over the internal version. Signed-off-by: Brian Barrett <bbarrett@amazon.com>
With Open MPI 5.0, the decision was made to stop building 3rd-party packages, such as Libevent, HWLOC, PMIx, and PRRTE as MCA components and instead 1) start relying on external libraries whenever possible and 2) Open MPI builds the 3rd party libraries (if needed) as independent libraries, rather than linked into libopen-pal. This patch moves the PMIx library bundled with Open MPI from a MCA framework to a stand-alone library built outside of OPAL. Due to the amount of code in the MCA base (and its assumptions about being part of an MCA framework), the framework is left with no active components. Any pre-installed version of PMIx 3.0.0 or newer is preferred over the internal version. Signed-off-by: Brian Barrett <bbarrett@amazon.com>
With Open MPI 5.0, the decision was made to stop building 3rd-party packages, such as Libevent, HWLOC, PMIx, and PRRTE as MCA components and instead 1) start relying on external libraries whenever possible and 2) Open MPI builds the 3rd party libraries (if needed) as independent libraries, rather than linked into libopen-pal. This patch moves the prrte submodule from the top-level to the 3rd-party directory, to match the behavior of other 3rd-party packages like Libevent and PMIx. Since Open MPI does not support building with an external PRRTE, that functionality is skipped in this patch. Signed-off-by: Brian Barrett <bbarrett@amazon.com>
The refactoring patches move Libevent from a framework integration to a 3rd-party package, but did not change the Libevent version that Open MPI ships. During that swap, we stopped running the Autotools on Libevent and relied on the tools the Libevent authors used when building the 2.0.22 release tarball. The config.guess in this release tarball did not work on the IBM systems. This patch updates the release version of Libevent to 2.1.12-stable, which will suck in a bunch of upstream bug fixes and updates the config.guess so that the 3rd-party refactoring actually compiles on the IBM Power systems. Signed-off-by: Brian Barrett <bbarrett@amazon.com>
3eff915
to
a30eb69
Compare
Removed an un-needed todo that Ralph found and fixed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks go9od - thx!
With Open MPI 5.0, the decision was made to stop building 3rd-party packages, such as Libevent, HWLOC, PMIx, and PRRTE as MCA components and instead 1) start relying on external libraries whenever possible and 2) Open MPI builds the 3rd party libraries (if needed) as independent libraries, rather than linked into libopen-pal. This patch series implements this behavior.