Topic/pmixstat #12
Both opal_hwloc_base_get_relative_locality() and _get_locality_string() iterate over hwloc levels to build the proc locality information. Unfortunately, NUMA nodes have not been part of those normal levels since hwloc 2.0, so we have to explicitly look at the special NUMA level to get that locality info. I am factoring the core of the iterations into dedicated "_by_depth" functions and calling them again for the NUMA level at the end of the loops. Thanks to Hatem Elshazly for reporting the NUMA communicator split failure at https://www.mail-archive.com/users@lists.open-mpi.org/msg33589.html It looks like only the opal_hwloc_base_get_locality_string() part is needed to fix that split, but there's no reason not to fix get_relative_locality() as well. Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
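The refactoring described above can be sketched without hwloc itself. Everything in this block is a hypothetical stand-in (the `sketch_*` names, the `SKETCH_NUMA_DEPTH` constant, and the tag format of the output string); it is not the OPAL or hwloc API. It only illustrates the pattern: move the loop body into a "by depth" helper, then call that helper once more for the special NUMA level after the loop over normal levels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical marker for the special NUMA level, which sits outside
 * the normal 0..n-1 depth range (as in hwloc >= 2.0). */
#define SKETCH_NUMA_DEPTH (-3)

typedef struct {
    const char *tag;  /* short label, e.g. "SK", "L3", "CR", "NM" */
    int obj_index;    /* object at this level holding the proc, -1 if none */
} sketch_level_t;

typedef struct {
    sketch_level_t normal[4];  /* levels reachable by a plain depth loop */
    int n_normal;
    sketch_level_t numa;       /* special level, NOT among the normal ones */
} sketch_topology_t;

/* Resolve a depth (normal or special) to its level, or NULL. */
static const sketch_level_t *sketch_level(const sketch_topology_t *t, int depth)
{
    if (depth == SKETCH_NUMA_DEPTH) return &t->numa;
    if (depth >= 0 && depth < t->n_normal) return &t->normal[depth];
    return NULL;
}

/* Core of the iteration, factored out so it can be reused for any depth. */
static void sketch_append_by_depth(const sketch_topology_t *t, int depth,
                                   char *out, size_t outlen)
{
    const sketch_level_t *lvl = sketch_level(t, depth);
    size_t used = strlen(out);
    if (lvl == NULL || lvl->obj_index < 0 || used >= outlen) return;
    snprintf(out + used, outlen - used, "%s%s%d",
             used ? ":" : "", lvl->tag, lvl->obj_index);
}

/* Build a locality string: normal levels first, then the NUMA level. */
void sketch_locality_string(const sketch_topology_t *t, char *out, size_t outlen)
{
    assert(t != NULL && out != NULL && outlen > 0);
    out[0] = '\0';
    for (int d = 0; d < t->n_normal; d++) {
        sketch_append_by_depth(t, d, out, outlen);
    }
    /* Without this extra call the NUMA level would be silently skipped. */
    sketch_append_by_depth(t, SKETCH_NUMA_DEPTH, out, outlen);
}
```

With a mock topology of three normal levels plus a NUMA level, the resulting string gains a trailing NUMA entry that a loop over normal depths alone would never produce.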
Forgot to include a fix for the Fortran test used to check whether new dtags are supported. Related to open-mpi#7268 This patch is already included on v4.0.x branch. Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Artem Ryabov <artemry@mellanox.com>
…x_ci_for_release_branches Enabled Mellanox CI for release branches (changes for master branch).
Fix the C types for the following:

* MPI_UNWEIGHTED
* MPI_WEIGHTS_EMPTY
* MPI_ARGV_NULL
* MPI_ARGVS_NULL
* MPI_ERRCODES_IGNORE

There is lengthy discussion on open-mpi#7210 describing the issue; the gist of it is that the C and Fortran types for several MPI global sentinel values should agree (specifically: their sizes must(**) agree).

We erroneously had several of these array-like sentinel values be "array-like" values in C. E.g., MPI_ERRCODES_IGNORE was an (int *) in C while its corresponding Fortran type was "integer, dimension(1)". On a 64 bit platform, this resulted in C expecting the symbol size to be sizeof(int*)==8 while Fortran expected the symbol size to be sizeof(INTEGER, DIMENSION(1))==4. That is incorrect -- the corresponding C type needed to be (int). Then both C and Fortran expect the size of the symbol to be the same.

(**) NOTE: This code has been wrong for years. This mismatch of types typically worked because, due to Fortran's call-by-reference semantics, Open MPI was comparing the *addresses* of these instances, not their *types* (or sizes) -- so even if C expected the size of the symbol to be X and Fortran expected the size of the symbol to be Y (where X!=Y), all we really checked at run time was that the addresses of the symbols were the same.

But it caused linker warning messages, and even caused errors in some cases. Specifically: due to a GNU ld bug (https://sourceware.org/bugzilla/show_bug.cgi?id=25236), the 5 common symbols are incorrectly versioned VER_NDX_LOCAL because their definitions in Fortran sources have smaller st_size than those in libmpi.so. This makes the Fortran library not linkable with lld in distributions that ship openmpi built with -Wl,--version-script (https://bugs.llvm.org/show_bug.cgi?id=43748):

    % mpifort -fuse-ld=lld /dev/null
    ld.lld: error: corrupt input file: version definition index 0 for symbol mpi_fortran_argv_null_ is out of bounds
    >>> defined in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_usempif08.so
    ...

If we fix the C and Fortran symbols to actually be the same size, the problem goes away and the GNU ld bug does not come into play.

This commit also fixes a minor issue that MPI_UNWEIGHTED and MPI_WEIGHTS_EMPTY were not declared as Fortran arrays (not fully fixed by commit 107c007).

Fixes open-mpi#7209

Signed-off-by: Fangrui Song <i@maskray.me>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
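A minimal sketch of why the run-time check kept working while the symbol sizes disagreed: the sentinel is recognized purely by address, so only the linker cares about the declared size. The names below (`sketch_errcodes_ignore`, `SKETCH_ERRCODES_IGNORE`, and the helper functions) are hypothetical and are not the Open MPI symbols; the example also assumes a default Fortran INTEGER has the same size as a C int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel symbol. Defining it as a plain int (rather than
 * an int *) makes its symbol size sizeof(int), which matches a Fortran
 * "integer, dimension(1)" under the assumption stated above. */
int sketch_errcodes_ignore = 0;

/* Callers pass the ADDRESS of the sentinel, mirroring Fortran's
 * call-by-reference convention. */
#define SKETCH_ERRCODES_IGNORE (&sketch_errcodes_ignore)

/* The run-time test compares addresses only; the declared type or size
 * of the symbol never enters into it. */
bool sketch_is_errcodes_ignore(const int *array_of_errcodes)
{
    return array_of_errcodes == SKETCH_ERRCODES_IGNORE;
}

/* Size the linker records for the symbol when it is defined as an int. */
size_t sketch_symbol_size(void)
{
    assert(sizeof sketch_errcodes_ignore == sizeof(int));
    return sizeof sketch_errcodes_ignore;
}

/* Size the symbol would have had if it were defined as an int *,
 * which is the mismatch the commit message describes. */
size_t sketch_symbol_size_if_pointer(void)
{
    return sizeof(int *);
}
```

On a typical 64-bit platform the two size helpers return different values, which is the disagreement the linker sees, yet `sketch_is_errcodes_ignore()` behaves the same either way because it never looks past the address.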
Fix Fortran st_size fields of mpi_fortran_argv_null_, mpi_fortran_weights_empty_, mpi_fortran_unweighted_, mpi_fortran_errcodes_ignore_, and mpi_fortran_argvs_null_
Signed-off-by: Dmitry Gladkov <dmitrygla@mellanox.com>
PR 7268 follow-up
hwloc/base: fix opal proc locality wrt NUMA nodes on hwloc 2.0
These -D's are for C compilation, not Fortran compilation. Remove this useless statement. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Automake's Fortran compilation rules inexplicably use CPPFLAGS and AM_CPPFLAGS. Unfortunately, this can cause problems in some cases (e.g., picking up already-installed mpi.mod in a system-default include search path). So in relevant module-using Fortran compilation Makefile.am's, zero out CPPFLAGS and AM_CPPFLAGS. This has a side-effect of requiring that we compile the one .c file in the F08 library in a new, separate subdirectory (with its own Makefile.am that does _not_ have CPPFLAGS/AM_CPPFLAGS zeroed out). Signed-off-by: Jeff Squyres <jsquyres@cisco.com> Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Ralph Castain <rhc@pmix.org>
Always consider retrieval of HOSTNAME to be optional
SPML/UCX: Fix compilation warnings with GCC
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
Protect use of _Static_assert().
Will be replaced by PRRTE. Ensure that OMPI and OPAL layers build without reference to ORTE. Setup opal/pmix framework to be static. Remove support for all PMI-1 and PMI-2 libraries. Add support for "external" pmix component as well as internal v4 one.

remove orte: misc fixes
- UCX fixes
- VPATH issue
- oshmem fixes
- remove useless definition
- Add PRRTE submodule
- Get autogen.pl to traverse PRRTE submodule
- Remove stale orcm reference
- Configure embedded PRRTE
- Correctly pass the prefix to PRRTE
- Correctly set the OMPI_WANT_PRRTE am_conditional
- Move prrte configuration to the end of OMPI's configure.ac
- Make mpirun a symlink to prun, when available
- Fix makedist with --no-orte/--no-prrte option
- Add a `--no-prrte` option which is the same as the legacy `--no-orte` option.
- Remove embedded PMIx tarball. Replace it with new submodule pointing to OpenPMIx master repo's master branch
- Some cleanup in PRRTE integration and add config summary entry
- Correctly set the hostname
- Fix locality
- Fix singleton operations

Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
Signed-off-by: Ralph Castain <rhc@pmix.org>
The IBM CI (GNU/Scale) build failed! Please review the log, linked below. Gist: https://gist.github.com/f80e5bdffc4738ddcf704c0114e780b6
The IBM CI (XL) build failed! Please review the log, linked below. Gist: https://gist.github.com/0c6e44147f05d5343ae6ab34c2609bcb
bot:ibm:retest
The IBM CI (GNU/Scale) build failed! Please review the log, linked below. Gist: https://gist.github.com/ibm-ompi/88b480fb98dfd2bdac939f023e1f8fe6
bot:ibm:gnu:retest
The IBM CI (XL) build failed! Please review the log, linked below. Gist: https://gist.github.com/d88abb002df75079a19cf59de7c9241b
bot:ibm:prrte:retest
The IBM CI (GNU/Scale) build failed! Please review the log, linked below. Gist: https://gist.github.com/ibm-ompi/88b480fb98dfd2bdac939f023e1f8fe6
bot:ibm:prrte:retest
bot:ibm:xl:retest
bot:ibm:prrte:retest
bot:ibm:prrte:retest
bot:ibm:retest
bot:ibm:retest
bot:ibm:retest
The IBM CI (PRRTE) build failed! Please review the log, linked below. Gist: https://gist.github.com/19d84ead9473d35fd7343b6c24ff8bfa
bot:ibm:retest
The IBM CI (PRRTE) build failed! Please review the log, linked below. Gist: https://gist.github.com/0bc63ee355523e9b3b0f6dcc39d8dffd
bot:ibm:retest
The IBM CI (PRRTE) build failed! Please review the log, linked below. Gist: https://gist.github.com/47664cf5630ac7307abaca1c4e822de3
bot:ibm:retest
bot:ibm:retest
bot:ibm:pgi:retest
CI testing for upstream: open-mpi#7202