prrte: advance sha to 30cadc6746 #12901

Merged
merged 2 commits into open-mpi:main from advance_prrte_sha_to_30cadc6746
Nov 2, 2024

Conversation

hppritcha
Member

@hppritcha hppritcha commented Oct 31, 2024

also advance pmix sha to 4aea550f6f55 to pick up PR openpmix/openpmix#3414

@hppritcha
Member Author

@dalcinl here you go!

@hppritcha
Member Author

hmm, something is borked about configuring prrte for some of the jenkins tests

configure:5174: *** Configuring PRRTE
configure:63521: checking if PMIx version is 4.0.0 or greater
configure:63538: gcc -c -O3 -DNDEBUG  -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -Wshadow -Werror-implicit-function-declaration -fno-strict-aliasing -pedantic -Wall -Wformat-truncation=0 -finline-functions -mcx16 -I/home/ec2-user/workspace/pen-mpi.pull_request-v2_PR-12901/3rd-party/openpmix/include -I/home/ec2-user/workspace/pen-mpi.pull_request-v2_PR-12901/3rd-party/openpmix/include -I/home/ec2-user/workspace/pen-mpi.pull_request-v2_PR-12901/3rd-party/openpmix/ -I/home/ec2-user/workspace/pen-mpi.pull_request-v2_PR-12901/3rd-party/openpmix/ conftest.c >&5
conftest.c:526:1: warning: function declaration isn't a prototype [-Wstrict-prototypes]
 main ()
 ^~~~
configure:63538: $? = 0
configure:63539: result: yes
configure:63614: ===== configuring 3rd-party/prrte =====
configure:63804: running /bin/sh ./configure --disable-option-checking '--prefix=/home/ec2-user/workspace/pen-mpi.pull_request-v2_PR-12901/install' --enable-prte-ft --with-proxy-version-string=5.1.0a1 --with-proxy-package-name="Open MPI" --with-proxy-bugreport="https://www.open-mpi.org/community/help/" --disable-devel-check --enable-prte-prefix-by-default --disable-pmix-lib-checks --with-pmix-extra-libs="/home/ec2-user/workspace/pen-mpi.pull_request-v2_PR-12901/3rd-party/openpmix/src/libpmix.la" 'CPPFLAGS= -I/home/ec2-user/workspace/pen-mpi.pull_request-v2_PR-12901/3rd-party/openpmix/include -I/home/ec2-user/workspace/pen-mpi.pull_request-v2_PR-12901/3rd-party/openpmix/include -I/home/ec2-user/workspace/pen-mpi.pull_request-v2_PR-12901/3rd-party/openpmix/ -I/home/ec2-user/workspace/pen-mpi.pull_request-v2_PR-12901/3rd-party/openpmix/' --cache-file=/dev/null --srcdir=.
configure:63824: ===== done with 3rd-party/prrte configure =====
configure:63847: error: PRRTE configuration failed.  Cannot continue.

@rhc54
Contributor

rhc54 commented Oct 31, 2024

FWIW: on my PR, it kept complaining about not finding a valid PMIx build. Seemed like some issue with bringing down the PMIx submodule.

@hppritcha hppritcha force-pushed the advance_prrte_sha_to_30cadc6746 branch from fc691a3 to 9c4bcbb Compare October 31, 2024 21:00
@hppritcha
Member Author

need to figure out what got borked in our prrte fork (we are careful about taking upstream commits but maybe not enough?) before advancing the sha @dalcinl

@rhc54
Contributor

rhc54 commented Nov 1, 2024

@hppritcha I don't believe that is the problem, though I could be wrong. When I tried to check OMPI main against head of upstream master branches, the problem I hit (which looked like the one you have here) was that Amazon kept failing to build the PR because PRRTE couldn't find a valid PMIx installation. Never was able to trace down a reason - looked/felt like Amazon simply couldn't download the PMIx submodule, but I'm not clear as to why that wouldn't have aborted the CI right then. Note that all the other CIs have no problem building it, so it is something unique about the Amazon Jenkins one.

Not sure of the reason - and I'm tied up for the next week. Just noting that it may not have anything to do with the PRRTE code.

@hppritcha
Member Author

okay, now moving to a suspicious (in terms of Jenkins CI) sha

@hppritcha
Member Author

okay, the problem is that the hwloc the Jenkins CI is using is too old. The configure message isn't very clear, though. Looks like updating the openpmix submodule may help with that.

@rhc54
Contributor

rhc54 commented Nov 1, 2024

> okay, the problem is that the hwloc the Jenkins CI is using is too old. The configure message isn't very clear, though. Looks like updating the openpmix submodule may help with that.

Per discussion with OMPI rms, we raised the minimum hwloc version to 2.1
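
For anyone wanting to check a CI host against the new hwloc minimum before kicking off a build, a minimal sketch follows. The helper name `version_at_least` is hypothetical and is not part of OMPI, PMIx, or the Jenkins scripts; it also assumes `sort -V` (GNU coreutils version sort) is available on the host.

```shell
#!/bin/sh
# Hypothetical helper (not from the OMPI/PMIx build system):
# succeeds if version $1 is greater than or equal to version $2.
# Assumes GNU coreutils `sort -V` is available.
version_at_least() {
    have=$1
    want=$2
    # Version-sort the two strings; the smaller one comes out first.
    # If the smaller one is the required minimum, then have >= want.
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}
```

On a host where hwloc installs a pkg-config file, something like `version_at_least "$(pkg-config --modversion hwloc)" 2.1.0 || echo "hwloc too old"` would flag the condition up front, rather than leaving it to surface deep in a configure log.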

@hppritcha
Member Author

Our configury isn't very smart about failing if PMIx fails to configure; it just trundles on:

configure: WARNING: PMIx requires HWLOC v2.1.0 or above.
configure: error: Please select a supported version and configure again
configure: ===== done with 3rd-party/openpmix configure =====
checking for pmix pkg-config name... pmix
checking if pmix pkg-config module exists... no
checking for pmix wrapper compiler... pmixcc
checking if pmix wrapper compiler works... no
configure: Searching for pmix in default search paths
checking for pmix cppflags... 
checking for pmix ldflags... 
checking for pmix libs... -lpmix
checking for pmix static libs... -lpmix
checking pmix.h usability... no
checking pmix.h presence... no
checking for pmix.h... no
configure: error: Could not find viable pmix build.
+ echo './configure --prefix="/home/ec2-user/workspace/pen-mpi.pull_request-v2_PR-12901/install"  --disable-silent-rules failed, ABORTING !'
./configure --prefix="/home/ec2-user/workspace/pen-mpi.pull_request-v2_PR-12901/install"  --disable-silent-rules failed, ABORTING !
+ test -f config.log
+ echo 'config.log content :'
config.log content :

@rhc54
Contributor

rhc54 commented Nov 1, 2024

Yeah that really confused me - had me chasing my tail 😗

@hppritcha
Member Author

I notice that, the way the CI scripts work, if there's a configure failure at some point, the entire config.log is echoed rather than the script just stopping. This can make the actual configure failure a bit tricky to find in some cases.
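
One possible alternative to echoing the whole log is sketched below: run the step, and on failure print only the lines that look like configure errors, then return the original exit status. This is an illustration under assumptions, not the actual Jenkins script; the function name `run_step` and the choice to grep for `error:` are hypothetical.

```shell
#!/bin/sh
# Hypothetical CI helper (not the real Jenkins build script).
# Usage: run_step LOGFILE COMMAND [ARGS...]
# Runs COMMAND, capturing stdout and stderr in LOGFILE. On failure it
# prints a short summary instead of the whole log and returns the
# command's own exit status so the caller can stop.
run_step() {
    log=$1
    shift
    if "$@" >"$log" 2>&1; then
        return 0
    else
        step_rc=$?
        echo "step failed with status $step_rc: $*" >&2
        # Show only the first few lines that look like configure errors.
        grep -n 'error:' "$log" | head -n 5 >&2
        return "$step_rc"
    fi
}
```

A caller could then write `run_step configure.out ./configure --disable-silent-rules || exit 1`, so the first `configure: error:` line (here, the HWLOC version message) is what shows up at the end of the console output.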

@hppritcha hppritcha force-pushed the advance_prrte_sha_to_30cadc6746 branch from 75469b6 to 0f3d21b Compare November 1, 2024 20:59
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
openpmix: move the sha back to 08e41ed

to avoid a bunch of group refactor stuff for now

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
@hppritcha hppritcha force-pushed the advance_prrte_sha_to_30cadc6746 branch from 0f3d21b to 22bdcd6 Compare November 2, 2024 00:34
@rhc54
Contributor

rhc54 commented Nov 2, 2024

@hppritcha I think what's confusing here is that OMPI's configure somehow continues on after the configure in PMIx generates an error due to seeing an HWLOC version that is below the minimum required. I'm not sure how/why the AC_MSG_ERROR is failing to stop the entire process, yet somehow we continue and go on to the PRRTE configure code.

Looking at the autoconf documentation for that macro, I do see this caution:

The error-description should start with a lower-case letter, and “cannot” is preferred to “can't”. 

which we violate on nearly all uses of that macro. It's the only AC_MSG_ macro with that caution - no idea why. Quick test shows that PMIx configure does correctly exit with a non-zero status when HWLOC is too old, so I'm not sure I understand the problem here. Might be worth someone exploring?
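
A quick test of this kind can be sketched as below: run a command in a given directory and report its exit status. The helper name `report_status` is hypothetical, and this is only one way to do the check, not necessarily how it was done here.

```shell
#!/bin/sh
# Hypothetical helper: run COMMAND inside DIR, discard its output,
# print the exit status, and return that same status.
# Usage: report_status DIR COMMAND [ARGS...]
report_status() {
    dir=$1
    shift
    # Subshell so the caller's working directory is left unchanged.
    ( cd "$dir" && "$@" ) >/dev/null 2>&1
    st=$?
    echo "exit status: $st"
    return "$st"
}
```

For example, `report_status 3rd-party/openpmix ./configure` on a machine with an hwloc older than 2.1 would be expected to print a non-zero status, matching the behavior described above.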

@hppritcha
Member Author

It's a problem with the way the Jenkins CI build script handles errors. Like I said above, it just starts echoing all the logs rather than just exiting itself.

If I run it by hand, the behavior is as one would expect: configure dies with an appropriate error message.

@hppritcha hppritcha merged commit 25feb3b into open-mpi:main Nov 2, 2024
15 checks passed