Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recent hwloc base change breaks external hwloc <= v2.0.x #7362

Closed
jsquyres opened this issue Feb 4, 2020 · 8 comments · Fixed by #7366
Closed

Recent hwloc base change breaks external hwloc <= v2.0.x #7362

jsquyres opened this issue Feb 4, 2020 · 8 comments · Fixed by #7366
Assignees

Comments

@jsquyres
Copy link
Member

jsquyres commented Feb 4, 2020

PR #7201 causes compile errors on master when compiling with older versions of hwloc (older than v2.0.x).

Open MPI master still supports hwloc >= v1.5.x. Are we changing this to be >= v2.0.x?

Specifically, when I try compiling Open MPI master HEAD with an external installation of 1.5.1, I get the following compile error:

make[1]: Entering directory `/home/jsquyres/git/ompi/opal/mca/hwloc'
  CC       base/hwloc_base_util.lo
base/hwloc_base_util.c: In function ‘opal_hwloc_base_find_coprocessors’:
base/hwloc_base_util.c:1393:12: warning: unused variable ‘cps’ [-Wunused-variable]
     char **cps = NULL;
            ^~~
base/hwloc_base_util.c:1392:14: warning: unused variable ‘i’ [-Wunused-variable]
     unsigned i;
              ^
base/hwloc_base_util.c:1391:17: warning: unused variable ‘osdev’ [-Wunused-variable]
     hwloc_obj_t osdev;
                 ^~~~~
base/hwloc_base_util.c: In function ‘opal_hwloc_base_get_locality_string’:
base/hwloc_base_util.c:2207:30: error: ‘obj’ undeclared (first use in this function)
                     if (3 == obj->attr->cache.depth) {
                              ^~~
base/hwloc_base_util.c:2207:30: note: each undeclared identifier is reported only once for each function it appears in

It looks like ea80a20 removed the declaration and initialization of obj for HWLOC_API_VERSION < 0x20000.

What do we want to do here?

FYI @bgoglin @hppritcha

@jsquyres
Copy link
Member Author

jsquyres commented Feb 4, 2020

@gpaulsen @hppritcha There was discussion on #7201 as to whether it should be back-ported to v4.0.x. It doesn't look like it was, but just in case you guys have that one on deck for v4.0.x: be warned that breaking compatibility to hwloc <= v2.0 is a consequence.

@hppritcha
Copy link
Member

@bgoglin i will take a look at patching this up.

@rhc54
Copy link
Contributor

rhc54 commented Feb 4, 2020

Wow - are you seriously holding us to work with something as ancient as hwloc v1.5?? I thought we only went back to 1.11, or maybe 1.10 (doubtful)

@bgoglin
Copy link
Contributor

bgoglin commented Feb 4, 2020

I thought you would have a CI testing over hwloc < 2.0.
Go ahead and bring back the obj definition, and add some #if HWLOC_API_VERSION to shutup the warning. I can look at it later today.
@rhc54 Don't worry, hwloc 1.11 likely breaks the same :)

@jsquyres
Copy link
Member Author

jsquyres commented Feb 4, 2020

Wow - are you seriously holding us to work with something as ancient as hwloc v1.5?? I thought we only went back to 1.11, or maybe 1.10 (doubtful)

FWIW: this is what's currently in the code base:

AC_MSG_CHECKING([if external hwloc version is 1.5 or greater])
AC_COMPILE_IFELSE(
[AC_LANG_PROGRAM([[#include <hwloc.h>]],
[[
#if HWLOC_API_VERSION < 0x00010500
#error "hwloc API version is less than 0x00010500"
#endif
]])],
[AC_MSG_RESULT([yes])],
[AC_MSG_RESULT([no])
opal_hwloc_external_support=no])])

So I have an MTT testing with 1.5.1.

I know that this is a side issue, but: I don't know if I have an opinion on what the oldest version is that we need to support. What do popular distros ship with? If we're considering bumping up the minimum hwloc supported version, we should probably look at what RHEL 7/CentOS 7 ships...?

@jsquyres
Copy link
Member Author

jsquyres commented Feb 4, 2020

I thought you would have a CI testing over hwloc < 2.0.

Only in the nightly MTT builds -- not in CI, sorry. ☹️

Go ahead and bring back the obj definition, and add some #if HWLOC_API_VERSION to shutup the warning. I can look at it later today.

I probably will not have the cycles to do this any time soon. ☹️ Beers for @bgoglin in Portland!

@rhc54
Copy link
Contributor

rhc54 commented Feb 4, 2020

Centos 7 is at hwloc v1.11 - don't know about other distros

@jsquyres
Copy link
Member Author

jsquyres commented Feb 4, 2020

Centos 7 is at hwloc v1.11 - don't know about other distros

Good data point. I'll add this to the f2f agenda (since Brice will be there, too). That's orthogonal to this issue (that the code breaks for any version <2.0.x), but it's good to have that conversation.

hppritcha added a commit to hppritcha/ompi that referenced this issue Feb 4, 2020
PR open-mpi#7201 broke use of hwloc 1.x series.  this patches gets hwloc 1.x working again with OMPI

fixes open-mpi#7362

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
bgoglin added a commit to bgoglin/ompi that referenced this issue Feb 4, 2020
Build was broken by mistake in commit d40662edc41a5a4d09ae690b640cfdeeb24e15a1

Fixes open-mpi#7362

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
bgoglin added a commit to bgoglin/ompi that referenced this issue Feb 4, 2020
Refs open-mpi#7362

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
gpaulsen pushed a commit to hppritcha/ompi that referenced this issue Feb 10, 2020
Refs open-mpi#7362

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit 329d445)
gpaulsen pushed a commit to hppritcha/ompi that referenced this issue Feb 12, 2020
Build was broken by mistake in commit d40662edc41a5a4d09ae690b640cfdeeb24e15a1

Fixes open-mpi#7362

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit 907ad85)
cniethammer pushed a commit to cniethammer/ompi that referenced this issue May 10, 2020
Refs open-mpi#7362

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit 329d445)
cniethammer pushed a commit to cniethammer/ompi that referenced this issue May 10, 2020
Build was broken by mistake in commit d40662edc41a5a4d09ae690b640cfdeeb24e15a1

Fixes open-mpi#7362

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit 907ad85)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants