netcdf-C 4.6.1 build error on macOS 10.13: ocprint link fails #1095

Closed
3 of 12 tasks
mathomp4 opened this issue Jul 31, 2018 · 30 comments

Comments

@mathomp4

Environment Information

  • What platform are you using? (please provide specific distribution/version in summary)
    • Linux
    • Windows
    • OSX
    • Other
    • NA
  • 32 and/or 64 bit?
    • 32-bit
    • 64-bit
  • What build system are you using?
    • autotools (configure)
    • cmake
  • Can you provide a sample netCDF file or C code to recreate the issue?
    • Yes (please attach to this issue, thank you!)
    • No
    • Not at this time

Summary of Issue

My build of netcdf-C 4.6.1 on macOS 10.13.5 isn't quite working. I'm building with GCC 8.1.0 (built from source) and Open MPI 3.1.0. I built HDF5 with --enable-parallel so mpicc is my C compiler in this stage.

From my scan of the build log, libnetcdf.a is made, but when ocprint tries to link I get:

libtool: link: mpicc -o ocprint ocprint.o  ../liblib/.libs/libnetcdf.a -L/Users/mathomp4/Baselibs/ESMA-Baselibs-5.1.3-netCDF461/x86_64-apple-darwin17.7.0/gfortran/Darwin/lib /Users/mathomp4/Baselibs/ESMA-Baselibs-5.1.3-netCDF461/x86_64-apple-darwin17.7.0/gfortran/Darwin/lib/libhdf5_hl.a /Users/mathomp4/Baselibs/ESMA-Baselibs-5.1.3-netCDF461/x86_64-apple-darwin17.7.0/gfortran/Darwin/lib/libhdf5.a /Users/mathomp4/Baselibs/ESMA-Baselibs-5.1.3-netCDF461/x86_64-apple-darwin17.7.0/gfortran/Darwin/lib/libmfhdf.a /Users/mathomp4/Baselibs/ESMA-Baselibs-5.1.3-netCDF461/x86_64-apple-darwin17.7.0/gfortran/Darwin/lib/libdf.a /Users/mathomp4/Baselibs/ESMA-Baselibs-5.1.3-netCDF461/x86_64-apple-darwin17.7.0/gfortran/Darwin/lib/libsz.a /Users/mathomp4/Baselibs/ESMA-Baselibs-5.1.3-netCDF461/x86_64-apple-darwin17.7.0/gfortran/Darwin/lib/libjpeg.a /Users/mathomp4/Baselibs/ESMA-Baselibs-5.1.3-netCDF461/x86_64-apple-darwin17.7.0/gfortran/Darwin/lib/libcurl.a -lz -ldl -lm
Undefined symbols for architecture x86_64:
  "_ncrc_globalstate", referenced from:
      _createtempfile in libnetcdf.a(liboc_la-ocinternal.o)
      _ocset_curlproperties in libnetcdf.a(liboc_la-ocinternal.o)
      _NC_rcload in libnetcdf.a(libdispatch_la-drc.o)
      _NC_set_rcfile in libnetcdf.a(libdispatch_la-drc.o)
      _rccompile in libnetcdf.a(libdispatch_la-drc.o)
      _rclocate in libnetcdf.a(libdispatch_la-drc.o)
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status
make[4]: *** [Makefile:1046: ocprint] Error 1
make[4]: *** Waiting for unfinished jobs....

As a test, I took netcdf-C 4.5.0 (which had worked before on this machine) and tried building it instead (feeding in the exact same libraries that went into the build above) and ocprint, etc all built just fine.

As a second test, I built with --disable-dap instead (see original configure line below) and it builds because ocprint is skipped. But this netCDF build is for a model, and one of the requirements for the build is that it have OpenDAP capability, so I can't disable DAP for production.

Any ideas on how to fix this with autotools? I thought I'd troll around the CMakeLists.txt files to see if ocprint was being built/linked with extra libraries perhaps with CMake. When I did so I see:

# Apparently fails under cmake
#set(ocprint_FILES ocprint.c )
#ADD_EXECUTABLE(ocprint ${ocprint_FILES})
#TARGET_LINK_LIBRARIES(ocprint oc2 ${ALL_TLL_LIBS})

(which is in the oc2/CMakeLists.txt, not the ncdump/CMakeLists.txt).

Huh. Indeed, the autotools build seems to build things that CMake doesn't (e.g., nc4print which doesn't seem to appear in any CMakeLists.txt file in the tarball...and isn't installed, just built?).

Is there a way to get the autotools build to skip ocprint? I don't think I need it (I've never used it and I know of no one else who has), but skipping it would at least let me get a (partially) DAP-enabled netCDF built. (NOTE: I had thought about using a CMake build, but I'd probably need more help with that. I don't really know how to translate all of that configure line to CMake, especially the --includedir line. I didn't see anything in INSTALL.md on how to translate that flag.)

Steps to reproduce the behavior

My configure line is:

  $ ./configure --prefix=/Users/mathomp4/Baselibs/ESMA-Baselibs-5.1.3-netCDF461/x86_64-apple-darwin17.7.0/gfortran/Darwin --includedir=/Users/mathomp4/Baselibs/ESMA-Baselibs-5.1.3-netCDF461/x86_64-apple-darwin17.7.0/gfortran/Darwin/include/netcdf --enable-hdf4 --enable-dap --enable-parallel-tests --disable-shared --disable-examples --enable-netcdf-4 CC=mpicc FC=mpifort CXX=mpic++ F77=mpifort

And, as I said before, I can build 4.5.0 just fine using all the same infrastructure, but 4.6.1 won't.

@edhartnett
Contributor

Do you have the capability to build from the git repo?

That is, can you clone from git, and then run autoreconf -i to build the configure script and the Makefile.in files?

If so, you can comment out the following lines in ncdump/Makefile.am:

# Conditionally build the ocprint program, but do not install
if ENABLE_DAP
bin_PROGRAMS += ocprint
ocprint_SOURCES = ocprint.c
endif

If you don't know how to do this, let me know and I will do a special dist for you without these lines.

Seems like if a program is not going to be installed, there is no reason for it to be in the build. @DennisHeimbigner can we move ocprint to one of the test directories? Or build it only with a special configure flag?

@mathomp4
Author

@edhartnett I learned some autoreconf -i myself earlier. The HPC netCDF doesn't supply a configure in its release tarball, so I had to build it with autoreconf. :)

Before I run configure, I can do:
sed -i -e '/ocprint program/,+4 s/^/#/' netcdf/ncdump/Makefile.am
or the like in my main make script.

And, technically, ocprint is indeed installed (when built) because I see it in the bin directory when I build my Baselibs. There is just some bug somewhere that I'm triggering when it's built.

The program not installed (in autotools land) is nc4print, at least. But ocprint is definitely not built in CMake as far as I can see from the CMakeLists.txt.

@WardF
Member

WardF commented Jul 31, 2018

Good catch, I'm wiring ocprint into cmake right now. It will be merged in with the latest PR group.

@mathomp4
Author

Okay. These two lines before I run configure:

sed -i -e '/ocprint program/,+4 s/^/###/' ncdump/Makefile.am
sed -i -e '/ENABLE_DAP_TRUE/ s/^/###/' ncdump/Makefile.in

stop ocprint from building. Makefile.am alone didn't seem to do it. (And, of course, sed must be GNU sed, so I need to harden my make script to make sure sed == gsed on a Mac.)

Once I do that, netCDF builds to completion.
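
The two sed calls above lean on GNU behaviour (the addr,+N range and -i with no suffix argument). A minimal sketch of the same edit done with awk, which sidesteps the GNU/BSD sed difference. The function name comment_block and its argument order are made up for illustration; they are not part of netCDF or of the make script described here.

```shell
#!/bin/sh
# comment_block FILE PATTERN EXTRA_LINES PREFIX
# Prefix the first line matching PATTERN, plus the EXTRA_LINES lines after it,
# with PREFIX. A later match outside the block starts a new block, which
# mirrors how sed treats '/PATTERN/,+N s/^/PREFIX/'.
comment_block() {
    file=$1 pattern=$2 extra=$3 prefix=$4
    tmp=$(mktemp) || return 1
    awk -v pat="$pattern" -v extra="$extra" -v pre="$prefix" '
        remaining == 0 && $0 ~ pat { remaining = extra + 1 }  # open a block
        remaining > 0 { print pre $0; remaining--; next }     # inside a block
        { print }                                             # everything else
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back through the original path so its permissions are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Under those assumptions, comment_block ncdump/Makefile.am 'ocprint program' 4 '###' followed by comment_block ncdump/Makefile.in 'ENABLE_DAP_TRUE' 0 '###' should have the same effect as the two sed lines, with no dependency on which sed is first in PATH.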

@DennisHeimbigner
Collaborator

There are a couple of files and programs that are in the github repo
purely for purposes of helping me debug stuff. Both ocprint and nc4print
are examples (as well as cf, cf.cmake, and Make0). As a rule
those executables should be built by automake, but not by cmake
because I tend to do my debugging with automake builds.
If ocprint is being installed, then it probably should be marked as noinst_...

@WardF
Member

WardF commented Jul 31, 2018

Oh, good to know. I will mark it as such in the next group of PRs.

@edhartnett
Contributor

ocprint is having a build problem, not an install problem.

I suggest moving whatever is not part of the library to a test directory. Then the user may see a test failure, but they will still be able to run make all and get the library built.

Just marking this as noinst_ will not solve the problem, I believe.

@DennisHeimbigner
Collaborator

Can someone try the build on a mac using normal gcc instead of mpicc?

@WardF
Member

WardF commented Jul 31, 2018

Testing now.

@WardF
Member

WardF commented Jul 31, 2018

I'm not having any problems (using the current master) when building it on my end with cmake and gcc/clang/mpich, on OSX.

@DennisHeimbigner
Collaborator

So is this a false alarm?

@WardF
Member

WardF commented Jul 31, 2018

I’ve fixed the issue I was seeing. Let me test against 4.6.1.

@DennisHeimbigner
Collaborator

I am losing track. I thought you said it worked fine. What issue were you
seeing?

@WardF
Member

WardF commented Jul 31, 2018

The change I added so that it builds with cmake works fine. That change is in master. I see the original question was about 4.6.1, so I want to see if I can duplicate the issue in that version.

@mathomp4
Author

mathomp4 commented Jul 31, 2018

@DennisHeimbigner I can try tomorrow on my system with gcc and not mpicc. I was just building as I usually do (with Parallel HDF5). Note that it will be with GCC 8.2 and not Clang. I have never gotten Clang and ESMF to work right, so I make it a mission to make sure clang never invades my builds.

But, as I noted, netCDF-C 4.5.0 builds just fine with the same configure line...

ETA: Also note: I've never built netcdf with Clang, but since 4.6.1 doesn't seem to build here anyway, it's sort of a moot point.

@mathomp4
Author

mathomp4 commented Aug 1, 2018

@DennisHeimbigner I get the same failure with gcc using autotools:

libtool: link: gcc -o ocprint ocprint.o  ../liblib/.libs/libnetcdf.a -L/Users/mathomp4/Baselibs/ESMA-Baselibs-5.1.3-netCDF461/x86_64-apple-darwin17.7.0/gfortran/Darwin/lib /Users/mathomp4/Baselibs/ESMA-Baselibs-5.1.3-netCDF461/x86_64-apple-darwin17.7.0/gfortran/Darwin/lib/libhdf5_hl.a /Users/mathomp4/Baselibs/ESMA-Baselibs-5.1.3-netCDF461/x86_64-apple-darwin17.7.0/gfortran/Darwin/lib/libhdf5.a /Users/mathomp4/Baselibs/ESMA-Baselibs-5.1.3-netCDF461/x86_64-apple-darwin17.7.0/gfortran/Darwin/lib/libmfhdf.a /Users/mathomp4/Baselibs/ESMA-Baselibs-5.1.3-netCDF461/x86_64-apple-darwin17.7.0/gfortran/Darwin/lib/libdf.a /Users/mathomp4/Baselibs/ESMA-Baselibs-5.1.3-netCDF461/x86_64-apple-darwin17.7.0/gfortran/Darwin/lib/libsz.a /Users/mathomp4/Baselibs/ESMA-Baselibs-5.1.3-netCDF461/x86_64-apple-darwin17.7.0/gfortran/Darwin/lib/libjpeg.a /Users/mathomp4/Baselibs/ESMA-Baselibs-5.1.3-netCDF461/x86_64-apple-darwin17.7.0/gfortran/Darwin/lib/libcurl.a -lz -ldl -lm
Undefined symbols for architecture x86_64:
  "_ncrc_globalstate", referenced from:
      _createtempfile in libnetcdf.a(liboc_la-ocinternal.o)
      _ocset_curlproperties in libnetcdf.a(liboc_la-ocinternal.o)
      _NC_rcload in libnetcdf.a(libdispatch_la-drc.o)
      _NC_set_rcfile in libnetcdf.a(libdispatch_la-drc.o)
      _rccompile in libnetcdf.a(libdispatch_la-drc.o)
      _rclocate in libnetcdf.a(libdispatch_la-drc.o)
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status
make[4]: *** [Makefile:1046: ocprint] Error 1

@DennisHeimbigner
Collaborator

ok, two possibilities come to mind.

  1. look at the build output. My bet is that something before building ocprint failed,
    probably in libdispatch.

  2. The global variable ncrc_globalstate is a struct, not a pointer to a struct.
    it is barely possible that the mac loader cannot handle linking to a global
    struct. If so, then I will need to modify ncrc_globalstate to be a pointer
    to the struct. Not a big change.
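
For the first possibility, a rough sketch of how one might scan a saved build log for the earliest line that looks like a failure, rather than reading the whole log by eye. The function name first_build_error and the list of patterns are assumptions chosen for illustration; real logs can contain failures that none of these patterns catch.

```shell
#!/bin/sh
# first_build_error LOGFILE
# Print the first line of a make log that looks like a failure, prefixed with
# its line number ("N:text"). Prints nothing when no pattern matches.
first_build_error() {
    # Patterns, in ERE syntax:
    #   "*** [target] Error N"  -> make reporting a failed recipe
    #   "error:"                -> compiler or collect2 diagnostics
    #   "Undefined symbols"     -> the macOS linker's unresolved-symbol report
    grep -n -E '(\*\*\* \[.*\] Error [0-9]+|(^|[ :])error:|Undefined symbols)' "$1" |
        head -n 1
}
```

Run against a log like the one quoted in this issue, the expectation is that the first hit is the "Undefined symbols" line for ocprint; an earlier hit would point at a failure in libdispatch or elsewhere.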

@mathomp4
Author

mathomp4 commented Aug 1, 2018

@DennisHeimbigner

  1. See the attached log file. I don't see any issues other than ocprint. Well, there are the usual 'ranlib' warnings that macOS seems to just love spitting out, but nothing like an Error 2 or ***.
     Attachment: makeinstall.justnetcdf.netcdf461.MPIUNI.log

  2. I guess one question is: did ncrc_globalstate change between 4.5.0 and 4.6.1? Because on the same machine with the same OS and Xcode and same compiler, 4.5.0 builds just fine.

@DennisHeimbigner
Collaborator

I think that the thing that changed was that all of the .rc processing code was
unified and moved to libdispatch. If Mac has the equivalent of the linux nm
program, you might do
nm ./liblib/.libs/libnetcdf.a | fgrep ncrc_globalstate
[the path to libnetcdf.a will vary]
and see if ncrc_globalstate is defined (probably with a T flag).

@DennisHeimbigner
Collaborator

Check that. It should appear with the C flag, not the T flag
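
A small sketch of how the nm check could be scripted so the type letters are easy to read off. It only parses nm's text output from stdin; the helper name symbol_types is hypothetical, and it assumes the usual "address type name" layout with an optional leading underscore on the symbol name (as on macOS).

```shell
#!/bin/sh
# symbol_types SYMBOL
# Read `nm` output on stdin and print, with no separator, the distinct type
# letters reported for SYMBOL across all archive members (for example "CU").
symbol_types() {
    awk -v sym="$1" '
        { name = $NF; sub(/^_/, "", name) }          # drop the Mach-O "_" prefix
        NF >= 2 && name == sym { print $(NF - 1) }   # type letter precedes the name
    ' | sort -u | tr -d '\n'
}
```

Usage would be along the lines of nm ./liblib/.libs/libnetcdf.a | symbol_types ncrc_globalstate. A result containing only U means no member defines the symbol; C means it is present as a common (uninitialized) symbol. If only C and U show up, one thing that may be worth checking (an assumption, not something established in this thread) is whether the archive's table of contents includes common symbols, since Apple's ranlib documents a -c option for exactly that.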

@DennisHeimbigner
Collaborator

I just noticed that you appear to be going directly to
make install
instead of starting with
make all install
You might try doing
make all
instead of
make install
We have had problems before with this kind of build error.
I thought we had found them all, but perhaps not.

@WardF
Member

WardF commented Aug 1, 2018

I will take a closer look at this. I suspect the error, whatever it is, is going to be more subtle than an outright build system configuration error, given that we build on OSX regularly without issue.

@WardF
Member

WardF commented Aug 1, 2018

@mathomp4 do you have the OSX developer tools installed? I presume you installed gcc, gfortran and such through a third-party package manager, or perhaps by hand.

I just read back and see that it was built by hand. So, I wonder, do you have the OSX dev tools installed? I’m just looking for differences that would explain the issue you’re seeing vs. our inability to recreate it.

@mathomp4
Author

mathomp4 commented Aug 2, 2018

@WardF I'm not sure, though it is possible. I had to run xcode-select, which I believe installs the Xcode command-line tools (clang and all?).

But, yes, while I use things like brew for many programs (for example, a sed that isn't brain-dead), I don't rely on it for things like compilers. I want to build and manage my own for now.

@mathomp4
Author

mathomp4 commented Aug 2, 2018

Also, for @DennisHeimbigner, I tried a make all followed by make install but it didn't make a difference.
makeallinstall.justnetcdf.netcdf461.MPIUNI.log

@DennisHeimbigner
Collaborator

I guess I am stumped since we cannot seem to reproduce the error.

@WardF
Member

WardF commented Nov 28, 2018

@mathomp4 are you still seeing this issue?

@mathomp4
Author

I will let you know soon. I've downloaded netCDF 4.6.2 and am building.

@mathomp4
Author

@WardF It looks good! I have ocprint again...not that I know what it does. :)

4.6.2's libdispatch does seem to have triggered a build issue in NCO amazingly similar to one I found a few years ago. I've pinged Charlie Zender on his list about that one.

I'll try and test out the optimizations to the opens and wait for the new netcdf-fortran so I can start bugging all of you with "How can I get zfp working with my code?"

@WardF
Member

WardF commented Dec 5, 2018

Sounds good, thanks :)
