rpath errors #17602

Closed
vtjnash opened this issue Jul 25, 2016 · 44 comments · Fixed by #17634
Labels
domain:building Build system, or building Julia or its dependencies kind:regression Regression in behavior compared to a previous version status:help wanted Indicates that a maintainer wants help on an issue or pull request status:priority This should be addressed urgently

Comments

@vtjnash
Sponsor Member

vtjnash commented Jul 25, 2016

It seems like the recent rpath change may need to be reverted, as it is causing frequent Travis failures (usually with openspecfun, but also fftw):
https://travis-ci.org/JuliaLang/julia/jobs/146885109
https://travis-ci.org/JuliaLang/julia/jobs/146808597
https://travis-ci.org/JuliaLang/julia/jobs/146806235
https://travis-ci.org/JuliaLang/julia/jobs/146773370
https://travis-ci.org/JuliaLang/julia/jobs/146412731
https://travis-ci.org/JuliaLang/julia/jobs/145955397

@vtjnash vtjnash added this to the 0.5.x milestone Jul 25, 2016
@vtjnash vtjnash added the status:priority This should be addressed urgently label Jul 25, 2016
@Keno
Member

Keno commented Jul 25, 2016

Sounds like those libraries need to have their rpaths fixed.

@yuyichao
Contributor

Might be related: running julia under optirun also causes libraries in /usr/lib/julia to fail to load.

@ViralBShah
Member

ViralBShah commented Jul 25, 2016

Do we need these rpath changes for 0.5? I think I read that they are necessary but can't recall. If not, perhaps it's best to revert for now and fix after branching.

@Keno
Member

Keno commented Jul 25, 2016

We can add back the rpath to executable, but we shouldn't revert the entirety of the rpath changes.

@StefanKarpinski
Sponsor Member

Libraries with failures above: libopenspecfun, libfftw3f, libopenlibm. So we should be in a position to fix these libraries' rpaths without having to argue too much with upstream (since we are upstream).

@ViralBShah
Member

These rpath fixes have to be in the julia build - not in the upstream (even though it is us).

@ViralBShah
Member

ViralBShah commented Jul 25, 2016

If someone can tell me how to reproduce the issues, I might be able to help fix.

@Keno
Member

Keno commented Jul 25, 2016

Ok, fftw3_threads seems to be accidentally setting RUNPATH instead of RPATH, which doesn't make much sense AFAIK, since it's not an executable. May need a patch to FFTW.

openspecfun needs an appropriate addition to its link line, which is the following patch plus the referenced PR to openspecfun.

diff --git a/Make.inc b/Make.inc
index 2264fea..91a6e31 100644
--- a/Make.inc
+++ b/Make.inc
@@ -845,10 +845,12 @@ ifeq ($(OS), WINNT)
 else ifeq ($(OS), Darwin)
   RPATH := -Wl,-rpath,'@executable_path/$(build_libdir_rel)'
   RPATH_ORIGIN := -Wl,-rpath,'@loader_path/'
+  RPATH_ESCAPED_ORIGIN := $(RPATH_ORIGIN)
   RPATH_LIB := -Wl,-rpath,'@loader_path/' -Wl,-rpath,'@loader_path/julia/'
 else
   RPATH := -Wl,-rpath,'$$ORIGIN/$(build_libdir_rel)' -Wl,-rpath-link,$(build_shlibdir) -Wl,-z,origin
   RPATH_ORIGIN := -Wl,-rpath,'$$ORIGIN' -Wl,-z,origin
+  RPATH_ESCAPED_ORIGIN := -Wl,-rpath,'\$$\$$ORIGIN' -Wl,-z,origin
   RPATH_LIB := -Wl,-rpath,'$$ORIGIN' -Wl,-rpath,'$$ORIGIN/julia' -Wl,-z,origin
 endif

diff --git a/deps/openspecfun.mk b/deps/openspecfun.mk
index 0df44fd..781e47e 100644
--- a/deps/openspecfun.mk
+++ b/deps/openspecfun.mk
@@ -11,7 +11,7 @@ endif

 OPENSPECFUN_OBJ_TARGET := $(build_shlibdir)/libopenspecfun.$(SHLIB_EXT)
 OPENSPECFUN_OBJ_SOURCE := $(BUILDDIR)/$(OPENSPECFUN_SRC_DIR)/libopenspecfun.$(SHLIB_EXT)
-OPENSPECFUN_FLAGS := ARCH="$(ARCH)" CC="$(CC)" FC="$(FC)" AR="$(AR)" OS="$(OS)" USECLANG=$(USECLANG) USEGCC=$(USEGCC) FFLAGS="$(JFFLAGS)" CFLAGS="$(CFLAGS) $(OPENSPECFUN_CFLAGS)"
+OPENSPECFUN_FLAGS := ARCH="$(ARCH)" CC="$(CC)" FC="$(FC)" AR="$(AR)" OS="$(OS)" USECLANG=$(USECLANG) USEGCC=$(USEGCC) FFLAGS="$(JFFLAGS)" CFLAGS="$(CFLAGS) $(OPENSPECFUN_CFLAGS)" LDFLAGS="$(RPATH_ESCAPE

 ifeq ($(USE_SYSTEM_LIBM),0)
        OPENSPECFUN_FLAGS += USE_OPENLIBM=1
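Why the patch needs a second, escaped spelling: assuming OPENSPECFUN_FLAGS ends up on the command line of a recursive make call, RPATH_ESCAPED_ORIGIN goes through the outer make, a double-quoted shell word (LDFLAGS="..."), and openspecfun's own make before the linker sees it. The Python sketch below is a deliberately simplified model of those layers (only `$$`, `$(NAME)`/`$X` references, and `\$` inside double quotes are handled); all function names are hypothetical and nothing here is part of Julia's build.

```python
import re

# Simplified model of how $ORIGIN survives the layers between Make.inc and
# the linker. Only the rules needed for this example are modelled.


def make_expand(text, variables=None):
    """One make expansion pass: '$$' -> '$', '$(NAME)' and '$X' -> value (default empty)."""
    variables = variables or {}
    out, i = [], 0
    while i < len(text):
        if text[i] != "$" or i + 1 == len(text):
            out.append(text[i])
            i += 1
        elif text[i + 1] == "$":
            out.append("$")
            i += 2
        elif text[i + 1] in "({":
            close = ")" if text[i + 1] == "(" else "}"
            end = text.index(close, i + 2)
            out.append(variables.get(text[i + 2:end], ""))
            i = end + 1
        else:  # single-character reference such as $O
            out.append(variables.get(text[i + 1], ""))
            i += 2
    return "".join(out)


_SHELL_VAR = re.compile(r"(?<!\\)\$([A-Za-z_][A-Za-z0-9_]*)")


def shell_double_quoted(text, env=None):
    """Inside "...": expand $NAME (single quotes do not protect it), then '\\$' -> '$'."""
    env = env or {}
    expanded = _SHELL_VAR.sub(lambda m: env.get(m.group(1), ""), text)
    return expanded.replace("\\$", "$")


def shell_single_quoted(text):
    """Quote removal for '...': the contents are passed through literally."""
    return text.replace("'", "")


def direct_link(value):
    """A Make.inc value used directly in a recipe (like RPATH_ORIGIN)."""
    return shell_single_quoted(make_expand(value))


def through_submake(value):
    """A Make.inc value passed as LDFLAGS="..." to a recursive make."""
    outer = make_expand(value)         # ':=' definition in the outer make
    arg = shell_double_quoted(outer)   # shell parses LDFLAGS="..." for the sub-make
    inner = make_expand(arg)           # sub-make expands $(LDFLAGS) in its link rule
    return shell_single_quoted(inner)  # sub-make's shell hands the flag to the compiler


if __name__ == "__main__":
    print(direct_link("-Wl,-rpath,'$$ORIGIN'"))           # RPATH_ORIGIN
    print(through_submake(r"-Wl,-rpath,'\$$\$$ORIGIN'"))  # RPATH_ESCAPED_ORIGIN
    print(through_submake("-Wl,-rpath,'$$ORIGIN'"))       # unescaped value
```

Under this model the first two lines both come out as `-Wl,-rpath,$ORIGIN`, while reusing the unescaped value in the sub-make yields `-Wl,-rpath,` because the shell expands `$ORIGIN` (an unset variable) inside the double quotes. Escaping only once is not enough either: the inner make would read `$ORIGIN` as the variable `$O` followed by `RIGIN`.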

@Keno
Member

Keno commented Jul 25, 2016

I'll put up a quick fix to add the rpath to both libraries after the fact.

@Keno
Member

Keno commented Jul 25, 2016

Hmm, after doing some more research, the RUNPATH setting in fftw should actually be fine (contrary to what the dlopen man page says). I was also unable to reproduce the error seen on CI wrt fftw.

@tkelman
Contributor

tkelman commented Jul 26, 2016

Do you have a system copy of fftw installed? I think it's likely that the Travis VMs do, as a dependency of something else they have installed by default.

@ViralBShah
Member

We can always uninstall the system one on travis before doing julia stuff.

@tkelman
Contributor

tkelman commented Jul 26, 2016

I'm actually not sure how to get Travis to uninstall things in the docker container, since we can't do sudo and all apt-get operations have to go through the apt: packages whitelist thing. They may allow caching now in the sudo-true workers so we could look into switching - but not right now.

@tkelman tkelman assigned tkelman and unassigned Keno Jul 26, 2016
@tkelman
Contributor

tkelman commented Jul 26, 2016

PR incoming (if just openspecfun isn't enough to get this working better, we can reopen)

tkelman pushed a commit that referenced this issue Jul 26, 2016
from #17602 (comment) but with RPATH_ESCAPED_ORIGIN not cut off
@Keno
Member

Keno commented Jul 26, 2016

@tkelman Yes, you're correct. I had a global fftw installed. Still, I can't reproduce the fftw failure. I guess we'll see if it happens again on CI, but it definitely doesn't seem as straightforward as "missing rpath" since it does have RUNPATH set.

@tkelman
Contributor

tkelman commented Jul 26, 2016

Maybe RUNPATH handling is buggy with the Ubuntu 12.04 toolchains? We're using the gcc-5 package from the ubuntu-toolchain-r-test PPA, but I don't know whether that also rebuilds binutils or other core tools.

@vtjnash
Sponsor Member Author

vtjnash commented Jul 26, 2016

It looks like fftw is missing the "I have the string $ORIGIN in my -rpath" flag (the -Wl,-z,origin option to ld)?

~/julia/usr/lib$ readelf -d libfftw3_threads.so  | grep ORIGIN
 0x0000001d (RUNPATH)                    Library runpath: [$ORIGIN]
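Checking this for many libraries is easier with a parser over `readelf -d` text than by eye. A minimal Python sketch, assuming GNU readelf's layout (`0x... (TAG)   value`); the helper names are hypothetical. The ORIGIN check only reports whether the `ORIGIN` flag that `-Wl,-z,origin` records in `FLAGS`/`FLAGS_1` is present; it does not settle whether the loader actually requires it.

```python
import re

# One `readelf -d` line, for example:
#  0x0000001d (RUNPATH)                    Library runpath: [$ORIGIN]
_ENTRY = re.compile(r"^\s*0x[0-9a-fA-F]+\s+\((\w+)\)\s+(.*)$")


def dynamic_entries(readelf_output):
    """(tag, value) pairs from `readelf -d` text, in order."""
    entries = []
    for line in readelf_output.splitlines():
        m = _ENTRY.match(line)
        if m:
            entries.append((m.group(1), m.group(2).strip()))
    return entries


def search_dirs(entries, tag):
    """Directories listed inside the [...] of an RPATH or RUNPATH entry."""
    for t, value in entries:
        if t == tag and "[" in value and "]" in value:
            inside = value[value.index("[") + 1:value.rindex("]")]
            return inside.split(":")
    return []


def has_origin_flag(entries):
    """True if FLAGS or FLAGS_1 carry ORIGIN, which -Wl,-z,origin sets."""
    return any(t in ("FLAGS", "FLAGS_1") and "ORIGIN" in v for t, v in entries)


if __name__ == "__main__":
    import subprocess
    import sys

    out = subprocess.run(["readelf", "-d", sys.argv[1]],
                         capture_output=True, text=True).stdout
    entries = dynamic_entries(out)
    print("RPATH:      ", search_dirs(entries, "RPATH"))
    print("RUNPATH:    ", search_dirs(entries, "RUNPATH"))
    print("ORIGIN flag:", has_origin_flag(entries))
```

Fed the single line above, this reports `RUNPATH: ['$ORIGIN']`, an empty RPATH, and no ORIGIN flag.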

@tkelman tkelman removed their assignment Jul 26, 2016
@vtjnash
Sponsor Member Author

vtjnash commented Jul 26, 2016

Also, I'm not sure libmpfr is built correctly, since it's missing an RPATH to ensure we load the right libgmp:

jameson@jamesonnash:~/julia/usr/lib$ readelf -d libmpfr.so

Dynamic section at offset 0x5ae60 contains 27 entries:
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [libgmp.so.10]
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
 0x00000001 (NEEDED)                     Shared library: [ld-linux.so.2]
 0x0000000e (SONAME)                     Library soname: [libmpfr.so.4]

@tkelman tkelman added the domain:building Build system, or building Julia or its dependencies label Jul 26, 2016
@vtjnash
Sponsor Member Author

vtjnash commented Jul 26, 2016

Apparently the documentation for RPATH and RUNPATH is wrong or incomplete? http://blog.qt.io/blog/2011/10/28/rpath-and-runpath/
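One way to read the fftw and mpfr observations together is through the order in which glibc's ld.so searches for a DT_NEEDED entry. The sketch below is a simplified model based on the ld.so(8) man page; `ElfObject` and `search_order` are hypothetical names, and the executable's RPATH value is made up for the example. It shows two behaviours that differ between the tags: an object's own DT_RUNPATH disables all DT_RPATH lookups for its dependencies (the executable's included), and DT_RUNPATH is searched after LD_LIBRARY_PATH, whereas DT_RPATH is searched before it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ElfObject:
    name: str
    rpath: List[str] = field(default_factory=list)    # DT_RPATH entries
    runpath: List[str] = field(default_factory=list)  # DT_RUNPATH entries


def search_order(requester, executable, ld_library_path, cache_dirs):
    """Directories searched for a DT_NEEDED entry of `requester` (simplified).

    Not modelled: $ORIGIN expansion, secure-execution mode, hwcaps
    subdirectories, and RPATHs of intermediate loaders between `requester`
    and the executable.
    """
    dirs = []
    # 1. DT_RPATH, only if the requester has no DT_RUNPATH. The executable's
    #    DT_RPATH is consulted too (ignored if the executable has DT_RUNPATH).
    if not requester.runpath:
        dirs += requester.rpath
        if executable is not requester and not executable.runpath:
            dirs += executable.rpath
    # 2. LD_LIBRARY_PATH
    dirs += ld_library_path
    # 3. DT_RUNPATH of the requester only; libraries it pulls in do not
    #    inherit it.
    dirs += requester.runpath
    # 4. /etc/ld.so.cache, modelled as a plain list of directories
    dirs += cache_dirs
    # 5. Default directories (/lib64, /usr/lib64 on some 64-bit systems)
    dirs += ["/lib", "/usr/lib"]
    return dirs


if __name__ == "__main__":
    exe = ElfObject("julia", rpath=["$ORIGIN/../lib"])  # hypothetical RPATH
    mpfr = ElfObject("libmpfr.so")
    threads = ElfObject("libfftw3_threads.so", runpath=["$ORIGIN"])
    print(search_order(mpfr, exe, [], []))
    print(search_order(threads, exe, [], []))
```

In this model a library with neither tag, like the libmpfr.so above, only finds a private libgmp through the executable's RPATH, which is consistent with the suggestion earlier in the thread to add the executable's RPATH back.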

mfasi pushed a commit to mfasi/julia that referenced this issue Sep 5, 2016
should hopefully fix JuliaLang#17602 rpath issues
mfasi pushed a commit to mfasi/julia that referenced this issue Sep 5, 2016
from JuliaLang#17602 (comment) but with RPATH_ESCAPED_ORIGIN not cut off
@vtjnash vtjnash modified the milestones: 0.5.0, 0.5.x Sep 15, 2016
@vtjnash
Sponsor Member Author

vtjnash commented Sep 15, 2016

Adding this to the build milestone, as it seems to mean our binaries are unreliable? Libraries that seem to be missing an RPATH:

  • libmpfr.so
  • libcrypto.so.1.0.0
  • libssl.so.1.0.0
  • libfftw3f.so / libfftw3.so
  • libdSFMT.so
  • libopenblas.so
  • libccalltest.so
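Checks like the ones behind this list can be automated. A rough Python sketch, assuming GNU readelf is on PATH; `lacks_search_path` and `audit` are hypothetical helpers. It is deliberately coarse: a library whose only NEEDED entries are system libraries such as libc.so.6 does not need an RPATH, but would still be flagged.

```python
import re
import subprocess
from pathlib import Path

# Matches the tag column of `readelf -d` lines such as
#  0x00000001 (NEEDED)                     Shared library: [libgmp.so.10]
_TAG = re.compile(r"^\s*0x[0-9a-fA-F]+\s+\((\w+)\)")


def dynamic_tags(readelf_output):
    """Set of dynamic-section tag names found in `readelf -d` output."""
    matches = (_TAG.match(line) for line in readelf_output.splitlines())
    return {m.group(1) for m in matches if m}


def lacks_search_path(readelf_output):
    """True if the object has NEEDED entries but neither RPATH nor RUNPATH."""
    tags = dynamic_tags(readelf_output)
    return "NEEDED" in tags and not (tags & {"RPATH", "RUNPATH"})


def audit(libdir):
    """Yield shared libraries directly under libdir that lack RPATH/RUNPATH."""
    for path in sorted(Path(libdir).glob("*.so*")):
        if path.is_symlink() or not path.is_file():
            continue
        result = subprocess.run(["readelf", "-d", str(path)],
                                capture_output=True, text=True)
        if lacks_search_path(result.stdout):
            yield path


if __name__ == "__main__":
    for lib in audit("usr/lib"):
        print("no RPATH/RUNPATH:", lib)
```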

@vtjnash
Sponsor Member Author

vtjnash commented Sep 15, 2016

x-ref #18106

@tkelman
Contributor

tkelman commented Sep 15, 2016

Propose a fix now, or it's not holding up 0.5.0.

@vtjnash
Sponsor Member Author

vtjnash commented Sep 15, 2016

IIUC from the Travis failures, this would manifest as opening the wrong ssl library (causing mysterious failures in Pkg) and the wrong gmp (from mpfr, for BigFloat support). The fix is to run patchelf on the libraries, or to pass -rpath as a flag as Keno did for some of the other dependencies.
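For the patchelf route, one detail matters here: recent patchelf versions write DT_RUNPATH by default, and `--force-rpath` is needed to get the DT_RPATH tag. A minimal Python sketch, assuming patchelf is installed; the wrapper functions are hypothetical and default to a dry run that only returns the command.

```python
import shutil
import subprocess


def patchelf_rpath_command(library, rpath="$ORIGIN"):
    """Argument list that writes DT_RPATH (not DT_RUNPATH) into `library`."""
    # --force-rpath: use the older DT_RPATH tag instead of patchelf's default DT_RUNPATH.
    return ["patchelf", "--force-rpath", "--set-rpath", rpath, str(library)]


def set_rpath(library, rpath="$ORIGIN", dry_run=True):
    """Run (or, by default, only return) the patchelf command for one library."""
    cmd = patchelf_rpath_command(library, rpath)
    # No shell is involved, so "$ORIGIN" reaches patchelf literally. From a
    # shell it would need single quotes, and inside a Makefile recipe also $$.
    if not dry_run:
        if shutil.which("patchelf") is None:
            raise FileNotFoundError("patchelf is not on PATH")
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(set_rpath("usr/lib/libmpfr.so"))
```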

@tkelman
Contributor

tkelman commented Sep 15, 2016

As a PR if you know what's wrong, unless the issue is reliably reproducible.

@ViralBShah ViralBShah modified the milestones: 0.5.x, 0.5.0 Sep 19, 2016
@StefanKarpinski StefanKarpinski added status:help wanted Indicates that a maintainer wants help on an issue or pull request and removed status:help wanted Indicates that a maintainer wants help on an issue or pull request labels Oct 27, 2016
@andreasnoack
Member

When building 0.5 on a cluster, I get the following error:

libgit2/libgit2.jl
LoadError("sysimg.jl",326,LoadError("libgit2/libgit2.jl",13,LoadError("libgit2/consts.jl",282,ErrorException("error compiling version: could not load library \"libgit2\"\nlibssl.so.1.0.0: cannot open shared object file: No such file or directory"))))
*** This error is usually fixed by running `make clean`. If the error persists, try `make cleanall`. ***

and

[noack@node008 julia]$ ldd /scratch/users/noack/julia/usr/lib/libgit2.so
    linux-vdso.so.1 =>  (0x00002aaaaaacb000)
    libcurl.so.4 => /scratch/users/noack/julia/usr/lib/libcurl.so.4 (0x00002aaaaafcb000)
    libz.so.1 => /lib64/libz.so.1 (0x00002aaaab244000)
    libssl.so.1.0.0 => not found
    libcrypto.so.1.0.0 => not found
    libssh2.so.1 => /scratch/users/noack/julia/usr/lib/libssh2.so.1 (0x00002aaaab45b000)
    librt.so.1 => /lib64/librt.so.1 (0x00002aaaab68d000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00002aaaab895000)
    libc.so.6 => /lib64/libc.so.6 (0x00002aaaabab3000)
    libmbedtls.so.10 => /scratch/users/noack/julia/usr/lib/libmbedtls.so.10 (0x00002aaaabe47000)
    libmbedx509.so.0 => /scratch/users/noack/julia/usr/lib/libmbedx509.so.0 (0x00002aaaac06d000)
    libmbedcrypto.so.0 => /scratch/users/noack/julia/usr/lib/libmbedcrypto.so.0 (0x00002aaaac281000)
    /lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)

Is that related to this issue?
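If this is the same class of problem, the symptom is the pair of "not found" lines: libgit2.so has NEEDED entries for libssl.so.1.0.0 and libcrypto.so.1.0.0 that nothing on its search path resolves. When checking many libraries, a small filter over ldd output helps; this is a hypothetical helper, not part of Julia's tooling.

```python
def missing_from_ldd(ldd_output):
    """Sonames that `ldd` reports as 'not found', in the order listed."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


if __name__ == "__main__":
    import subprocess
    import sys

    out = subprocess.run(["ldd", sys.argv[1]], capture_output=True, text=True).stdout
    print(missing_from_ldd(out))
```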

@JeffBezanson
Sponsor Member

It seems this has been worked around enough that it's no longer high priority?

@JeffBezanson JeffBezanson added status:triage This should be discussed on a triage call and removed status:priority This should be addressed urgently labels Sep 24, 2017
@ViralBShah ViralBShah removed this from the 0.5.x milestone Sep 25, 2017
@ViralBShah
Member

Should we have a final pass in the binary-dist target that rewrites rpath (or does whatever the right thing is) for all libraries we build and ship?
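A final pass like this first has to find the files to rewrite. A sketch of that discovery step, assuming Linux-style names (libfoo.so, libssl.so.1.0.0); `libraries_to_fix` is a hypothetical helper. It skips symlinks so each real file is patched once, recurses so a julia/ subdirectory is included, and checks the ELF magic because some .so files (glibc's libc.so, for example) are linker scripts rather than shared objects. Each returned path could then be handed to patchelf.

```python
from pathlib import Path

ELF_MAGIC = b"\x7fELF"


def is_shared_object_name(name):
    """True for Linux-style shared-library names: libfoo.so, libfoo.so.1, libssl.so.1.0.0."""
    return name.endswith(".so") or ".so." in name


def libraries_to_fix(libdir):
    """Real ELF files (not symlinks) with shared-library names, searched recursively."""
    found = []
    for path in sorted(Path(libdir).rglob("*")):
        if path.is_symlink() or not path.is_file():
            continue  # directories, and symlinks such as libfoo.so -> libfoo.so.1
        if not is_shared_object_name(path.name):
            continue
        with path.open("rb") as f:
            if f.read(4) != ELF_MAGIC:
                continue  # e.g. a GNU ld linker script named *.so
        found.append(path)
    return found


if __name__ == "__main__":
    import sys

    for lib in libraries_to_fix(sys.argv[1] if len(sys.argv) > 1 else "usr/lib"):
        print(lib)
```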

@vtjnash vtjnash added the status:priority This should be addressed urgently label Sep 25, 2017
@vtjnash vtjnash added this to the 1.0 milestone Sep 25, 2017
@StefanKarpinski
Sponsor Member

Is this really a 1.0 blocker? How would fixing this break user code?

@JeffBezanson JeffBezanson modified the milestones: 1.0, 1.x Sep 28, 2017
@JeffBezanson JeffBezanson removed the status:triage This should be discussed on a triage call label Sep 28, 2017
@JeffBezanson
Sponsor Member

Would be great to fix in 1.0, but not a blocker.

@ViralBShah
Member

I believe we reliably set RPATH on all the libraries now. Please reopen if necessary.


8 participants