Julia 1.8.0 hangs on startup when LD_LIBRARY_PATH is set #46409

Closed
tsela opened this issue Aug 19, 2022 · 21 comments
Labels: bug, building, regression

@tsela

tsela commented Aug 19, 2022

This issue is somewhat complex and slightly above my understanding level to explain correctly, so instead I am copy-pasting the Slack thread where I described the problem and where the actual issue was diagnosed and a workaround was eventually discovered (so that the whole thing doesn't disappear in the Slack hole). I am putting this in a collapsible item in order not to make this issue too long, but I advise you to read the whole thread here or on Slack to get a full idea of the issue.

Slack thread

Christophe Grandsire-Koevoets
Yesterday at 14:30
Hi all, just updated to 1.8.0 on my Ubuntu 18.04 VM, but when I try to run it I get the following:

$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org/
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.0 (2022-08-17)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

`ccall` requires the compiler

SYSTEM: caught exception of type

And then Julia simply hangs. 1.7.3 runs fine. Any idea what's going on?
77 replies

Valentin (NOT vchuravy)
🧽 20 hours ago
what do you have in your startup.jl?

Harmen Stoppels
20 hours ago
Does it go away when you temporarily remove/rename julia's libstdc++.so? That's how I've seen the issue during a build of julia (edited)

Christophe Grandsire-Koevoets
20 hours ago
I only have code to start OhMyREPL and Revise if they're found in the environment. But it doesn't matter. If I delete startup.jl, I get:

$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org/
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.0 (2022-08-17)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |


SYSTEM: caught exception of type 

And Julia hangs again.

Christophe Grandsire-Koevoets
20 hours ago
@harmen_stoppels
Let me see if I can locate that file...

Valentin (NOT vchuravy)
🧽 20 hours ago
how did you install?

Christophe Grandsire-Koevoets
20 hours ago
I found libstdc++.so (and libstdc++.so.6 and libstdc++.so.6.0.29), but no combination of removing those files or keeping them makes Julia start

Christophe Grandsire-Koevoets
20 hours ago
I installed via asdf, but all it does is pull the official binaries.

Christophe Grandsire-Koevoets
20 hours ago
Tried uninstalling and reinstalling, to no avail.

Valentin (NOT vchuravy)
🧽 20 hours ago
hmm works for me with official binaries

Valentin (NOT vchuravy)
🧽 20 hours ago
as in, downloaded from the website

Christophe Grandsire-Koevoets
20 hours ago
asdf gets the binary from https://julialang-s3.julialang.org/bin/linux/x64/1.8/julia-1.8.0-linux-x86_64.tar.gz, which as far as I can tell is the same link as on the website.

Valentin (NOT vchuravy)
🧽 20 hours ago
indeed it is

Christophe Grandsire-Koevoets
20 hours ago
To test that asdf doesn't break anything, I tried downloading the tar.gz from the official website, untarring it in a folder and running Julia from there, but I get the same problem. So it really comes from the official binary.

Christophe Grandsire-Koevoets
20 hours ago
Once again, no other Julia version I have installed shows this problem...
Can it have to do with Ubuntu 18.04? It's an old version but still supported, so I wouldn't expect a problem with that (been meaning to upgrade my VM but never got any time to do so).

Valentin (NOT vchuravy)
🧽 20 hours ago
your first error ("ccall requires the compiler") suggests to me that maybe some linking step is going wrong on your machine?

Valentin (NOT vchuravy)
🧽 20 hours ago
though I think 1.7 was the first to separate codegen from the runtime, so that shouldn't happen with 1.8..

Christophe Grandsire-Koevoets
20 hours ago
But why would that hit 1.8.0 but not 1.7.3?

Valentin (NOT vchuravy)
🧽 20 hours ago
no idea 🤷

Harmen Stoppels
20 hours ago
if you run it with env -i /path/to/julia-1.8.0/bin/julia, does that work?

Christophe Grandsire-Koevoets
20 hours ago

$ env -i ~/.asdf/installs/julia/1.8.0/bin/julia 
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org/
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.0 (2022-08-17)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

┌ Warning: Terminal not fully functional
└ @ Base client.jl:410
julia> 

So it kinda works.

Harmen Stoppels
20 hours ago
it's probably LD_LIBRARY_PATH, does it work with LD_LIBRARY_PATH= ./julia (edited)

Christophe Grandsire-Koevoets
20 hours ago
Yes!

Harmen Stoppels
19 hours ago
pointing to another julia lib dir? 😜

Christophe Grandsire-Koevoets
19 hours ago
Unfortunately, I need LD_LIBRARY_PATH set up for other programs I'm running. And no, it doesn't seem to point to anything Julia-like:

$ echo $LD_LIBRARY_PATH 
/usr/lib/x86_64-linux-gnu:/home/christophe/.oracle/instantclient_19_8::/usr/local/MATLAB/MATLAB_Runtime/v96/runtime/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v96/bin/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v96/sys/os/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v96/extern/bin/glnxa64

Christophe Grandsire-Koevoets
19 hours ago
Let me try something...

Christophe Grandsire-Koevoets
19 hours ago
OK, I cannot start Julia unless my LD_LIBRARY_PATH is empty. Which is just not feasible for me. Why does Julia suddenly mess up with this? Until 1.7.3 I had no problem...

Valentin (NOT vchuravy)
🧽 19 hours ago
not sure if that's related, but there's a double : in your LD_LIBRARY_PATH

Valentin (NOT vchuravy)
🧽 19 hours ago
could it be that there's some other library in one of those paths that is overriding one of the libraries julia ships?

Christophe Grandsire-Koevoets
19 hours ago
I removed the double ::, but it's indeed unrelated. And yes, that's probably the issue. Unfortunately, I have no choice but to have LD_LIBRARY_PATH set up that way, or other programs I need won't work. And once again, never had an issue until 1.8.0, so I don't get why it's suddenly a problem.
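Editor's aside on the double `:`: according to the ld.so manual page, a zero-length entry in LD_LIBRARY_PATH stands for the current working directory, so it is worth spotting even when it is not the culprit. A minimal sketch for listing the entries one per line; `check_ld_path` is a hypothetical helper name chosen for this example, not part of any tool.

```shell
# check_ld_path [PATHLIST]: print each entry of a colon-separated search path
# on its own line and flag empty entries. Defaults to $LD_LIBRARY_PATH.
# (Hypothetical helper, named for this sketch only.)
check_ld_path() {
    local rest="${1-${LD_LIBRARY_PATH-}}" entry
    if [ -z "$rest" ]; then
        echo "(LD_LIBRARY_PATH is empty or unset)"
        return 0
    fi
    while :; do
        entry="${rest%%:*}"            # text before the first ':'
        if [ -z "$entry" ]; then
            echo "(empty entry: treated as the current directory)"
        else
            echo "$entry"
        fi
        case "$rest" in
            *:*) rest="${rest#*:}" ;;  # drop the entry just printed
            *)   break ;;
        esac
    done
}
```

Calling `check_ld_path` with no argument inspects the current environment; passing a string checks that string instead.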

Harmen Stoppels
19 hours ago
still thinking it's a libstdc++.so shadowing the julia or system version (edited)

Valentin (NOT vchuravy)
🧽 19 hours ago
is there anything else printed after the SYSTEM: caught exception of type?

Christophe Grandsire-Koevoets
19 hours ago
Agreed. But once again, no choice here. I have to have that environment variable set up the way it is, and Julia has had no problem with it until now.

Christophe Grandsire-Koevoets
19 hours ago
@Sukera
No, Julia hangs at that point

Valentin (NOT vchuravy)
🧽 19 hours ago
do you have the TERM environment variable set?

Valentin (NOT vchuravy)
🧽 19 hours ago
what other julia environment variables have you set?

Christophe Grandsire-Koevoets
19 hours ago

$ echo $TERM
xterm-256color

Harmen Stoppels
19 hours ago
i think i see the issue

Valentin (NOT vchuravy)
🧽 19 hours ago
that doesn't make sense

Valentin (NOT vchuravy)
🧽 19 hours ago
you only get the "terminal not fully functional" error if TERM isn't set and you're not on windows

Valentin (NOT vchuravy)
🧽 19 hours ago

            term_env = get(ENV, "TERM", @static Sys.iswindows() ? "" : "dumb")
            term = REPL.Terminals.TTYTerminal(term_env, stdin, stdout, stderr)
            banner && Base.banner(term)
            if term.term_type == "dumb"
                repl = REPL.BasicREPL(term)
                quiet || @warn "Terminal not fully functional"

Harmen Stoppels
19 hours ago
libjulia-codegen.so.1.8 sets an rpath, others are using runpath, that's usually a recipe for disaster
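Editor's aside: which of the two tags a library carries can be read from its dynamic section. A minimal sketch, assuming binutils' `readelf` is installed; `dtag_kind` is a hypothetical helper name, and it only classifies `readelf -d` output fed to it on stdin.

```shell
# dtag_kind: read `readelf -d` output on stdin and print which search-path
# tag(s) the library carries. (Hypothetical helper, named for this sketch.)
dtag_kind() {
    awk '/\(RPATH\)/ { print "RPATH" } /\(RUNPATH\)/ { print "RUNPATH" }'
}

# Typical use, from the lib/julia directory of a Julia install:
#   readelf -d libjulia-codegen.so.1.8 | dtag_kind
#   readelf -d libjulia-internal.so.1.8 | dtag_kind
```

If the two libraries print different tags, that would match the rpath/runpath mixing described in the message above.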

Harmen Stoppels
19 hours ago
what does ldd libjulia-codegen.so.1.8 give you? (with your dirty environment) (edited)

Christophe Grandsire-Koevoets
19 hours ago

$ ldd libjulia-codegen.so.1.8
ldd: ./libjulia-codegen.so.1.8: No such file or directory

Valentin (NOT vchuravy)
🧽 19 hours ago
at the place of the library, so in usr/lib of your julia install

Christophe Grandsire-Koevoets
19 hours ago
Oops, sorry. It's kinda above my level already. Gimme a second

Christophe Grandsire-Koevoets
19 hours ago

$ ldd libjulia-codegen.so.1.8
	linux-vdso.so.1 (0x00007ffe653b2000)
	libunwind.so.8 => /home/christophe/.asdf/installs/julia/1.8.0/lib/julia/./libunwind.so.8 (0x00007ffb61825000)
	libLLVM-13jl.so => /home/christophe/.asdf/installs/julia/1.8.0/lib/julia/./libLLVM-13jl.so (0x00007ffb5d16d000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ffb5cf69000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ffb5cd61000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ffb5cb42000)
	libatomic.so.1 => /home/christophe/.asdf/installs/julia/1.8.0/lib/julia/./libatomic.so.1 (0x00007ffb61aee000)
	libjulia.so.1 => not found
	libjulia-internal.so.1 => /home/christophe/.asdf/installs/julia/1.8.0/lib/julia/./libjulia-internal.so.1 (0x00007ffb5c581000)
	libstdc++.so.6 => /home/christophe/.asdf/installs/julia/1.8.0/lib/julia/./libstdc++.so.6 (0x00007ffb5c36d000)
	libgcc_s.so.1 => /home/christophe/.asdf/installs/julia/1.8.0/lib/julia/./libgcc_s.so.1 (0x00007ffb61ad3000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffb5bf7c000)
	/lib64/ld-linux-x86-64.so.2 (0x00007ffb61a58000)
	libz.so.1 => /home/christophe/.asdf/installs/julia/1.8.0/lib/julia/./libz.so.1 (0x00007ffb5bd61000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ffb5b9c3000)
	libjulia.so.1 => /home/christophe/.asdf/installs/julia/1.8.0/lib/julia/./../libjulia.so.1 (0x00007ffb61aaf000)

Harmen Stoppels
19 hours ago
yeah, libjulia.so.1 => not found worries me a little bit, it's thanks to rpath+runpath mixing.

Valentin (NOT vchuravy)
🧽 19 hours ago
it's found though?

Valentin (NOT vchuravy)
🧽 19 hours ago
just in the next line

Valentin (NOT vchuravy)
🧽 19 hours ago
wait

Valentin (NOT vchuravy)
🧽 19 hours ago
what

Valentin (NOT vchuravy)
🧽 19 hours ago
the entry is doubled

Valentin (NOT vchuravy)
🧽 19 hours ago
check the last line

Harmen Stoppels
19 hours ago
ah, it's a direct dep and a transitive dep, as a direct dep it's not located cause the rpath is wrong (doesn't include $ORIGIN/..), but it's found down the line (edited)

Harmen Stoppels
19 hours ago
does this change anything for you?
patchelf --set-rpath '$ORIGIN/:$ORIGIN/../' libjulia-codegen.so.1.8

Christophe Grandsire-Koevoets
19 hours ago
Give me a moment, I have a call right now.

Christophe Grandsire-Koevoets
18 hours ago
OK, did what you asked
@harmen_stoppels
, and now I still can't run Julia without changing the LD_LIBRARY_PATH, but I also get this:

christophe@Ubuntu-VirtualBox: ~/.asdf/installs/julia/1.8.0/lib/julia ((v0.10.2))
$ patchelf --set-rpath '$ORIGIN/:$ORIGIN/../' libjulia-codegen.so.1.8
christophe@Ubuntu-VirtualBox: ~/.asdf/installs/julia/1.8.0/lib/julia ((v0.10.2))
$ ldd libjulia-codegen.so.1.8
./libjulia-codegen.so.1.8: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by ./libjulia-codegen.so.1.8)
	linux-vdso.so.1 (0x00007ffd9df77000)
	libunwind.so.8 => /usr/lib/x86_64-linux-gnu/libunwind.so.8 (0x00007f5af8f1c000)
	libLLVM-13jl.so => /home/christophe/.asdf/installs/julia/1.8.0/lib/julia/./libLLVM-13jl.so (0x00007f5af4864000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f5af4660000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f5af4458000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f5af4239000)
	libatomic.so.1 => /usr/lib/x86_64-linux-gnu/libatomic.so.1 (0x00007f5af4031000)
	libjulia.so.1 => /home/christophe/.asdf/installs/julia/1.8.0/lib/julia/./../libjulia.so.1 (0x00007f5af91a4000)
	libjulia-internal.so.1 => /home/christophe/.asdf/installs/julia/1.8.0/lib/julia/./libjulia-internal.so.1 (0x00007f5af3a70000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f5af36e7000)
	libgcc_s.so.1 => /usr/local/MATLAB/MATLAB_Runtime/v96/sys/os/glnxa64/libgcc_s.so.1 (0x00007f5af34d0000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5af30df000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f5af9137000)
	liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f5af2eb9000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f5af2c9c000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f5af28fe000)

Christophe Grandsire-Koevoets
18 hours ago
LD_LIBRARY_PATH seems to mess everything up indeed.
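Editor's aside: the `GLIBCXX_3.4.26' not found` message above means the libstdc++.so.6 that was picked up does not export that symbol version. A minimal sketch for checking a given copy, assuming binutils' `strings` is installed; `has_glibcxx` is a hypothetical helper name that only inspects the text it receives on stdin.

```shell
# has_glibcxx VERSION: read strings on stdin (one per line) and succeed if
# the exact symbol version GLIBCXX_VERSION is among them.
# (Hypothetical helper, named for this sketch only.)
has_glibcxx() {
    grep -qxF "GLIBCXX_$1"
}

# Typical use against the library that ldd reported:
#   strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | has_glibcxx 3.4.26 \
#       && echo "provides GLIBCXX_3.4.26" || echo "too old"
```

Running the same check against the libstdc++.so.6 inside Julia's lib/julia directory shows whether the bundled copy satisfies the requirement.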

Harmen Stoppels
18 hours ago
at least my suspicion about libstdc++ was right 😜 yeah, so matlab ships another set of compiler support libraries...
you could try to replace the runpath elsewhere with rpath too so that it does not listen to LD_LIBRARY_PATH

Harmen Stoppels
18 hours ago
can you run libtree libjulia-codegen.so.1.8 it may tell you where LD_LIBRARY_PATH is actually overriding things.

Harmen Stoppels
18 hours ago
https://github.com/haampie/libtree

Christophe Grandsire-Koevoets
18 hours ago

$ ~/Downloads/libtree libjulia-codegen.so.1.8 
libjulia-codegen.so.1 
├── libunwind.so.8 [LD_LIBRARY_PATH]
│   └── liblzma.so.5 [ld.so.conf]
│       └── libpthread.so.0 [ld.so.conf]
├── libatomic.so.1 [LD_LIBRARY_PATH]
│   └── libpthread.so.0 [ld.so.conf]
├── libLLVM-13jl.so [runpath]
│   ├── librt.so.1 [ld.so.conf]
│   │   └── libpthread.so.0 [ld.so.conf]
│   ├── libpthread.so.0 [ld.so.conf]
│   └── libz.so.1 [ld.so.conf]
├── libjulia-internal.so.1 [runpath]
│   ├── libunwind.so.8 [LD_LIBRARY_PATH]
│   ├── libatomic.so.1 [LD_LIBRARY_PATH]
│   ├── libz.so.1 [runpath]
│   ├── libjulia.so.1 [runpath]
│   │   └── libpthread.so.0 [ld.so.conf]
│   ├── librt.so.1 [ld.so.conf]
│   └── libpthread.so.0 [ld.so.conf]
├── libjulia.so.1 [runpath]
├── librt.so.1 [ld.so.conf]
└── libpthread.so.0 [ld.so.conf]
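Editor's aside: in a larger tree the overridden libraries are easier to see when filtered out. A minimal sketch that reads libtree output in the format shown above; `overridden_libs` is a hypothetical helper name, and it assumes the library name is the field just before the bracketed annotation.

```shell
# overridden_libs: read libtree output on stdin and print, once each, the
# libraries that were resolved through LD_LIBRARY_PATH.
# (Hypothetical helper, named for this sketch only.)
overridden_libs() {
    awk '/\[LD_LIBRARY_PATH\]/ { print $(NF-1) }' | sort -u
}

# Typical use:
#   libtree libjulia-codegen.so.1.8 | overridden_libs
```

For the tree above this would list libatomic.so.1 and libunwind.so.8.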

Harmen Stoppels
18 hours ago
if you do: patchelf --force-rpath libjulia-codegen.so.1 and patchelf --force-rpath libjulia-internal.so.1 does that fix it

Christophe Grandsire-Koevoets
18 hours ago
Nope 😞

Harmen Stoppels
18 hours ago
okay lol, patchelf is a broken piece of software. you have to use it like patchelf --force-rpath --set-rpath '$ORIGIN/:$ORIGIN/../' [lib] (edited)

Christophe Grandsire-Koevoets
18 hours ago
Success!

Christophe Grandsire-Koevoets
18 hours ago

$ ~/Downloads/libtree libjulia-codegen.so.1.8 
libjulia-codegen.so.1 
├── libunwind.so.8 [rpath]
│   └── libz.so.1 [runpath]
├── libLLVM-13jl.so [rpath]
│   ├── libz.so.1 [rpath of 1]
│   ├── librt.so.1 [ld.so.conf]
│   │   └── libpthread.so.0 [ld.so.conf]
│   └── libpthread.so.0 [ld.so.conf]
├── libatomic.so.1 [rpath]
│   └── libpthread.so.0 [ld.so.conf]
├── libjulia-internal.so.1 [rpath]
│   ├── libunwind.so.8 [rpath]
│   ├── libz.so.1 [rpath]
│   ├── libatomic.so.1 [rpath]
│   ├── libjulia.so.1 [rpath]
│   │   └── libpthread.so.0 [ld.so.conf]
│   ├── librt.so.1 [ld.so.conf]
│   └── libpthread.so.0 [ld.so.conf]
├── libjulia.so.1 [rpath]
├── librt.so.1 [ld.so.conf]
└── libpthread.so.0 [ld.so.conf]

Christophe Grandsire-Koevoets
18 hours ago
And julia finally runs without workarounds

Christophe Grandsire-Koevoets
18 hours ago
But what a mess...

Harmen Stoppels
18 hours ago
lol, it's still rather questionable... in particular that libz is detected through the rpath of libjulia-codegen... 😬

Harmen Stoppels
18 hours ago
to be fair: LD_LIBRARY_PATH containing a non-system version of gcc runtime libs is asking for broken executables...

Harmen Stoppels
18 hours ago
however, julia's libs are not in the best state either.

Christophe Grandsire-Koevoets
18 hours ago
Not looking forward to doing that at each Julia update...

Valentin (NOT vchuravy)
🧽 18 hours ago
opening an issue for that sounds reasonable

Valentin (NOT vchuravy)
🧽 18 hours ago
no matter that MATLAB seemingly intentionally breaks things here, julia should continue to work

Harmen Stoppels
18 hours ago
just open an issue, maybe julia should use rpaths, but for sure it should fix the incomplete r(un)path of libjulia-codegen (which depends on libjulia but is not in its search path) (edited)

Christophe Grandsire-Koevoets
18 hours ago
Mmm... I need to take a shower, walk the dog and have dinner. I'll definitely open an issue, but that'll have to wait until tomorrow. I'll update this thread once that's done.
Thanks for the help!

As for the TL;DR, and once again, it's slightly above my geek level, so I may be summarising this wrong (which is why I added the full thread in here):

  1. I installed the newest Julia 1.8.0 on my Ubuntu 18.04 VM using asdf, which pulls the official Julia binaries;
  2. Trying to run the REPL failed, even with no base environment and no startup.jl file present. The REPL preamble would appear and then Julia would print SYSTEM: caught exception of type and then it would hang. Nothing short of killing the Julia process would allow me to access the terminal again;
  3. After a lot of help and back and forth, it was discovered that I had LD_LIBRARY_PATH set, so that I could run some other stuff on my computer (including the MATLAB Runtime). Unfortunately, Julia's libraries turned out to consult LD_LIBRARY_PATH when looking for some of their dependencies and picked up the wrong ones: in the ldd output taken after the first patchelf attempt, libstdc++.so.6 resolved to /usr/lib/x86_64-linux-gnu (a copy lacking GLIBCXX_3.4.26) and libgcc_s.so.1 to the MATLAB Runtime, and the version mismatch caused everything to fail;
  4. By running patchelf --force-rpath --set-rpath '$ORIGIN/:$ORIGIN/../' on both libjulia-codegen.so.1.8 and libjulia-internal.so.1.8, I was able to get the right dependencies to be found, and was able to run Julia 1.8.0 without a problem from then on.

Now, from what I understand, what MATLAB did (putting a set of non-standard gcc runtime libs in LD_LIBRARY_PATH) is a recipe for disaster, but Julia's libs are not in the best state either, what with libjulia-codegen.so.1.8 having an incomplete r(un)path, so it was worth opening an issue to get this solved.

So I'm opening this issue. Once again, sorry if this issue is not of the best quality, but how Linux shared libraries work is not my specialty, and I know nothing of rpath or runpath. So no idea if this is easy to solve or not. I just know that I have no choice in the matter of having LD_LIBRARY_PATH set the way it is, and Julia shouldn't hang just because other programs make use of it, even if their use of it is questionable at best.

BTW, here's the output of versioninfo() now that I can run Julia 1.8.0 without a problem:

Julia Version 1.8.0
Commit 5544a0fab76 (2022-08-17 13:38 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores
Environment:
  LD_LIBRARY_PATH = /usr/lib/x86_64-linux-gnu:/home/christophe/.oracle/instantclient_19_8:/usr/local/MATLAB/MATLAB_Runtime/v96/runtime/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v96/bin/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v96/sys/os/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v96/extern/bin/glnxa64

Thanks for your understanding.

@Moelf
Contributor

Moelf commented Aug 19, 2022

this is blocking real-world usage for sure: https://gitlab.cern.ch/sft/lcgcmake/-/merge_requests/1406

@TS-CUBED

TS-CUBED commented Aug 22, 2022

Can confirm this on our HPC cluster (CentOS 7.9), however the fix above (patchelf) does not work for me.

EDIT (Solved): the patchelf fix on slack does work and solves (works around) the issue:

patchelf --force-rpath --set-rpath '$ORIGIN/:$ORIGIN/../' ./julia-1.8.0/lib/julia/libjulia-codegen.so.1.8
patchelf --force-rpath --set-rpath '$ORIGIN/:$ORIGIN/../' ./julia-1.8.0/lib/julia/libjulia-internal.so.1.8
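For anyone who has to repeat this after each install, the two commands can be wrapped in a small function. This is a sketch only: `patch_julia_rpath` and the `DRY_RUN` variable are names invented for this example, and it assumes the same lib/julia layout and library file names as in the commands above.

```shell
# patch_julia_rpath JULIA_DIR: apply the workaround from this thread to both
# libraries under JULIA_DIR/lib/julia. Set DRY_RUN=1 to print the commands
# instead of running them (the printed form loses the single quotes).
# (Hypothetical helper, named for this sketch only.)
patch_julia_rpath() {
    local julia_dir="$1" lib
    for lib in libjulia-codegen.so.1.8 libjulia-internal.so.1.8; do
        # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is non-empty.
        ${DRY_RUN:+echo} patchelf --force-rpath \
            --set-rpath '$ORIGIN/:$ORIGIN/../' \
            "$julia_dir/lib/julia/$lib"
    done
}

# Typical use:
#   DRY_RUN=1 patch_julia_rpath ./julia-1.8.0   # preview
#   patch_julia_rpath ./julia-1.8.0             # apply
```

Afterwards, `patchelf --print-rpath` on each file should show the new search path.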

Original message:

If I load the module for the gcc toolchain, which sets LD_LIBRARY_PATH=/opt/ohpc/pub/mpi/openmpi-gnu/1.10.7/lib:/opt/ohpc/pub/compiler/gcc/5.4.0/lib64

Launching Julia 1.8 fails with:

$ julia-1.8.0/bin/julia 
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.0 (2022-08-17)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |


SYSTEM: caught exception of type

That's it, no exception type is reported. Julia does not react at this stage (Ctrl-C does not terminate the process); I need to send it to the background and send signal 9 to kill the julia process.

If I unload the modules, then julia launches correctly.

Edit: Julia 1.7.3 works fine on the same system.

Edit2: obviously there will be quite a few changes to the libraries with different toolchains. I get the same with gnu7 and gnu8 toolchains. On another cluster (the big cluster, still on CentOS 7.3) the same version of Julia works fine with gnu7 toolchain.

@ararslan ararslan added bug Indicates an unexpected problem or unintended behavior building Build system, or building Julia or its dependencies regression Regression in behavior compared to a previous version labels Aug 22, 2022
@ararslan ararslan added this to the 1.8 milestone Aug 22, 2022
@jgreener64
Contributor

I ran into this too and am using LD_LIBRARY_PATH= ./julia as a workaround.
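The same workaround can be kept in a shell function so the rest of the session keeps its LD_LIBRARY_PATH. A minimal sketch, assuming GNU `env` (for the `-u` option); `julia_clean` and `JULIA_BIN` are names invented for this example.

```shell
# julia_clean [ARGS...]: start Julia with LD_LIBRARY_PATH removed from its
# environment only; the calling shell keeps the variable untouched.
# JULIA_BIN (assumed name) may point at a specific julia executable.
julia_clean() {
    env -u LD_LIBRARY_PATH "${JULIA_BIN:-julia}" "$@"
}

# Typical use:
#   julia_clean --version
#   JULIA_BIN=./julia-1.8.0/bin/julia julia_clean
```

One caveat: processes launched from inside that Julia session inherit the cleaned environment, so anything there that relies on LD_LIBRARY_PATH would need it set again.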

staticfloat added a commit that referenced this issue Aug 23, 2022
When loading dependencies on Linux, we can either use `RPATH` or
`RUNPATH` as a list of relative paths to search for libraries.  The
difference, for our purposes, mainly lies within how this interacts with
`LD_LIBRARY_PATH`: `RPATH` is searched first, then `LD_LIBRARY_PATH`,
then `RUNPATH`.  So by using `RUNPATH` here, we are explicitly allowing
ourselves to be overridden by `LD_LIBRARY_PATH`.  This is fine, as long
as we are consistent across our entire library line, however in the
`v1.8.0` release, there was an inconsistency, reported in [0].

The inconsistency occurred because of the following confluence of factors:

 - Ancient `ld` builds (such as the one used in our build environment)
   do not default to using `RUNPATH`, but instead use `RPATH`.
 - `patchelf`, when it rewrites the RPATH, will default to using
   `RUNPATH` instead.
 - We were only using `patchelf` on `libjulia-internal`, not on
   `libjulia-codegen`, which was newly added in `v1.8`.

These three factors together caused us to ship a binary with `RUNPATH`
in `libjulia-internal`, but `RPATH` in `libjulia-codegen`, which caused
loading to fail in [0] due to first `libjulia-internal` being loaded,
(which brought in the external `libstdc++`), then `libjulia-codegen`
failed to load (because it found an incompatible `libstdc++`), causing
the mysterious compiler error.

This PR fixes this twofold; first, when building the libraries in the
first place, we pass `--enable-new-dtags` to the linker to encourage it
to use `runpath` when possible.  This removes the possibility for a
missing `patchelf` invocation to break things in this way.  Second, we
apply `patchelf` properly to `libjulia-codegen` as well.

[0] #46409
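
Editor's aside: the search order described in this commit message can be modelled in a few lines. This is a simplified sketch, not the real loader: `resolve_lib` is a hypothetical function that only walks the three lists named above (it ignores ld.so.cache and the default system directories, and skips empty entries), and it follows the ld.so manual page in ignoring `RPATH` when a `RUNPATH` is present.

```shell
# resolve_lib NAME RPATH LD_LIBRARY_PATH RUNPATH
# Print the first existing DIR/NAME in simplified loader order:
#   no RUNPATH:   RPATH, then LD_LIBRARY_PATH
#   with RUNPATH: LD_LIBRARY_PATH, then RUNPATH (RPATH is ignored)
# (Hypothetical helper, named for this sketch only.)
resolve_lib() {
    local name="$1" rpath="$2" ldpath="$3" runpath="$4"
    local order dir
    if [ -n "$runpath" ]; then
        order="$ldpath:$runpath"
    else
        order="$rpath:$ldpath"
    fi
    local IFS=':'
    for dir in $order; do
        [ -n "$dir" ] || continue      # skip empty entries in this model
        if [ -e "$dir/$name" ]; then
            echo "$dir/$name"
            return 0
        fi
    done
    return 1
}

# Typical use:
#   resolve_lib libstdc++.so.6 "" "$LD_LIBRARY_PATH" /path/to/julia/lib/julia
```

With a bundled and an external copy of the same library, the model picks the bundled one under `RPATH` and the external one under `RUNPATH`, which is the inconsistency the commit describes.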
@staticfloat
Member

Should be fixed by #46464, with a backport to 1.8 via #46465

staticfloat added a commit that referenced this issue Aug 24, 2022
* Consistently use `RUNPATH` in our libraries

* fix whitespace

Co-authored-by: Kristoffer Carlsson <kcarlsson89@gmail.com>
staticfloat added a commit that referenced this issue Aug 24, 2022
* Consistently use `RUNPATH` in our libraries

* Update Make.inc

Co-authored-by: Mosè Giordano <giordano@users.noreply.github.com>
@maleadt maleadt closed this as completed Aug 25, 2022
@jgreener64
Contributor

I am still seeing a variant of this issue on Julia 1.8.1. I have LD_LIBRARY_PATH non-empty and Julia works fine. Then I run module load compilers/gcc/8.3.0 which adds /public/gcc/8_3_0/lib:/public/gcc/8_3_0/lib64 to LD_LIBRARY_PATH. Now Julia fails with SYSTEM: caught exception of type and hangs.

The module load is required for unrelated software so I can work around this, but it worked okay on Julia 1.7.

@Moelf
Contributor

Moelf commented Sep 12, 2022

I think there's still some related issue: https://gitlab.cern.ch/sft/lcgcmake/-/merge_requests/1406#note_5990927

@tsela
Author

tsela commented Sep 19, 2022

OK, strange. I updated Julia to 1.8.1 on my home computer last week and it worked fine. But today, when I tried doing the same on my work computer, I got the SYSTEM: caught exception of type error again. To be exact, here's the full error I got:

$ julia 
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.1 (2022-09-06)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

[ Info: Precompiling OhMyREPL [5fb14364-9ced-5910-84b2-373655c76a03]
ERROR: LoadError: `ccall` requires the compiler
Stacktrace:
  [1] unsafe_convert(#unused#::Type{Ptr{UInt8}}, a::Vector{UInt8})
    @ Base pointer.jl:65
  [2] unsafe_convert(#unused#::Type{Ptr{Nothing}}, a::Vector{UInt8})
    @ Base pointer.jl:66
  [3] fill!(a::Vector{UInt8}, x::UInt8)
    @ Base array.jl:429
  [4] zeros(#unused#::Type{UInt8}, dims::Tuple{Int64})
    @ Base array.jl:589
  [5] zeros(#unused#::Type{UInt8}, dims::Int64)
    @ Base array.jl:584
  [6] Dict{Char, Tokenize.Tokens.Kind}()
    @ Base dict.jl:90
  [7] Dict{Char, Tokenize.Tokens.Kind}(::Pair{Char, Tokenize.Tokens.Kind}, ::Vararg{Pair{Char, Tokenize.Tokens.Kind}})
    @ Base dict.jl:110
  [8] top-level scope
    @ ~/.julia/packages/Tokenize/bZ0tu/src/token_kinds.jl:814
  [9] include(x::String)
    @ Tokenize.Tokens ~/.julia/packages/Tokenize/bZ0tu/src/token.jl:1
 [10] top-level scope
    @ ~/.julia/packages/Tokenize/bZ0tu/src/token.jl:7
 [11] include(x::String)
    @ Tokenize ~/.julia/packages/Tokenize/bZ0tu/src/Tokenize.jl:1
 [12] top-level scope
    @ ~/.julia/packages/Tokenize/bZ0tu/src/Tokenize.jl:7
 [13] top-level scope
    @ stdin:1
in expression starting at /home/christophe/.julia/packages/Tokenize/bZ0tu/src/token_kinds.jl:814
in expression starting at /home/christophe/.julia/packages/Tokenize/bZ0tu/src/token.jl:1
in expression starting at /home/christophe/.julia/packages/Tokenize/bZ0tu/src/Tokenize.jl:1
in expression starting at stdin:1
ERROR: LoadError: Failed to precompile Tokenize [0796e94c-ce3b-5d07-9a54-7f471281c624] to /home/christophe/.julia/compiled/v1.8/Tokenize/jl_E1RkFj.
Stacktrace:
 [1] top-level scope
   @ stdin:1
in expression starting at /home/christophe/.julia/packages/OhMyREPL/oDZvT/src/OhMyREPL.jl:2
in expression starting at stdin:1
`ccall` requires the compiler

SYSTEM: caught exception of type

Once again, nothing short of killing the process gives me access to the terminal again.

Running libtree on libjulia-codegen.so.1.8 in the 1.8.1 lib/julia folder gives me:

$ ~/Downloads/libtree libjulia-codegen.so.1.8 
libjulia-codegen.so.1 
├── libunwind.so.8 [LD_LIBRARY_PATH]
│   └── liblzma.so.5 [ld.so.conf]
│       └── libpthread.so.0 [ld.so.conf]
├── libatomic.so.1 [LD_LIBRARY_PATH]
│   └── libpthread.so.0 [ld.so.conf]
├── libLLVM-13jl.so [runpath]
│   ├── librt.so.1 [ld.so.conf]
│   │   └── libpthread.so.0 [ld.so.conf]
│   ├── libpthread.so.0 [ld.so.conf]
│   └── libz.so.1 [ld.so.conf]
├── libjulia-internal.so.1 [runpath]
│   ├── libunwind.so.8 [LD_LIBRARY_PATH]
│   ├── libatomic.so.1 [LD_LIBRARY_PATH]
│   ├── libz.so.1 [runpath]
│   ├── libjulia.so.1 [runpath]
│   │   └── libpthread.so.0 [ld.so.conf]
│   ├── librt.so.1 [ld.so.conf]
│   └── libpthread.so.0 [ld.so.conf]
├── libjulia.so.1 [runpath]
├── librt.so.1 [ld.so.conf]
└── libpthread.so.0 [ld.so.conf]

Which seems to indicate that some libraries are still chosen via LD_LIBRARY_PATH. When using the Slack workaround of running:

patchelf --force-rpath --set-rpath '$ORIGIN/:$ORIGIN/../' ./julia-1.8.0/lib/julia/libjulia-codegen.so.1.8
patchelf --force-rpath --set-rpath '$ORIGIN/:$ORIGIN/../' ./julia-1.8.0/lib/julia/libjulia-internal.so.1.8

I can run Julia 1.8.1 again correctly, and running libtree now gives me:

$ ~/Downloads/libtree libjulia-codegen.so.1.8 
libjulia-codegen.so.1 
├── libunwind.so.8 [rpath]
│   └── libz.so.1 [runpath]
├── libLLVM-13jl.so [rpath]
│   ├── libz.so.1 [rpath of 1]
│   ├── librt.so.1 [ld.so.conf]
│   │   └── libpthread.so.0 [ld.so.conf]
│   └── libpthread.so.0 [ld.so.conf]
├── libatomic.so.1 [rpath]
│   └── libpthread.so.0 [ld.so.conf]
├── libjulia-internal.so.1 [rpath]
│   ├── libunwind.so.8 [rpath]
│   ├── libz.so.1 [rpath]
│   ├── libatomic.so.1 [rpath]
│   ├── libjulia.so.1 [rpath]
│   │   └── libpthread.so.0 [ld.so.conf]
│   ├── librt.so.1 [ld.so.conf]
│   └── libpthread.so.0 [ld.so.conf]
├── libjulia.so.1 [rpath]
├── librt.so.1 [ld.so.conf]
└── libpthread.so.0 [ld.so.conf]

So it seems to me that the fix that was applied wasn't good enough. Do we need to open a new issue or should we reopen this one?
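As a rough aid for reading output like the first tree above, the sketch below filters libtree output for the entries tagged `[LD_LIBRARY_PATH]`. The function name `from_ld_library_path` is hypothetical (it is not part of libtree), and the filter assumes the output format shown in this comment.

```shell
#!/bin/bash
# Hypothetical helper: reads libtree output on stdin and prints the unique
# library names that were resolved through LD_LIBRARY_PATH.
from_ld_library_path() {
    # Keep only lines tagged [LD_LIBRARY_PATH], then extract the library name.
    grep -F '[LD_LIBRARY_PATH]' | grep -oE 'lib[^ ]+\.so[^ ]*' | sort -u
}

# Usage, with the libtree binary location used earlier in this thread:
#   ~/Downloads/libtree libjulia-codegen.so.1.8 | from_ld_library_path
```

An empty result would suggest that none of the listed dependencies is being picked up from LD_LIBRARY_PATH.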

@jgreener64
Contributor

This still occurs for me on Julia 1.8.2. Probably worth re-opening the issue?

@KristofferC KristofferC reopened this Oct 1, 2022
@Moelf
Contributor

Moelf commented Oct 1, 2022

It has been 3 releases 😭

(Sorry for the noise)

@tsela
Author

tsela commented Oct 3, 2022

Yep, can confirm that the problem is still not solved. Running libtree on libjulia-codegen.so.1.8 in /lib/julia of Julia 1.8.2 gives me:

$ ~/Downloads/libtree libjulia-codegen.so.1.8 
libjulia-codegen.so.1 
├── libunwind.so.8 [LD_LIBRARY_PATH]
│   └── liblzma.so.5 [ld.so.conf]
│       └── libpthread.so.0 [ld.so.conf]
├── libatomic.so.1 [LD_LIBRARY_PATH]
│   └── libpthread.so.0 [ld.so.conf]
├── libLLVM-13jl.so [runpath]
│   ├── librt.so.1 [ld.so.conf]
│   │   └── libpthread.so.0 [ld.so.conf]
│   ├── libpthread.so.0 [ld.so.conf]
│   └── libz.so.1 [ld.so.conf]
├── libjulia-internal.so.1 [runpath]
│   ├── libunwind.so.8 [LD_LIBRARY_PATH]
│   ├── libatomic.so.1 [LD_LIBRARY_PATH]
│   ├── libz.so.1 [runpath]
│   ├── libjulia.so.1 [runpath]
│   │   └── libpthread.so.0 [ld.so.conf]
│   ├── librt.so.1 [ld.so.conf]
│   └── libpthread.so.0 [ld.so.conf]
├── libjulia.so.1 [runpath]
├── librt.so.1 [ld.so.conf]
└── libpthread.so.0 [ld.so.conf]

In other words, exactly the same output as with 1.8.1. Running the patchelf workaround still works, but this is getting annoying...

@giordano
Contributor

giordano commented Oct 3, 2022

I also ran into this the other day with v1.8.2.

@staticfloat
Member

So actually, by consistently using RUNPATH and not RPATH, we are making ourselves more vulnerable to environment variables being set, not less. That is because RUNPATH is (intentionally) searched after LD_LIBRARY_PATH. This is generally considered the "correct" behavior for applications, as if LD_LIBRARY_PATH is set, you generally want your applications to listen to it, since it's kind of the tool of last resort in many cases.

That being said, we also understand that it's frustrating to have things be so broken when LD_LIBRARY_PATH is screwed up. We have a plan to specifically address libstdc++ issues where we load it in a special, more intelligent manner, and it's possible that this could help here, as it will force loading of Julia's internal libstdc++ rather than preferring the system libstdc++. Please note that this is a very special case for libstdc++ and that we won't really be able to do this for other libraries as well.

I know for users who have come from older versions of Julia where we used RPATH more than RUNPATH that it's frustrating that Julia is now more sensitive to environment variables, but please understand that we have pressure on both sides of the issue here; there are users that have system libraries that screw Julia up (such as in this thread) and there are users that want a good way to have Julia load their system libraries for deeper integration with their system (and therefore require the LD_LIBRARY_PATH behavior to work). Because RUNPATH is the "officially recommended" way of shipping well-behaved software, I don't think we're going to go back to the RPATH behavior, so for users that must have LD_LIBRARY_PATHs with libstdc++ versions on their path, using the patchelf workaround is likely the best workaround.

It is possible that in the future we will be able to distance ourselves further from the dynamic linker's constraints and implement an even more flexible binary loading system that lets users selectively ignore things by editing a Preferences.toml file, but we're a long way from that. Until then, we're stuck between a rock and a hard place, and I'm afraid I will have to classify this issue as "there's really nothing we can do".
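To see which of the two tags a given library carries, one option is to inspect its dynamic section with `readelf -d` (from binutils). The sketch below is only an illustration: `classify_search_tag` is a hypothetical helper name, and it assumes the usual `(RPATH)` / `(RUNPATH)` labels in readelf's output.

```shell
#!/bin/bash
# Hypothetical helper: reads `readelf -d` output on stdin and reports which
# search-path tag is in effect. When both tags are present the dynamic linker
# ignores RPATH, so RUNPATH is checked first.
classify_search_tag() {
    local dyn
    dyn=$(cat)
    if grep -q '(RUNPATH)' <<<"$dyn"; then
        echo "RUNPATH: searched after LD_LIBRARY_PATH"
    elif grep -q '(RPATH)' <<<"$dyn"; then
        echo "RPATH: searched before LD_LIBRARY_PATH"
    else
        echo "neither tag present"
    fi
}

# Usage, with the library path from the workaround quoted in this thread:
#   readelf -d ./julia-1.8.0/lib/julia/libjulia-codegen.so.1.8 | classify_search_tag
```

Running it before and after the patchelf workaround should show the library switching from the RUNPATH case to the RPATH case.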

@giordano
Contributor

giordano commented Oct 3, 2022

For what it's worth, on the clusters at my university julia is actually a wrapper script that internally sets LD_LIBRARY_PATH appropriately to avoid these issues (without screwing up the rest of the system). I'm personally fine with this solution, but I was surprised the other day because I was trying Julia on a different system and the error I got was not about undefined GLIBCXX symbols, which would have made the issue clearer, and the hang with the

SYSTEM: caught exception of type

message was a bit confusing (and I forgot about this issue).

@staticfloat
Member

staticfloat commented Oct 3, 2022

Yes, I do think we should attempt to provide a better error message here. I'm thinking it might be possible to detect that the file libjulia-codegen.so exists on disk but cannot be dlopen()'ed, which should probably be a fatal error, rather than getting farther into the bringup process and then failing.

X-ref: #47027

@tsela
Author

tsela commented Nov 18, 2022

What the...?! The patchelf workaround doesn't work for Julia 1.8.3! What's going on? I'm running the two commands as shown above (the "Slack workaround"), which so far worked to make Julia behave, but now they do absolutely nothing! I don't get it!

What's going on? Julia has become completely unusable on my Ubuntu machine!

@Moelf
Contributor

Moelf commented Nov 18, 2022

Yeah, I also still want this to be fixed so our HPC can use it again.

@giordano
Contributor

I mean, the solution has always been to not set LD_LIBRARY_PATH, or at least to set it to a sensible value which allows the julia process to start, right? As I mentioned above, on our clusters at UCL we've done that for quite some time now (way before 1.8).

@Moelf
Contributor

Moelf commented Nov 18, 2022

I'm not the HPC admin, and I don't think that's possible given how CERN's computing environment works. This is a breaking change unless it is treated as a bug to be fixed.

@DilumAluthge
Member

DilumAluthge commented Nov 18, 2022

If you're not an admin on your system, could you try the following out and see if it works? It's the same as @giordano's wrapper script solution, with just a small modification to detect the location of the .../lib/julia directory.

Save the following script in your home directory as ~/myjulia:

#!/bin/bash

JULIA_LIB=$(LD_LIBRARY_PATH="" julia -e 'println(joinpath(dirname(Sys.BINDIR), "lib", "julia"))')

export LD_LIBRARY_PATH="${JULIA_LIB:?}:${LD_LIBRARY_PATH}"

julia "${@}"

Then make it executable with chmod +x ~/myjulia

And then you'd run Julia with e.g. ~/myjulia -t2 -e '1 + 1'.

@giordano
Contributor

Maybe replace

JULIA_LIB=$(LD_LIBRARY_PATH="" julia -e 'println(joinpath(dirname(Sys.BINDIR), "lib", "julia"))')

with

JULIA_LIB=$(LD_LIBRARY_PATH="" julia --compile=min -O0 --startup-file=no -e 'println(joinpath(dirname(Sys.BINDIR), "lib", "julia"))')

to reduce startup time by more than an order of magnitude in case there is a startup file
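Putting the two suggestions together, a wrapper could look roughly like the sketch below. The function names `prepend_path` and `run_julia` are hypothetical. Two details are assumptions added here rather than something from the thread: the path is printed with `println` so that no quotes end up in the variable, and the existing LD_LIBRARY_PATH is appended only when it is non-empty, because an empty entry makes the loader search the current directory.

```shell
#!/bin/bash
# Hypothetical helper: join a directory onto an existing search path without
# leaving a stray trailing colon when the existing path is empty.
prepend_path() {
    local dir=$1 existing=$2
    printf '%s\n' "${dir}${existing:+:${existing}}"
}

# Hypothetical wrapper body: locate Julia's private library directory with
# LD_LIBRARY_PATH cleared, put it first on the search path, then start Julia.
run_julia() {
    local julia_lib
    julia_lib=$(LD_LIBRARY_PATH="" julia --compile=min -O0 --startup-file=no \
        -e 'println(joinpath(dirname(Sys.BINDIR), "lib", "julia"))') || return 1
    LD_LIBRARY_PATH=$(prepend_path "${julia_lib:?}" "${LD_LIBRARY_PATH:-}")
    export LD_LIBRARY_PATH
    exec julia "$@"
}

# In a wrapper script such as ~/myjulia, the last line would be:
#   run_julia "$@"
```

Using `exec` replaces the wrapper shell with the julia process, so signals and exit codes pass through unchanged.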

@Moelf
Contributor

Moelf commented Nov 19, 2022

Their tests won't pass, so nothing is merged yet and I can't test it, but I can see if I can jam that into the julia alias "script".

@vtjnash vtjnash closed this as completed Feb 6, 2023