Libdl.dlopen doesn't find shared libraries anymore #26557

Closed
nsmith5 opened this issue Mar 21, 2018 · 36 comments · Fixed by #31748
Labels
docs This change adds or pertains to documentation needs news A NEWS entry is required for this change


@nsmith5
Contributor

nsmith5 commented Mar 21, 2018

Libdl.dlopen doesn't seem to be able to find shared libraries the way it did on 0.6.x

On Julia 0.6.x

julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Prescott)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

julia> lib = Libdl.dlopen("libz")
Ptr{Void} @0x00007f47adcb91a0

On Julia 0.7.x-DEV

julia> versioninfo()
Julia Version 0.7.0-DEV.4631
Commit 9a55c8fbc* (2018-03-19 03:59 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)
Environment:

julia> using Libdl

julia> lib = Libdl.dlopen("libz")
ERROR: could not load library "libz"
libz.so: cannot open shared object file: No such file or directory
Stacktrace:
 [1] dlopen(::String) at /builddir/build/BUILD/julia/build/usr/share/julia/site/v0.7/Libdl/src/Libdl.jl:99
 [2] top-level scope
 [3] macro expansion at /builddir/build/BUILD/julia/build/usr/share/julia/site/v0.7/REPL/src/REPL.jl:117 [inlined]
 [4] (::getfield(REPL, Symbol("##28#29")){REPL.REPLBackend})() at ./event.jl:92

julia> lib = Libdl.dlopen("libz.so")
ERROR: could not load library "libz.so"
libz.so: cannot open shared object file: No such file or directory
Stacktrace:
 [1] dlopen(::String) at /builddir/build/BUILD/julia/build/usr/share/julia/site/v0.7/Libdl/src/Libdl.jl:99
 [2] top-level scope
 [3] macro expansion at /builddir/build/BUILD/julia/build/usr/share/julia/site/v0.7/REPL/src/REPL.jl:117 [inlined]
 [4] (::getfield(REPL, Symbol("##28#29")){REPL.REPLBackend})() at ./event.jl:92

julia> lib = Libdl.dlopen("libz.so.1")
Ptr{Nothing} @0x00007f139153b520

As you can see, it works once I specify the exact versioned name of the shared library (libz.so.1), but not otherwise.
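A minimal user-side workaround, sketched under the assumption that only `Libdl.dlopen_e` is used (it returns `C_NULL` instead of throwing; newer releases also offer `dlopen(name; throw_error=false)`): try the unversioned name first and fall back to the versioned runtime soname that worked above. `dlopen_first` is a hypothetical helper, not part of Libdl, and `libz.so.1` is the Fedora/Linux soname from this report; other platforms use different names.

```julia
using Libdl

# Hypothetical helper (not part of Libdl): try each candidate name in order and
# return the handle of the first one the system loader accepts, or C_NULL.
function dlopen_first(names::AbstractVector{<:AbstractString})
    for name in names
        handle = Libdl.dlopen_e(name)   # C_NULL on failure instead of an error
        handle == C_NULL || return handle
    end
    return C_NULL
end

# Unversioned name first (needs libz.so from the -dev/-devel package),
# then the versioned runtime soname that the base package installs.
lib = dlopen_first(["libz", "libz.so.1"])
lib == C_NULL && @warn "libz could not be loaded under any candidate name"
```

Falling back to the versioned soname keeps the call working without the -dev package, at the cost of hard-coding a platform-specific name.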

Other Details

  • Operating system: Linux (Fedora 27)
@vtjnash
Member

vtjnash commented Mar 21, 2018

Yes, you should either (1) use BinDeps, (2) specify the name of the correct library version, or (3) install the libz-dev package. This was changed intentionally to reduce the number of test configurations, since many systems (Windows, Nix, etc.) require you to specify the correct name in full.

@vtjnash vtjnash closed this as completed Mar 21, 2018
@StefanKarpinski
Member

Seems likely to be due to #22828.

@StefanKarpinski
Member

Reopening because I think this is the canary in the coal mine and we're going to get a lot of people hitting this problem with no clue what to do and there should be some better way of helping them.

@nsmith5
Contributor Author

nsmith5 commented Mar 21, 2018

Yeah, I didn't spot anything in NEWS.md about the behaviour change and didn't find anything relevant in a quick search through the PRs and issues. I suspect a lot of package breakage will occur because of this (this example comes from CodecZlib, in fact), and documented alternatives will be very helpful.

@nsmith5
Contributor Author

nsmith5 commented Mar 21, 2018

Out of curiosity, why does installing the header files (e.g. libz-dev / libz-devel in this case) fix this issue?

@nsmith5
Contributor Author

nsmith5 commented Mar 21, 2018

It seems like many of the examples of ccall from the documentation will probably need to be updated. This first example from Calling C and Fortran Code for instance:

julia> t = ccall((:clock, "libc"), Int32, ())
ERROR: error compiling top-level scope: could not load library "libc"
/usr/bin/../lib64/libc.so: invalid ELF header
Stacktrace:
 [1] macro expansion at /builddir/build/BUILD/julia/build/usr/share/julia/site/v0.7/REPL/src/REPL.jl:117 [inlined]
 [2] (::getfield(REPL, Symbol("##28#29")){REPL.REPLBackend})() at ./event.jl:92

julia> t = ccall((:clock, "/usr/lib64/libc.so.6"), Int32, ())
2667541

@StefanKarpinski
Member

This is a breaking change and should have a deprecation to let people know what they should do.

@vtjnash
Member

vtjnash commented Mar 21, 2018

I suspect a lot of package breakage will occur because of this

Not very many. CodecZlib is one of the only libraries that doesn't use BinDeps. And it'll force them to fix their package for use on less ubiquitous platforms like NixOS (instead of making the fix conditional on running on Windows). And force us to fix our documentation (while sometimes true that these functions are in a shlib named libc, that's not particularly universal).

@vtjnash vtjnash added docs This change adds or pertains to documentation needs news A NEWS entry is required for this change labels Mar 21, 2018
@StefanKarpinski
Member

This not only needs docs and NEWS; I really think we should not just be breaking this but providing a proper deprecation. We'll see how it goes, but I think this is just the first of many complaints we'll see.

@staticfloat
Member

Reading through the linked PR, I disagree that this should have been merged. Yes, there are reasons for wanting to encourage users to use versioned library names when opening them, but we don't force users to specify a major and minor version number of a Julia package when they load it, we just load the latest available and provide the option (through Pkg.pin() or whatever) to specify if they have the need to restrict what they load. I think the same argument could be made regarding dynamic libraries.

@vtjnash
Member

vtjnash commented Mar 22, 2018

we just load the latest available and provide the option

That's not what the option did on the majority of platforms / configurations. Instead, we had to explain how it worked on some distributions in ideal circumstances (aka, pretty much just zlib), and then explain that if that didn't work for whatever reason, you should have just specified the actual version of the library you wanted in the first place, which would simply have worked on all platforms.

That's also not how package resolution works. We don't just randomly upgrade all of the packages; we explicitly list which ones are compatible, and then ensure that the currently visible and enumerable environment is consistent and restricted to the versions that are specified in the local manifest. That's also how our build works now that the linked PR is merged (it creates folders mapping out the currently visible environment and searches that for the requested name).

@StefanKarpinski
Member

We don't just randomly upgrade all of the packages

That's exactly what we do in Pkg2 😂

@nsmith5
Contributor Author

nsmith5 commented Mar 22, 2018

@vtjnash could you live with the old behaviour being back in the code base if it was well documented that it's a bad approach for maintaining binary dependencies? It seems like it was removed in hopes of encouraging better behaviour from package developers, but maybe some documentation could achieve the same result.

@vtjnash
Member

vtjnash commented Mar 22, 2018

I don't think that writing documentation about how not to do something is particularly advisable. I prefer we don't do it and instead explain up front how to write examples and code that'll work with all platforms.

@nsmith5
Contributor Author

nsmith5 commented Mar 22, 2018

For package development, I completely agree. Is it unreasonable to expect that people who aren't package developers are going to be using the C FFI, though? It seems strange that the barrier to entry to the C FFI should be higher than compiling a similar C program. Compare the following, for instance:

Compiling a C program with libm

$ gcc prog.c -o main -lm

Calling libm from Julia 0.7.x

out = ccall((:sin, "libm.so.6"), Float64, (Float64,), 1.0)

Shouldn't Julia have the same basic level of convenience as GCC?

@vtjnash
Member

vtjnash commented Mar 23, 2018

If we're going to talk about how you would write this in C, why not write it like we're using C and drop the library name. This works on all versions of Julia:

ccall(:sin, Float64, (Float64,), 1.0)

We can add additional entries to the list of libraries that we want to be visible to packages by default, via build rules such as:

$(eval $(call symlink_system_library,libpcre2-8,PCRE))

I initially figured that it's not really necessary to list libraries there that we statically linked against, since it's not necessary (or particularly advisable) to give the actual library name (there is too much variety across platforms). But we can use this build function as a means of normalizing the names across platforms if we decide we want to (something, again, that the old mapping did incorrectly).

@staticfloat
Member

We don't just randomly upgrade all of the packages, we explicit list which ones are compatible, and then ensure that the currently visible and enumerable environment is consistent and restricted to the versions that are specified in the local manifest.

In package loading, I just say using Foo; I don't say using Foo:0.2.3. I argue the using construct is the more direct analogue to dlopen() than anything having to do with Pkg. Pkg is more similar to dpkg dependency lists and whatnot, for which I argue yes, absolutely, we should enforce as much versioning strictness as possible without making it onerous. But for actual code loading, it doesn't make sense to me to enforce this so strongly. We should provide the option, of course, as it can only help, but making it the default seems extreme to me.

Regarding @nsmith5's point above, there's actually an important difference: GCC is providing compile-time guarantees here. It compiles against version x.y.z of libm and then encodes which version it was built against into the compiled binary. That dodges a lot of problems that we are trying to solve here in Julia, so I think that's an imperfect argument.

why not write it like we're using C and drop the library name.

Because that doesn't work in the general case? We're not going to symlink a random libfoo onto our library search path to rebuild functionality that we had in Julia. If you didn't like the functionality because it was badly tested, we should have written tests for it.

@vtjnash
Member

vtjnash commented Mar 23, 2018

because it was badly tested

We had a test for it. Not a very good one, but it failed CI pretty frequently anyway, so it was set up internally to return success whenever it failed (specifically, this test https://github.com/JuliaLang/julia/pull/26581/files#diff-bf20429d6316882a26470433941b41c5R204).

@vtjnash
Member

vtjnash commented Mar 23, 2018

We're not going to symlink a random libfoo onto our library search path

Er, are you not the same staticfloat that's building a package for linking a random libfoo into our library search path, in preparation for handling this better in conjunction with Pkg3? :)

Reference: the script for testing the installation of an actual libfoo into the library search path: https://github.com/JuliaPackaging/BinaryProvider.jl/tree/e9dd1a8f39ba6ede973165512788cfa374ad7bf6/test/LibFoo.jl/deps

@nsmith5
Contributor Author

nsmith5 commented Mar 23, 2018

That dodges a lot of problems that we are trying to solve here in Julia, so I think that's an imperfect argument.

Is this an argument against the feature or the implementation? I mention the C compiler because the feature is evident. I can understand if our implementation is currently wanting, but is there some technical reason we won't ever be able to implement it properly?

@vtjnash
Member

vtjnash commented Mar 23, 2018

In this context, isn't "the C compiler + linker + autoconf scripts" == "the BinDeps.jl compiler"? Like the existing meta-build systems for C, I think we've found it's more reliable to run these as a part of the build process, where it is able to run arbitrary user code, cause side-effects, and provide useful debugging information.

@nsmith5
Contributor Author

nsmith5 commented Mar 23, 2018

No, in this context the feature set would be just "linker + loader". No autoconf. It's not about being cross-platform or reliable. It's a brittle approach to using a C library, but it is very simple.

The point I'm making with the C example is that the combined behaviour of the linker, loader and environment (LD_LIBRARY_PATH) hide the details of the library version you're using and where it is on your system. I think that we need the C FFI to have the same feature set. It is the entry point to C programming and it should be the entry point to using the C FFI in Julia.

@vtjnash
Member

vtjnash commented Mar 23, 2018

I think that we need the C FFI to have the same feature set

Why? Does anyone else do this? What "feature set" are we talking about here? If you just want to make this a feature of the REPL, that's an entirely different question.

@nsmith5
Contributor Author

nsmith5 commented Mar 23, 2018

Hmm, sorry about the lack of clarity. Let's establish some clearer language. Let's say that a feature is some bit of code in the Julia landscape that provides some functionality. In your example, the BinDeps.jl compiler feature provides the functionality of "C Compiler + Linker + Autoconf Scripts".

I think there is a demand for the functionality of "Linker + Loader" in the C FFI. To be clear about what that functionality is, you provide,

  1. The name of a function
  2. The name of a library

and the "Linker + Loader" find a valid version of that library and make calls to the member function you specified.

I think the Julia C FFI is the feature that should be providing this functionality. Specifically, ccall and dlopen should provide this functionality. Moreover, ccall and dlopen should behave no better and no worse than the linker and loader when you compile equivalent C code.

Here is a more detailed example of a simple C program and the equivalent C FFI call in Julia. I've pointed out the functionality I'm talking about in each case.

1.) C Example

foo.h

int foo(int, int);

main.c

#include <foo.h>
int main(void) {
    int a, b, c;
    b = 1;
    c = 2;
    a = foo(b, c);   // Just specify a function name
    return 0;
}

compile and run

$ gcc main.c -lfoo # <-- Just specify a library name
$ ./a.out  # <-- works because the linker and loader deal with the details

2.) Julia Example

compile and run

julia> c = ccall((:foo, "libfoo"), Cint, (Cint, Cint), 1, 2) # <-- This should work as well. I've specified a function name and a library name, and I want ccall and dlopen to deal with the details.

Does that help clarify?

@vtjnash
Member

vtjnash commented Mar 23, 2018

No, it doesn't. Why does the C example require 2 steps (the header file is extraneous), but the Julia example requires doing it in 1 step to achieve "equivalent functionality".

@StefanKarpinski
Member

Julia is at least twice as convenient as C.

@vtjnash
Member

vtjnash commented Mar 24, 2018

While that's a nice thought, it's worth pointing out that the distribution you are choosing to use is going out of its way to make sure that dlopen won't work unless you pass it a fully qualified name. So rather than ganging up against me for not wanting to implement work-arounds for your distribution, maybe you should open an issue with your libc maintainer and package manager and ask them why they have policies against this. By contrast, Apple and Microsoft generally do not try to prohibit this. Although on Windows, this feature was affectionately known as "DLL hell", so that may provide a slight hint.

@nalimilan
Member

nalimilan commented Mar 24, 2018

FWIW gcc main.c -lfoo will only work if you have libfoo.so somewhere, in which case ccall((:foo, "libfoo"), ...) will also work in Julia. Anyway I'm not sure the comparison with C is very interesting here since it's notoriously not the most convenient language around, and more importantly it's a static language. This means that the library is resolved at compile time and at runtime the version used for compilation will be loaded. In Julia these two steps happen at the same time, as if you called gcc every time you start the program.

@ihnorton
Member

Anyway I'm not sure the comparison with C is very interesting

Ok. So let's compare to Python then. Outside of performance, our competition is not (primarily) C.

https://docs.python.org/3/library/ctypes.html#finding-shared-libraries

On Linux, find_library() tries to run external programs (/sbin/ldconfig, gcc, objdump and ld) to find the library file. It returns the filename of the library file.

@nsmith5
Contributor Author

nsmith5 commented Mar 24, 2018

@vtjnash I am very sorry. I wouldn't want you to feel like you're being ganged up on. It looks like your opinion is in the minority on this topic, but it's far from unwelcome from my perspective. In fact, I'd like to know a lot more, because it seems like you have a heap of expertise in this area.

Speaking of which, what's the distribution issue you mentioned? Is this problem only faced by Red Hat-based distros? I'm happy to try to lobby them for a change if it makes our lives easier.

@KristofferC
Member

Having a find_library function that tries its best to find the library path is different from having ccall itself do that, though, right? It seems Jameson is against ccall itself trying to do a bunch of magic, and wants to offload that to some stdlib / package. This seems similar to how Python has done it?

@StefanKarpinski
Member

I could definitely get on board with having a find_library function, perhaps in Libdl? Would that satisfy your desire to remove this functionality from ccall, @vtjnash? It seems like it would also provide a fairly straightforward deprecation too – i.e. change ccall with a library name to calling find_library explicitly.

@nsmith5
Contributor Author

nsmith5 commented Mar 25, 2018

Also on board with that. In fact Libdl already has a find_library function, it just needs some improvement to provide the functionality we've been talking about.
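For reference, a sketch of how the existing `Libdl.find_library` can already express "try these names in order": it returns the first candidate that can be dlopen'd (searching `Libdl.DL_LOAD_PATH` and the system paths) or `""` if none can, and the returned string can be used as the library in `ccall`. The candidate list and the `zlibVersion` call are illustrative assumptions, not a recommended policy.

```julia
using Libdl

# First loadable name among the candidates, or "" if none can be opened.
const libz = Libdl.find_library(["libz", "libz.so.1"])

# zlibVersion() takes no arguments and returns a C string.
# Only compiled when called, i.e. only if a library was found.
zlib_version() = unsafe_string(ccall((:zlibVersion, libz), Cstring, ()))

if isempty(libz)
    @warn "zlib not found; install it or add its directory to Libdl.DL_LOAD_PATH"
else
    println("loaded ", libz, ", zlib version ", zlib_version())
end
```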

@StefanKarpinski
Member

It could also have a strict::Bool=false keyword argument that controls whether to do the numbered .so name search or not.
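A hypothetical sketch of what such a `strict` keyword could mean; neither `candidate_names` nor `find_library_sketch` exists in Libdl. With `strict=true` only the given name is tried; otherwise the unversioned `.so` name and numbered sonames are appended, highest number first, loosely mirroring "load the latest available". The `name.so.N` scheme is an ELF/Linux convention (macOS uses names like `libz.1.dylib`), and `maxversion` is an arbitrary assumed bound.

```julia
using Libdl

# Hypothetical: build the list of names to try for a library.
# strict=true  -> only the name exactly as given.
# strict=false -> also "name.<dlext>" and numbered sonames "name.<dlext>.N",
#                 from maxversion down to 0 (Linux-style naming assumed).
function candidate_names(name::AbstractString; strict::Bool=false, maxversion::Int=9)
    strict && return [String(name)]
    base = string(name, ".", Libdl.dlext)   # e.g. "libz.so" on Linux
    numbered = [string(base, ".", v) for v in maxversion:-1:0]
    return vcat([String(name), base], numbered)
end

# Hypothetical wrapper: defer the actual search to the existing Libdl.find_library.
find_library_sketch(name; kwargs...) = Libdl.find_library(candidate_names(name; kwargs...))
```

For example, `find_library_sketch("libz")` would find `libz.so.1` on a system without the -devel package, while `find_library_sketch("libz"; strict=true)` would not.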

@staticfloat
Member

I would be happy with a find_library() function. (Ironic that I'm working on the counterpart, Sys.which() right now. :P )

@mgkuhn
Contributor

mgkuhn commented Nov 29, 2018

The documentation problem that the very first ccall example in the manual on https://docs.julialang.org/en/v1/manual/calling-c-and-fortran-code/ fails on Ubuntu 18.04 is still acute for Julia 1.0.2:

julia> t = ccall((:clock, "libc"), Int32, ())
ERROR: error compiling top-level scope: could not load library "libc"
/usr/lib/x86_64-linux-gnu/libc.so: invalid ELF header

8 participants