Cannot run tests with bindist using shell.nix missing lib gmp #704

Closed
nmattia opened this issue Feb 13, 2019 · 19 comments

Comments

@nmattia
Contributor

nmattia commented Feb 13, 2019

The build fails when using the GHC bindist if libgmp is not installed on the system. Adding it to the nix-shell does not help either:

[nix-shell:~/projects/tweag/rules_haskell]$ echo $LD_LIBRARY_PATH 
/nix/store/rd7n0v6mymvyqbw3d307caqc23c8kc71-gmp-6.1.2/lib

The linker cannot find libgmp during the build:

[nix-shell:~/projects/tweag/rules_haskell]$ bazel test //tests/binary-with-lib-dynamic/...
...
ERROR: /home/nicolas/projects/tweag/rules_haskell/tests/binary-with-lib-dynamic/BUILD:9:1: error executing shell command: '/nix/store/pawkx6zfnj59zvpa0hkrrqin9f9iwika-bash/bin/bash -c
        export PATH=${PATH:-} # otherwise GCC fails on Windows

        # this is equivalent to 'readarray'. We do not use 'readarray' ...' failed (Exit 1) bash failed: error executing command /nix/store/pawkx6zfnj59zvpa0hkrrqin9f9iwika-bash/bin/bash -c ... (remaining 1 argument(s) skipped)

/nix/store/3xwc1ip20b0p68sxqbjjll0va4pv5hbv-binutils-2.30/bin/ld.gold: error: cannot find -lgmp
collect2: error: ld returned 1 exit status
`gcc' failed in phase `Linker'. (Exit code: 1)
FAILED: Build did NOT complete successfully

Passing action_env and test_env does not help:

[nix-shell:~/projects/tweag/rules_haskell]$ bazel test //tests/binary-with-lib-dynamic/... --action_env=LD_LIBRARY_PATH= --test_env=LD_LIBRARY_PATH
<same error>
FAILED: Build did NOT complete successfully
[nix-shell:~/projects/tweag/rules_haskell]$ bazel test //tests/binary-with-lib-dynamic/... --action_env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH --test_env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH
<same error>
FAILED: Build did NOT complete successfully

Specifying --linkopt=-L/path/to/lib fixes some builds:

[nix-shell:~/projects/tweag/rules_haskell]$ bazel test //tests/binary-with-lib-dynamic/... --linkopt=-L$LD_LIBRARY_PATH
...
//tests/binary-with-lib-dynamic:binary-with-lib-dynamic         (cached) PASSED in 0.2s

Executed 0 out of 1 test: 1 test passes.
INFO: Build completed successfully, 9 total actions

But not all of them: the java_classpath test now fails with a different error:

[nix-shell:~/projects/tweag/rules_haskell]$ bazel test //tests/java_classpath/... --linkopt=-L$LD_LIBRARY_PATH
/tweag/rules_haskell/tests/java_classpath/BUILD:8:1: error executing shell command: '/nix/store/pawkx6zfnj59zvpa0hkrrqin9f9iwika-bash/bin/bash -c
        export PATH=${PATH:-} # otherwise GCC fails on Windows

        # this is equivalent to 'readarray'. We do not use 'readarray' in o...' failed (Exit 1) bash failed: error executing command /nix/store/pawkx6zfnj59zvpa0hkrrqin9f9iwika-bash/bin/bash -c ... (remaining 1 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
<command line>: can't load .so/.DLL for: libgmp.so (libgmp.so: cannot open shared object file: No such file or directory)
Target //tests/java_classpath:java_classpath failed to build
FAILED: Build did NOT complete successfully
@Profpatsch
Contributor

We don’t officially support bindist on NixOS, do you expect it to work inside of a nix-shell?

@mboes
Member

mboes commented Feb 18, 2019

I don't know what we could do here. If you're using Nix, don't use bindists and vice versa, and the error message as-is clearly states what is missing. Tempted to close this as wontfix.

@Profpatsch
Contributor

@nmattia Is this related to #618?

@nmattia
Contributor Author

nmattia commented Feb 25, 2019

Clarification: I'm on Ubuntu:

~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.2 LTS
Release:	18.04
Codename:	bionic

I have libgmp installed globally. I actually have two versions, installed with this command:

$  sudo apt install libgmp-dev libgmp3-dev

The bindist can find the libgmp SO without issues:

$ wget https://downloads.haskell.org/~ghc/8.6.2/ghc-8.6.2-x86_64-deb8-linux.tar.xz
$ tar -xf ghc-8.6.2-x86_64-deb8-linux.tar.xz 
$ ldd ghc-8.6.2/libraries/integer-gmp/dist-install/build/libHSinteger-gmp-1.0.2.0-ghc8.6.2.so 
	linux-vdso.so.1 (0x00007ffd2e2da000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f655da2b000)
	libgmp.so.10 => /usr/lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f655d7aa000)
	libHSghc-prim-0.5.3-ghc8.6.2.so => not found
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f655d3b9000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f655ddc9000)

This suggests that bazel is sanitizing the environment or at least not passing /usr/lib/x86_64-linux-gnu/ as a library directory.
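
One way to test that suspicion is to compare the directories the compiler driver hands to the linker with the directory that holds libgmp. The following is a minimal sketch, assuming a GCC-style driver that supports `-print-search-dirs`; the helper name `lib_in_search_dirs` is hypothetical and not part of rules_haskell.

```shell
# Hypothetical helper: succeed (and print the full path) if <lib> exists in
# any directory of a colon-separated search path.
lib_in_search_dirs() {
  lib=$1
  dirs=$2
  old_ifs=$IFS
  IFS=:
  for dir in $dirs; do
    if [ -e "$dir/$lib" ]; then
      IFS=$old_ifs
      printf '%s\n' "$dir/$lib"
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}

# Example (assumes gcc is the driver that GHC ends up calling):
#   search_dirs=$(gcc -print-search-dirs | sed -n 's/^libraries: =//p')
#   lib_in_search_dirs libgmp.so "$search_dirs" || echo "libgmp.so is not on the link search path"
```

Running the example once with the system gcc and once with the gcc found inside the nix-shell should show whether the two drivers search different directories.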

@nmattia
Contributor Author

nmattia commented Feb 25, 2019

Adding this line to the toolchain builder fixes the issue:

        export LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/ # new line
        ...
        "${ghc_args[@]}" "${extra_args[@]}" "${param_file_args[@]}"

So the linker doesn't search (by default) the directory where libgmp is installed. Another way to fix the build is to create a symlink to the library in /usr/lib:

$ cd /usr/lib
/usr/lib$ sudo ln -s x86_64-linux-gnu/libgmp.so ./libgmp.so

Both solutions are really workarounds. Open questions:

  1. Why is ld not looking in /usr/lib/x86_64-linux-gnu? Is it the fault of bazel's linker options or Nix's ld configuration?
  2. Do we consider libgmp as a mandatory system dependency or should libgmp be provided through bazel?
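
For reference, the first workaround can be written so that it does not hard-code the directory unconditionally. This is only a sketch of the same workaround, not a proposed fix; `library_path_with` is a hypothetical helper name, and the directory in the example is the one from this report.

```shell
# Hypothetical helper: print the LIBRARY_PATH value <current> with <dir>
# prepended, but only when <dir> actually contains <lib>; otherwise print
# <current> unchanged.
library_path_with() {
  lib=$1
  dir=$2
  current=$3
  if [ -e "$dir/$lib" ]; then
    printf '%s\n' "${dir}${current:+:$current}"
  else
    printf '%s\n' "$current"
  fi
}

# Example:
#   export LIBRARY_PATH=$(library_path_with libgmp.so /usr/lib/x86_64-linux-gnu "${LIBRARY_PATH:-}")
```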

@mboes
Member

mboes commented Feb 25, 2019

We still don't have clear repro instructions in this ticket. Quoting myself above:

I don't know what we could do here. If you're using Nix, don't use bindists and vice versa.

Furthermore, bindists are working in CI. So how do I reproduce this? Is the nix-shell necessary here to get the error, or do I get it even without the nix-shell? I need a sequence of steps.

@nmattia
Contributor Author

nmattia commented Feb 25, 2019

To answer the question above:

  1. Why is ld not looking in /usr/lib/x86_64-linux-gnu? Is it the fault of bazel's linker options or Nix's ld configuration?

Most likely Nix's ld configuration's fault, for the reasons below:

  • The issue doesn't occur on CI
  • I cannot reproduce it locally when using a non-Nix provided Bazel installation

@nmattia
Contributor Author

nmattia commented Feb 25, 2019

Clarification: to reproduce the issue:

  1. Install libgmp in /usr/lib/x86_64-linux-gnu
  2. Run bazel build //tests/binary-simple/... where bazel is provided through a nix-shell

@mboes
Member

mboes commented Feb 25, 2019

Can you confirm that the issue does not arise when not using a nix-shell?

@nmattia
Contributor Author

nmattia commented Feb 25, 2019

Can you confirm that the issue does not arise when not using a nix-shell?

Yes, I confirm. Turns out this was a misunderstanding on my side: I didn't understand that we didn't want to support the bindist when using the nix-shell. I'll close this and update the README.

@nmattia nmattia closed this as completed Feb 25, 2019
@mboes
Member

mboes commented Feb 25, 2019

Ah - you already gave the answer to my question, a few minutes before I asked it. OK so the new description for the ticket could be this: "GHC bindists don't work inside a nix-shell". If so, then I move to close this issue, because this is an upstream problem, and because I would argue that you never need a bindist if you are inside a nix-shell. GHC bindists fail even without Bazel.

@Profpatsch
Contributor

When running the tests on a fresh Ubuntu 16.04 LTS VM, I experience the exact same problem even when not inside a nix-shell.

Inside nix-shell --pure the build works just fine with bindists.

If I symlink the .so like nmattia did above, the build-time linking works, but it fails when it tries to run the binary:

bazel-out/host/bin/tests/binary-with-lib/binary-with-lib: error while loading shared libraries: libgmp.so.10: cannot open shared object file: No such file or directory

@flokli mentioned that on the BuildKite issue, so we can assume it’s generally broken on Ubuntu.
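
The runtime failure above is a separate lookup from the link-time one: the dynamic loader resolves libgmp.so.10 when the binary starts. A small sketch for listing what the loader cannot resolve, assuming the usual `name => path` output format of `ldd`; `unresolved_libs` is a hypothetical helper name.

```shell
# Hypothetical helper: read `ldd` output on stdin and print the names of
# shared libraries reported as "not found".
unresolved_libs() {
  awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Example:
#   ldd bazel-out/host/bin/tests/binary-with-lib/binary-with-lib | unresolved_libs
```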

@Profpatsch Profpatsch reopened this Jul 26, 2019
@aherrmann
Member

I tend to encounter this issue on openSUSE when using the GHC bindist while nix-build is in PATH. A workaround is to remove nix-build from PATH, e.g. in .envrc:

export PATH=`echo ${PATH} | awk -v RS=: -v ORS=: '/.nix-profile/ {next} {print}'`
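
A variant of the same idea in plain shell, for anyone who wants to avoid the trailing `:` that the awk one-liner leaves at the end of PATH. This is a sketch only; `path_without` is a hypothetical helper name, and it matches entries by substring rather than by regular expression.

```shell
# Hypothetical helper: print <path> with every entry containing <pattern>
# removed, keeping the order of the remaining entries.
path_without() {
  pattern=$1
  path=$2
  result=
  old_ifs=$IFS
  IFS=:
  for entry in $path; do
    case $entry in
      *"$pattern"*) ;;                          # drop matching entries
      *) result=${result:+$result:}$entry ;;    # keep everything else
    esac
  done
  IFS=$old_ifs
  printf '%s\n' "$result"
}

# Example:
#   export PATH=$(path_without .nix-profile "$PATH")
```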

@mboes
Member

mboes commented Jul 26, 2019

@Profpatsch if you're going to reopen an old issue, can we have a repro? Also, I've seen scant evidence so far that this issue is at all rules_haskell specific.

@mboes
Member

mboes commented Jul 26, 2019

So the issue arises because if nix-build is available in the PATH, then nixpkgs_cc_configure() isn't a noop. When it's not a noop, it effectively tells GHC to use Nixpkgs' GCC. That GCC doesn't look up system libraries in the same places, understandably, hence the link failure.
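
A quick way to check which case a given shell falls into is sketched below. It only mirrors the condition as described in this comment (nix-build present on PATH or not); the actual decision is made inside rules_nixpkgs, and `cc_source` is a hypothetical helper name.

```shell
# Hypothetical helper: report which C toolchain a build is likely to pick up,
# based only on whether nix-build can be found on PATH.
cc_source() {
  if command -v nix-build >/dev/null 2>&1; then
    echo nixpkgs
  else
    echo system
  fi
}

# Example:
#   cc_source    # prints "nixpkgs" or "system" for the current shell
```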

@Profpatsch
Copy link
Contributor

So the issue arises because if nix-build is available in the PATH, then nixpkgs_cc_configure() isn't a noop. When it's not a noop, it effectively tells GHC to use Nixpkgs' GCC.

wtf wtf wtf wtf

That’s what toolchains should be for, isn’t it. We can’t just suddenly do different things if the user goes and installs a tool.


But anyway, it’s not (just) that aberration. When removing nixpkgs from the environment, I still get a different error:

/home/vagrant/.cache/bazel/_bazel_vagrant/c8951b2080a46b210df5af21134647f3/execroot/rules_haskell/external/rules_haskell_ghc_linux_amd64/bin/../lib/bin/ghc: error while loading shared libraries: libtinfo.so.5: cannot open shared object file: No such file or directory

@aherrmann
Member

@Profpatsch The libtinfo.so.5 error might just be a system version issue. See 364ea31.

@mboes
Member

mboes commented Jul 26, 2019

That’s what toolchains should be for, isn’t it.

It's not. CC rules don't use that (yet, or maybe never, I don't know). They have a single external repo that contains a symlink to the one compiler it's supposed to use for building C/C++ code.

@mboes
Member

mboes commented Aug 12, 2019

Filed an issue upstream: tweag/rules_nixpkgs#88. Closing this one, since there's nothing rules_haskell can do.

4 participants