cctools ld segfaults when linking haskellPackages.Agda (2.6.2 only?) #149692

Closed
sternenseemann opened this issue Dec 8, 2021 · 68 comments
Labels
0.kind: bug Something is broken 6.topic: agda "A dependently typed programming language / interactive theorem prover" 6.topic: darwin Running or building packages on Darwin 6.topic: haskell

Comments

@sternenseemann
Member

Example Log:

Building executable 'agda' for Agda-2.6.2..
[1 of 1] Compiling Main             ( src/main/Main.hs, dist/build/agda/agda-tmp/Main.o )
Linking dist/build/agda/agda ...
/nix/store/bp55vzlmcyqcx9n08pkzslvks10zybwc-clang-wrapper-11.1.0/bin/ld: line 256: 79105 Segmentation fault: 11  /nix/store/3fl5z9yfnz08kjb4cnhw2w8a2zsm5qim-cctools-binutils-darwin-949.0.1/bin/ld ${extraBefore+"${extraBefore[@]}"} ${params+"${params[@]}"} ${extraAfter+"${extraAfter[@]}"}
clang-11: error: linker command failed with exit code 139 (use -v to see invocation)
`cc' failed in phase `Linker'. (Exit code: 139)
builder for '/nix/store/fyzdyk7a79jqfh4r2gvjc65d9f7w22wd-Agda-2.6.2.drv' failed with exit code 1

Steps To Reproduce

nix-build -A haskellPackages.Agda on aarch64-darwin

cc @NixOS/darwin-maintainers

@sternenseemann sternenseemann added 0.kind: bug Something is broken 6.topic: darwin Running or building packages on Darwin labels Dec 8, 2021
@prusnak
Member

prusnak commented Dec 8, 2021

Does adding autoSignDarwinBinariesHook to nativeBuildInputs help?

@prusnak
Member

prusnak commented Dec 8, 2021

Does adding autoSignDarwinBinariesHook to nativeBuildInputs help?

Ah, sorry. This is about fixing the "Killed: 9" error, not Segmentation fault: 11 error.
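For readers unfamiliar with the two messages: a shell reports a process killed by a signal as exit status 128 plus the signal number, so the "exit code 139" in the log above corresponds to signal 11 (SIGSEGV), while a "Killed: 9" failure would surface as 137. A minimal sketch of that arithmetic; the helper name decode_status is made up for illustration:

```shell
# Decode an exit status using the common shell convention that
# values above 128 mean "terminated by signal (status - 128)".
decode_status() {
  status=$1
  if [ "$status" -le 128 ]; then
    echo "exit $status"
    return
  fi
  sig=$((status - 128))
  case "$sig" in
    9)  echo "signal 9 (SIGKILL)" ;;
    11) echo "signal 11 (SIGSEGV)" ;;
    *)  echo "signal $sig" ;;
  esac
}

decode_status 139   # status reported for the ld crash in this issue
decode_status 137   # status a "Killed: 9" failure would produce
```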

@sternenseemann
Member Author

Yeah, Agda is a normal Haskell package, and those work for the most part.

@veprbl
Member

veprbl commented Dec 8, 2021

It would help to post a crash dump from Other -> Console -> User Reports.

@sternenseemann
Member Author

Indeed, would be nice if someone could have a look at this locally, I can only check on Hydra logs, really.

@olebedev
Member

olebedev commented Dec 8, 2021

Getting:

Linking dist/build/agda/agda ...
/nix/store/bp55vzlmcyqcx9n08pkzslvks10zybwc-clang-wrapper-11.1.0/bin/ld: line 256: 36953 Segmentation fault: 11  /nix/store/3fl5z9yfnz08kjb4cnhw2w8a2zsm5qim-cctools-binutils-darwin-949.0.1/bin/ld ${extraBefore+"${extraBefore[@]}"} ${params+"${params[@]}"} ${extraAfter+"${extraAfter[@]}"}
clang-11: error: linker command failed with exit code 139 (use -v to see invocation)
`cc' failed in phase `Linker'. (Exit code: 139)
builder for '/nix/store/fyzdyk7a79jqfh4r2gvjc65d9f7w22wd-Agda-2.6.2.drv' failed with exit code 1
error: build of '/nix/store/fyzdyk7a79jqfh4r2gvjc65d9f7w22wd-Agda-2.6.2.drv' failed

on 7370b26

@turion
Contributor

turion commented Dec 13, 2021

Is this maybe fixed since 2.6.2.1 (https://hydra.nixos.org/build/161184182)?

@sternenseemann
Member Author

Interesting, let's keep this open in case it happens in another package

@sternenseemann sternenseemann changed the title cctools ld segfaults when linking haskellPackages.Agda cctools ld segfaults when linking haskellPackages.Agda (2.6.2 only?) Dec 13, 2021
@frontsideair
Contributor

I have the exact same error with haskellPackages.lattices, as you can see from Hydra job.

@sternenseemann
Member Author

Agda 2.6.2.1 started failing again as well iirc. I suspect this could possibly be OOM or a segfault, but impossible to test from my position.

@turion
Contributor

turion commented Dec 30, 2021

Can someone on darwin reproduce and bisect?

@veprbl veprbl added the 6.topic: agda "A dependently typed programming language / interactive theorem prover" label Dec 30, 2021
@ZachFontenot

I'm hitting this locally on aarch64-darwin. I'm not sure how I can help out, though.

@siraben
Member

siraben commented Mar 24, 2022

I can't find the commit where this succeeds on aarch64-darwin, has anyone found it?

@sternenseemann
Member Author

2c85c77 according to hydra

@ZachFontenot

ZachFontenot commented Mar 25, 2022

bff49dab05df8c825b268ca863140e6a119626b1 is the first bad commit
commit bff49dab05df8c825b268ca863140e6a119626b1
Merge: 40d43349e66 a7dda03a2e0
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Date:   Tue Nov 23 06:01:42 2021 +0000

That's the result of the bisect. Sadly, it's a huge commit.
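For anyone wanting to repeat the bisect, a sketch of a helper for `git bisect run`. It assumes the failure can be recognised by the "Segmentation fault: 11" line in the build log; the function name, the script name bisect-agda.sh, and the log file name are hypothetical, and the good/bad commits are the ones mentioned in earlier comments.

```shell
# Map one build attempt to the exit codes `git bisect run` understands:
#   0 = good, 1 = bad, 125 = skip this commit.
classify_build() {
  build_status=$1
  build_log=$2
  if [ "$build_status" -eq 0 ]; then
    return 0                      # build succeeded: good commit
  elif grep -q 'Segmentation fault: 11' "$build_log"; then
    return 1                      # the linker crash from this issue: bad commit
  else
    return 125                    # failed for another reason: skip
  fi
}

# Hypothetical usage on aarch64-darwin, inside a nixpkgs checkout:
#   git bisect start 7370b26 2c85c77   # known bad, then known good
#   git bisect run ./bisect-agda.sh
# where bisect-agda.sh sources this function and ends with:
#   nix-build -A haskellPackages.Agda >build.log 2>&1
#   classify_build "$?" build.log
```

Returning 125 for unrelated failures keeps a broken intermediate commit from being blamed for the linker crash.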

@siraben
Member

siraben commented Mar 26, 2022

My bisection agrees:

bff49dab05df8c825b268ca863140e6a119626b1 is the first bad commit
commit bff49dab05df8c825b268ca863140e6a119626b1
Merge: 40d43349e66 a7dda03a2e0
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Date:   Tue Nov 23 06:01:42 2021 +0000

    Merge staging-next into staging

@unbel13ver
Contributor

Hello. I faced the same issue while building the coreutils package for aarch64-linux (natively). At the configure step, I see this in config.log:

configure:5968: checking whether the C compiler works
configure:5990: gcc    conftest.c  >&5
/nix/store/h190xsp8qqcv4gqig698nvvpllkiabhi-bootstrap-stage4-gcc-wrapper-9.3.0/bin/ld: line 256:  1438 Segmentation fault      /nix/store/5p5gi835bb9p1fiw8dxbfs7dzkk3jd5m-binutils-2.38/bin/ld ${extraBefore+"${extraBefore[@]}"} ${params+"${params[@]}"} ${extraAfter+"${extraAfter[@]}"}
collect2: error: ld returned 139 exit status

My nixpkgs are

commit 1919e181fb00d7620b586206efe938bd3fd1e67c (HEAD -> master, origin/master, origin/HEAD)
Merge: cbebdfc3da9 17be6f75ce6
Author: Thiago Kenji Okada <thiagokokada@gmail.com>
Date:   Thu May 12 10:01:16 2022 +0100

@veprbl
Member

veprbl commented May 16, 2022

@unbel13ver That can't be related. Please file a separate issue.

@soulomoon

soulomoon commented Jun 24, 2022

I don't know if it is the same issue

    "nixpkgs": {
      "locked": {
        "lastModified": 1645433236,
        "narHash": "sha256-4va4MvJ076XyPp5h8sm5eMQvCrJ6yZAbBmyw95dGyw4=",
        "owner": "nixos",
        "repo": "nixpkgs",
        "rev": "7f9b6e2babf232412682c09e57ed666d8f84ac2d",
        "type": "github"
      },
building '/nix/store/8zr9cl44sdnr6mg70g3s3xkvl4dkdi0c-Agda-2.6.2.1.drv'...
error: builder for '/nix/store/8zr9cl44sdnr6mg70g3s3xkvl4dkdi0c-Agda-2.6.2.1.drv' failed with exit code 1;
       last 10 log lines:
       > [398 of 400] Compiling Agda.Compiler.JS.Compiler ( src/full/Agda/Compiler/JS/Compiler.hs, dist/build/Agda/Compiler/JS/Compiler.o, dist/build/Agda/Compiler/JS/Compiler.dyn_o )
       > [399 of 400] Compiling Agda.Compiler.Builtin ( src/full/Agda/Compiler/Builtin.hs, dist/build/Agda/Compiler/Builtin.o, dist/build/Agda/Compiler/Builtin.dyn_o )
       > [400 of 400] Compiling Agda.Main        ( src/full/Agda/Main.hs, dist/build/Agda/Main.o, dist/build/Agda/Main.dyn_o )
       > Preprocessing executable 'agda' for Agda-2.6.2.1..
       > Building executable 'agda' for Agda-2.6.2.1..
       > [1 of 1] Compiling Main             ( src/main/Main.hs, dist/build/agda/agda-tmp/Main.o )
       > Linking dist/build/agda/agda ...
       > /nix/store/km02igh4pshp20d0wn89rf5jjfxcm8v5-clang-wrapper-11.1.0/bin/ld: line 256: 44923 Segmentation fault: 11  /nix/store/diz9chv9r9m3pv9rzi7g3p2iq8vgsmr3-cctools-binutils-darwin-949.0.1/bin/ld ${extraBefore+"${extraBefore[@]}"} ${params+"${params[@]}"} ${extraAfter+"${extraAfter[@]}"}
       > clang-11: error: linker command failed with exit code 139 (use -v to see invocation)
       > `cc' failed in phase `Linker'. (Exit code: 139)
       For full logs, run 'nix log /nix/store/8zr9cl44sdnr6mg70g3s3xkvl4dkdi0c-Agda-2.6.2.1.drv'.

@Jake-Gillberg
Contributor

Jake-Gillberg commented Jul 7, 2022

For anyone on M1 experiencing this issue, overriding with the x86_64-darwin package works with Rosetta.

@CrepeGoat
Contributor

For anyone on M1 experiencing this issue, overriding with the x86_64-darwin package works with Rosetta.

A+ suggestion! In case others (like me 😅) want to try this but aren't familiar with how, here's a blog that talks about configuring nix to build x86_64 programs: https://evanrelf.com/building-x86-64-packages-with-nix-on-apple-silicon
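A rough sketch of what the workaround can look like on the command line. It assumes Rosetta 2 is installed and that Nix has been configured to accept x86_64-darwin builds as the linked post describes (typically an extra-platforms entry in nix.conf). The wrapper name build_x86_64 and the NIX_BUILD override are illustrative only; passing system to nixpkgs via --argstr is one way to select the platform, not necessarily the one used above.

```shell
# Build an attribute from <nixpkgs> for x86_64-darwin instead of the native
# aarch64-darwin. NIX_BUILD can be overridden (e.g. with `echo`) for a dry run.
build_x86_64() {
  attr=$1
  "${NIX_BUILD:-nix-build}" '<nixpkgs>' -A "$attr" --argstr system x86_64-darwin
}

# Dry run in a subshell: print the arguments that would be passed to nix-build.
(NIX_BUILD=echo; build_x86_64 haskellPackages.Agda)
```

Dropping the NIX_BUILD override runs the real build, which then executes under Rosetta.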

@denizdogan

For what it's worth, this is happening to me too, with Ormolu:

[11 of 11] Compiling Main             ( tests/Spec.hs, dist/build/tests/tests-tmp/Main.o, dist/build/tests/tests-tmp/Main.dyn_o )
Linking dist/build/tests/tests ...
/nix/store/48py6zrawzim9ghrnkqwm36jl4j1l23x-clang-wrapper-11.1.0/bin/ld: line 256: 70672 Segmentation fault: 11  /nix/store/5wvlj00dr22ivh210b18ccv1i60h6c1q-cctools-binutils-darwin-949.0.1/bin/ld ${extraBefore+"${extraBefore[@]}"} ${params+"${params[@]}"} ${extraAfter+"${extraAfter[@]}"}
clang-11: error: linker command failed with exit code 139 (use -v to see invocation)
`cc' failed in phase `Linker'. (Exit code: 139)

It's nice that there's the workaround of compiling for x86_64, but is there really no other way?

@veprbl
Member

veprbl commented Sep 26, 2022

@denizdogan Please post a stacktrace. You can find it using Other -> Console -> Crash Reports.

@denizdogan

@arjunkathuria

Hi @cidkidnix, thanks for taking the time to reply to this. I'll try that and see if that fixes things for me.

@john-rodewald
Member

-fwhole-archive-libs

I ran into this segfault trying to compile Hasura on aarch64 and this flag was the final missing puzzle piece. Thank you very much!

@sternenseemann
Member Author

Interesting, we could consider enabling this by default via haskellPackages.mkDerivation on aarch64-darwin.

Can you compare output size of built libraries with and without this flag? Stripping efficiency would be interesting to know for making this decision.

@cidkidnix
Contributor

cidkidnix commented Jun 12, 2023

Interesting, we could consider enabling this by default via haskellPackages.mkDerivation on aarch64-darwin.

I'm interested too. I've been meaning to submit a patch upstream to GHC, but I haven't found the time.

The relevant fix would be something akin to the following (assuming the output size isn't massively different):

osSubsectionsViaSymbols :: Platform -> Bool
osSubsectionsViaSymbols platform = case (platformArch platform, platformOS platform) of
         (ArchAArch64, OSDarwin) -> False
         (_, OSDarwin) -> True
         (_, _) -> False

in https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Platform.hs#L230

@cidkidnix
Contributor

Oh yeah, also @john-rodewald: the flag should actually be -fwhole-archive-hs-libs. If you could test with that flag specifically, I would appreciate it.
I must have mistyped it a few days ago, oops.
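One possible way to apply the flag to a single package when building with cabal, sketched as a small helper. The function name and the choice of cabal.project.local are assumptions for illustration; setting ghc-options in the package's .cabal file would be another option.

```shell
# Add the workaround flag for one package to cabal.project.local,
# unless the flag is already present in that file.
add_whole_archive_flag() {
  project_dir=$1
  pkg=$2
  file="$project_dir/cabal.project.local"
  if [ -f "$file" ] && grep -q -e '-fwhole-archive-hs-libs' "$file"; then
    return 0
  fi
  printf 'package %s\n  ghc-options: -fwhole-archive-hs-libs\n' "$pkg" >> "$file"
}

# Hypothetical usage from the project root, for a package named "a":
#   add_whole_archive_flag . a
```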

@cidkidnix
Contributor

diff --git a/compiler/GHC/Linker/Static.hs b/compiler/GHC/Linker/Static.hs
index aa51a2b7d6..eac801617b 100644
--- a/compiler/GHC/Linker/Static.hs
+++ b/compiler/GHC/Linker/Static.hs
@@ -135,7 +135,7 @@ linkBinary' staticLink logger tmpfs dflags unit_env o_files dep_units = do
     let
       dead_strip
         | gopt Opt_WholeArchiveHsLibs dflags = []
-        | otherwise = if osSubsectionsViaSymbols (platformOS platform)
+        | otherwise = if osSubsectionsViaSymbols platform
                         then ["-Wl,-dead_strip"]
                         else []
     let lib_paths = libraryPaths dflags
diff --git a/compiler/GHC/Platform.hs b/compiler/GHC/Platform.hs
index a5a609d252..e1fcbb1aa1 100644
--- a/compiler/GHC/Platform.hs
+++ b/compiler/GHC/Platform.hs
@@ -227,9 +227,11 @@ osUsesFrameworks _        = False
 platformUsesFrameworks :: Platform -> Bool
 platformUsesFrameworks = osUsesFrameworks . platformOS

-osSubsectionsViaSymbols :: OS -> Bool
-osSubsectionsViaSymbols OSDarwin = True
-osSubsectionsViaSymbols _        = False
+osSubsectionsViaSymbols :: Platform -> Bool
+osSubsectionsViaSymbols plat = case (platformArch plat, platformOS plat) of
+                                 (ArchAArch64, OSDarwin) -> False
+                                 (_, OSDarwin) -> True
+                                 (_, _) -> False

 -- | Minimum representable Int value for the given platform
 platformMinInt :: Platform -> Integer

Here's the patch, based on GHC master, that should fix the linking issues on aarch64-darwin. I think it should work for any GHC released in the last ~3 years.

@yan-sh

yan-sh commented Aug 24, 2023

This problem affects me too. After some updates, my project stopped building with this error:

/nix/store/7v4rbxd8i0hsk2hgy8jnd4qn9vk89a86-clang-wrapper-11.1.0/bin/ld: line 269: 30615 Segmentation fault: 11 /nix/store/hwkh5cw8lfm0yhg7amvl4ffcjsvz07l6-cctools-binutils-darwin-973.0.1/bin/ld @<(printf "%q\n" ${extraBefore+"${extraBefore[@]}"} ${params+"${params[@]}"} ${extraAfter+"${extraAfter[@]}"})
clang-11: error: linker command failed with exit code 139 (use -v to see invocation)
ghc: `cc' failed in phase `Linker'. (Exit code: 139)

I've tried building with GHC 9.4.5 and 9.4.6 and got the same result. Oddly, my project was still building yesterday. I've tried rolling back my nix-channel, but it had no effect.

My setup: mac m2, ghc 9.4.5/9.4.6

@reckenrode
Contributor

reckenrode commented Aug 24, 2023

What happens if you run the linker manually with no input? If it still crashes, something may have happened to the codesigning.

@yan-sh

yan-sh commented Aug 24, 2023

What happens if you run the linker manually with no input? If it still crashes, something may have happened to the codesigning.

Do you mean just try to run ld? If yes then:

Undefined symbols for architecture arm64:
  "_main", referenced from:
     implicit entry/start for main executable
ld: symbol(s) not found for architecture arm64

@reckenrode
Contributor

Do you mean just try to run ld?

Yep, thanks. Looks like it works, so it’s not a codesigning issue.
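The check above can be sketched as a small probe that distinguishes "the binary ran and reported an error" from "the binary was killed by a signal". The helper name probe is made up, and the two sh invocations are stand-ins so the sketch runs anywhere; on the affected machine the interesting call would be `probe ld`.

```shell
# Run a command with no useful input and report whether it merely failed
# or was killed by a signal (exit status above 128).
probe() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "not found"
    return
  fi
  status=0
  "$@" >/dev/null 2>&1 || status=$?
  if [ "$status" -gt 128 ]; then
    echo "crashed (signal $((status - 128)))"
  else
    echo "ran (exit $status)"
  fi
}

probe sh -c 'exit 1'      # stand-in for a linker that runs and reports an error
probe sh -c 'kill -9 $$'  # stand-in for a binary that is killed on launch
```

An ld that prints "Undefined symbols" and exits normally falls into the first case, which is why the output above rules out a codesigning problem.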

@yan-sh

yan-sh commented Aug 24, 2023

I've read the thread and tried passing the option '-fwhole-archive-hs-libs' to GHC, and it kind of works :) Thanks all for the help!

srid added a commit to srid/double-conversion-ex that referenced this issue Mar 8, 2024
@reckenrode
Contributor

I’m working on updating cctools and ld64 to the latest versions. These versions are using Apple’s upstream sources and match the versions of those tools included with Xcode 15.3. If those don’t work, then I don’t know.

More details on Discourse: https://discourse.nixos.org/t/darwin-updates-news/42249/10

@alexfmpe
Member

Had this same error message on an executable on an M1 with ghc 9.4.8 and a package set from recent nixpkgs.

Both -fwhole-archive-hs-libs and -dynamic make nix-shell --run "cabal run" work, but if -dynamic is present then running the executable coming out of the nix-build fails with an rpath error when searching for the package's library

dyld[23803]: Library not loaded: @rpath/libHS<package>-0.1-<fingerprint-thing>-ghc9.4.8.dylib
  Referenced from: <some-uuid-thing> /nix/store/<hash>-<package>-0.1/bin/<executable>
  Reason: tried: <lotsa places including mostly variations of /nix/store/very-big-hash-some-dependency>

Poking around with otool -l, I see the executable coming directly from cabal run has a

Load command 24
          cmd LC_RPATH
      cmdsize 128
         path <project-path>/dist-newstyle/<blabla>/build

and indeed that build directory contains a libHS<package>-0.1-inplace-ghc9.4.8.dylib (no fingerprint), which matches what I see in otool -L.

As for the nix-build executable, otool -l shows an @rpath/ entry where the .dylib name contains a fingerprint. However, I'm short one cmd LC_RPATH for an external dependency (which I just noticed is not actually being used), and the one for the library points to the path /private/tmp/nix-build-<package>-0.1.drv-0/<executable>dist/build.

It seems like there are two bugs here: both flags work around the first bug, which @cidkidnix has a patch for above, but avoiding it with -dynamic triggers a separate bug where the rpath points at an ephemeral build directory?
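To make the rpath inspection repeatable, a sketch of a filter over `otool -l` output. It assumes the load-command layout shown above (a "cmd LC_RPATH" line followed by a "path" line) and that paths contain no spaces; the function name list_rpaths is hypothetical.

```shell
# Read `otool -l` output on stdin and print each LC_RPATH path,
# marking paths inside a Nix build directory, which vanish after the build.
list_rpaths() {
  awk '
    $1 == "cmd" && $2 == "LC_RPATH" { in_rpath = 1; next }
    in_rpath && $1 == "path" {
      note = ($2 ~ /^\/private\/tmp\/nix-build-/) ? "  <- ephemeral build dir" : ""
      print $2 note
      in_rpath = 0
    }
  '
}

# On macOS, for the nix-build result:
#   otool -l "$(nix-build)/bin/a" | list_rpaths
```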

@reckenrode
Contributor

Had this same error message on an executable on a M1 with ghc 9.4.8 and package set from recent nixpkgs.

Is this reproducible, and is there a public repo? I’d like to see if it crashes on my ld64 update branch.

@alexfmpe
Member

alexfmpe commented Apr 24, 2024

@reckenrode thanks for looking into this - here's a small repro using ghc 9.6 and the haskell-updates branch - everything should be cached
https://github.com/alexfmpe/reproducible-segfault-rpath

Running nix-build hits the segfault; uncommenting -dynamic triggers the rpath error when running with $(nix-build)/bin/a, but if you instead uncomment -fwhole-archive-hs-libs it should work.

https://github.com/alexfmpe/reproducible-segfault-rpath/blob/0f75a05fff5300c9156c84f3ea6dd1869005b154/a/a.cabal#L21-L25

@reckenrode
Contributor

@alexfmpe Thanks for the repo. I cloned it and confirmed the crash with the haskell-updates branch, then I pointed it at my local ld64 update branch.

$ cabal build --project-file cabal.project all
Warning: The package list for 'hackage.haskell.org' does not exist. Run 'cabal
update' to download it.
Resolving dependencies...
Build profile: -w ghc-9.6.4 -O1
In order, the following will be built (use -v for more details):
 - a-0.1 (lib) (first run)
 - a-0.1 (exe:a) (first run)
Configuring library for a-0.1..
Preprocessing library for a-0.1..
Building library for a-0.1..
[1 of 1] Compiling A                ( src/A.hs, /Users/reckenrode/Developer/tmp/reproducible-segfault-rpath/dist-newstyle/build/aarch64-osx/ghc-9.6.4/a-0.1/build/A.o, /Users/reckenrode/Developer/tmp/reproducible-segfault-rpath/dist-newstyle/build/aarch64-osx/ghc-9.6.4/a-0.1/build/A.dyn_o )
Configuring executable 'a' for a-0.1..
Preprocessing executable 'a' for a-0.1..
Building executable 'a' for a-0.1..
[1 of 1] Compiling Main             ( src-bin/main.hs, /Users/reckenrode/Developer/tmp/reproducible-segfault-rpath/dist-newstyle/build/aarch64-osx/ghc-9.6.4/a-0.1/x/a/build/a/a-tmp/Main.o )
[2 of 2] Linking /Users/reckenrode/Developer/tmp/reproducible-segfault-rpath/dist-newstyle/build/aarch64-osx/ghc-9.6.4/a-0.1/x/a/build/a/a
$ cabal run --project-file cabal.project all
it worked

It could be the version jump (going from 609 to 951.9) or the source (switching to Apple’s OSS release from cctools-port), but I suspect the issue is cctools-port does not appear to build ld64 with -Wl,-stack_size,0x10000000. Both the Meson build system I use in my update and Apple’s Xcode project use that option to expand the stack from 8 MiB to 256 MiB. Having a significantly larger stack would presumably avoid the stack smashing that was observed in #149692 (comment).
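As a quick check of the sizes quoted above, the hex value passed to -stack_size can be converted to MiB with shell arithmetic. The helper name to_mib is illustrative.

```shell
# Convert a byte count (decimal or 0x-prefixed hex) to whole MiB.
to_mib() {
  echo "$(( $1 / 1024 / 1024 )) MiB"
}

to_mib 0x10000000   # value passed via -Wl,-stack_size
to_mib 0x800000     # the 8 MiB stack mentioned above, for comparison
```

The first call prints 256 MiB and the second 8 MiB, so the option gives the linker 32 times the stack it had before.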

@alexfmpe
Member

suspect the issue is cctools-port does not appear to build ld64 with -Wl,-stack_size,0x10000000

Plausible, since my repro seems to vanish if I don't import from meaty dependencies.
Does the rpath issue persist in your branch?

@alexfmpe
Member

alexfmpe commented Apr 26, 2024

confirmed the crash with the haskell-updates branch, then I pointed it at my local ld64 update branch.

Indeed, I got the same results.

Does the rpath issue persist in your branch?

It does, and it seems for dynamic linking I should instead be using
https://cabal.readthedocs.io/en/3.4/cabal-project.html?highlight=enabl#cfg-flag---enable-executable-dynamic

@reckenrode
Contributor

FWIW, if you’re using the ld64 branch in my nixpkgs repo, it’s a bit old. ld64 should be fine, but it’s missing some additional compatibility work I’ve done. Once I’ve built the Darwin channel blockers successfully, I plan to open a draft PR for feedback.

@bacchanalia

A note to anyone thinking of using -fwhole-archive-hs-libs as a workaround: the debug symbols in the executable will cause the build dependencies to also be added as runtime dependencies by nix.

@reckenrode
Contributor

reckenrode commented Jul 28, 2024

Unfortunately, the test case @alexfmpe provided still crashes on staging-next (with the ld64 upgrade) with the workaround disabled. The stack trace is similar to the one posted in #149692 (comment), which points to the following code as the likely culprit.

https://github.com/apple-oss-distributions/ld64/blob/47f477cb721755419018f7530038b272e9d0cdea/src/ld/Resolver.cpp#L1180-L1198

Next I want to try to get it into a debugger, so I can see whether it’s recursing infinitely on the same atom, or whether this is a pathological case in ld64 that GHC is triggering.

Update: It must be late. I was testing against master. I changed it to download the tarball for staging-next, and the test case passes with the workarounds disabled. It does experience the rpath issue. There probably should be one for the test case’s lib directory in addition to GHC’s. The lack of that doesn’t seem like an ld64 issue.

@reckenrode
Contributor

If someone can find a reproducible test case after the ld64 upgrade, please reopen. If I have to do it, I’ll rewrite the recursion to remove it, but the larger default stack size should make this a non-issue now.

Closed via #307880.
