Bootstrap Reform and aarch64-linux's GCC Upgrade #208412
A bit of progress update: I added a tiny amount of comments to describe I'm currently playing locally with double I will post at least non-controversial proper PRs once I succeed getting a working The gist of various bugs I encounter so far is the incorrect inclusion order of library paths between
--- a/pkgs/stdenv/linux/bootstrap-tools/scripts/unpack-bootstrap-tools.sh
+++ b/pkgs/stdenv/linux/bootstrap-tools/scripts/unpack-bootstrap-tools.sh
@@ -17,6 +17,15 @@ else
LD_BINARY=$out/lib/ld-*so.?
fi
+# path to version-specific libraries, like libstdc++.so
+LIBSTDCXX_SO_DIR=$(echo $out/lib/gcc/*/*)
+
+# Move version-specific libraries out to avoid library mix when we
+# upgrade gcc.
+# TODO(trofi): update bootstrap tarball script and tarballs to put them
+# into expected location directly.
+LD_LIBRARY_PATH=$out/lib $LD_BINARY $out/bin/mv $out/lib/libstdc++.* $LIBSTDCXX_SO_DIR/
+
# On x86_64, ld-linux-x86-64.so.2 barfs on patchelf'ed programs. So
# use a copy of patchelf.
LD_LIBRARY_PATH=$out/lib $LD_BINARY $out/bin/cp $out/bin/patchelf .
@@ -25,8 +34,8 @@ for i in $out/bin/* $out/libexec/gcc/*/*/*; do
if [ -L "$i" ]; then continue; fi
if [ -z "${i##*/liblto*}" ]; then continue; fi
echo patching "$i"
- LD_LIBRARY_PATH=$out/lib $LD_BINARY \
- ./patchelf --set-interpreter $LD_BINARY --set-rpath $out/lib --force-rpath "$i"
+ LD_LIBRARY_PATH=$out/lib:$LIBSTDCXX_SO_DIR $LD_BINARY \
+ ./patchelf --set-interpreter $LD_BINARY --set-rpath $out/lib:$LIBSTDCXX_SO_DIR --force-rpath "$i"
done
for i in $out/lib/librt-*.so $out/lib/libpcre*; do |
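To illustrate the inclusion-order problem the patch above works around, here is a minimal sketch of first-match lookup over a colon-separated search path. The function name `first_dir_with` is made up for illustration, and this models only ordering; the real dynamic loader has more rules (RPATH vs RUNPATH, `LD_LIBRARY_PATH`, caches).

```sh
#!/bin/sh
# first_dir_with LIB PATHLIST
# Print the first directory in the colon-separated PATHLIST that contains a
# file named LIB. Returns 1 when no directory has it.
# (Hypothetical helper, not part of nixpkgs.)
first_dir_with() {
    lib=$1
    rest=$2
    while [ -n "$rest" ]; do
        dir=${rest%%:*}              # take the component before the first ':'
        if [ -e "$dir/$lib" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        case $rest in
            *:*) rest=${rest#*:} ;;  # drop the component just checked
            *)   rest= ;;            # that was the last component
        esac
    done
    return 1
}
```

With the bootstrap `libstdc++.so.6` sitting directly in `$out/lib`, any search path that lists `$out/lib` first picks it up ahead of a newer GCC's copy. Once it is moved into the version-specific directory, only binaries whose rpath names that directory explicitly will see it.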
I think I got a PoC:
The branch contains these non-controversial changes we could merge either as part of PoC or separately:
Please give it a go. |
This change switches to using GCC 11 by default on aarch64-linux, as well as passing `-lgcc` to the linker, per NixOS#201485. See NixOS#201254 and NixOS#208412 for wider context on the issue.
IMHO 6 is the long-term solution. |
6 definitely seems like the optimal solution, but it's going to need someone with enough knowledge of GCC's internals and enough free time, and I'm not sure that person exists right now. That said, I kind of like option 5 - it's a clever hack, which is a downside IMO, but it allows us to win some time in a less invasive way and keeps the hackery contained in the bootstrap. |
I should also add that I ran into this problem six months ago here, and seriously considered taking on # 6. But I was kind of demotivated by the general indifference to the problems caused by frankenstein-compilation. And a few of my recent major-project PRs have languished for 6+ months, requiring constant rebasing, which is further-demotivating. If there is now a general appreciation of why this is a problem, and people willing to allocate time to reviewing the resulting PR, I could take this up again. Would likely be aiming for right after 23.05, in the brief "it's (almost) okay to break stuff" window after the release. |
I am absolutely willing to help with this, but my knowledge of GCC bootstrapping is stuck in the late 00s, so I'll have to catch up a lot to understand the specifics. It's also probably worth mentioning that we now have way more resources available to Hydra, so "just build everything with it" isn't just a viable way of testing changes such as this - it's something we can do in a few days, so it should be possible to land this at any time without incurring downstream breakage. |
I'm absolutely not a compiler guy, just been interested in pushing this along for the practical consequences. So the possibilities and fixes are not all obvious to me. Thank you @amjoseph-nixpkgs for your additional ideas and experiments. Option 6 sounds like option 4 but implementing "We might be able to build the first GCC just once and the second GCC fewer times to keep the total number of builds less than double." It seems like the difficulty of reverting @trofi's proposal for option 4 is overstated, and I don't see how merging that proposal for 4 now makes achieving 6 any harder in the future, i.e. how it incurs technical debt. I'm not at this moment a fan of 5. At least on my machine and with an earlier revision building the stdenv from scratch with that option is dramatically slower than it was before. It also does not get us any closer to our goals for other architectures. It might be helpful in updating the bootstrap files without trusting a random contributor to build them on their own machine and without half-breaking aarch64-linux for a week to build new ones on Hydra. I'm re-rebuilding with the latest changes on that PR and will update when that completes. |
No, they are completely different.
That obsolete version should not be used for build-time measurements.
Because it will have to be reverted at that future time. It is far from being a one-line change (like #209462 is), so ability to revert cleanly depends on whether or not any other commit has touched the same lines, or lines near it. |
Yes, I get this. Maybe my language was unclear. Each realization of the derivation generated by the expression
I've completed measurement of the latest version and it does not substantially reduce the required build time (116 vs 103 minutes, compared to 40 before).
I don't understand why this is true. It is claimed that the first 3 of 4 commits in trofi's PR are general improvements, which are prerequisites for the modification of the bootstrap sequence, but would not have to be reverted. The last commit to modify the bootstrap sequence does make substantial modifications, but I don't see why we would have to trust git to be able to mechanically revert them. The new sequence would have to be crafted to accommodate the split GCC derivation and the appropriate documentation would have to be written too. I don't see how that would be any harder with trofi's revised sequence. trofi commented more on this here if you missed it |
This measurement was done against an obsolete commit; please be sure you built from 77c2173 -- it does drastically less rebuilding on aarch64, at the expense of some complexity. |
Well, technically the third build is a test -- for comparison with stage2. The stage2 compiler is the finished product; the stage3 compiler is only built as a sanity check. It really ought to be part of the |
… from libc
I would like to add an extra `gcc` build step during linux bootstrap (NixOS#208412). This makes the early bootstrap compiler linked and targeted against `bootstrapTools` `glibc`, including its headers.
Without this change `gcc`'s spec files always prefer `bootstrapTools` `glibc` for the header search path (passed in as `--with-native-system-header-dir=`). We can't override it with:
- the `-I` option, as it gets stacked before gcc-specific headers; we need to keep glibc headers after gcc's, as gcc cleans up the namespace for the C standard by using `#include_next` and by undefining system macros.
- the `-idirafter` option, as it gets appended after the existing `glibc` includes.
This `--sysroot=/nix/store/does/not/exist` hack allows us to remove existing `glibc` headers and add new ones with `-idirafter`.
We use `cc-cflags-before` instead of `libc-cflags` to allow the user to define their own `--sysroot=` (like `firefox` does).
To keep it working, a prerequisite cross-symlink in gcc.libs is required: NixOS#209153
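As a rough sketch of the flag combination that commit message describes, the helper below prints the `--sysroot=` and `-idirafter` pair for a given glibc development output. The function name and the store path in the usage note are hypothetical; only the two flags and the dummy sysroot value come from the message above.

```sh
#!/bin/sh
# sysroot_override_flags GLIBC_DEV
# Print the compiler flags from the commit message: a sysroot that does not
# exist (so the built-in libc header directory resolves to nothing), followed
# by -idirafter so the new glibc headers are searched after GCC's own headers.
# GLIBC_DEV is a placeholder for the glibc "dev" output path.
sysroot_override_flags() {
    printf '%s %s %s\n' \
        '--sysroot=/nix/store/does/not/exist' \
        '-idirafter' \
        "$1/include"
}
```

For example, `sysroot_override_flags /nix/store/example-glibc-dev` prints `--sysroot=/nix/store/does/not/exist -idirafter /nix/store/example-glibc-dev/include`. To see what search order a given compiler ends up with, `gcc -xc -E -v /dev/null` lists the include directories it will consult.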
I was able to come up with a proof of concept of option 2 here. Too janky and unknown to be a PR yet, but it does fix the two identified symptoms. Worth noting that guix does this same thing, but patches GCC instead of using a wrapper. |
Given I pointed you to a lot of this analysis, I would have appreciated a brief mention. |
I don't see a comment to this effect, but it looks like the herculean effort by @amjoseph-nixpkgs in #209870 will address this issue. |
Nixpkgs on aarch64-linux is currently stuck on GCC 9 (NixOS/nixpkgs#208412) and using gcc11Stdenv doesn't work either. So use c++2a instead of c++20 for now. Unfortunately this means we can't use some C++20 features for now (like std::span).
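For build scripts that have to cope with both compilers in the meantime, a small sketch of picking the flag from the GCC version string follows. The helper name is made up. It relies on GCC accepting `-std=c++20` only from version 10 on, while GCC 8 and 9 accept `-std=c++2a`; versions older than 8 are not handled.

```sh
#!/bin/sh
# cxx20_flag GCC_VERSION
# Print the -std= flag to request C++20 mode for the given GCC version string
# (e.g. "9.3.0"). Hypothetical helper; assumes GCC 8 or newer.
cxx20_flag() {
    major=${1%%.*}                 # "9.3.0" -> "9"
    if [ "$major" -ge 10 ]; then
        echo '-std=c++20'
    else
        echo '-std=c++2a'
    fi
}
```

`cxx20_flag 9.3.0` prints `-std=c++2a` and `cxx20_flag 11.3.0` prints `-std=c++20`. The flag only selects the language mode; library features missing from GCC 9's libstdc++ (like `std::span`) stay unavailable either way.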
This change switches to using GCC 11 by default on aarch64-linux, as well as passing `-lgcc` to the linker, per NixOS#201485. See NixOS#201254 and NixOS#208412 for wider context on the issue. (cherry picked from commit 8442601)
#209870 got merged 5 hours ago, this should have been auto closed (but wasn't?) |
Auto-closing normally happens when the thing reaches |
This is the research I have done trying to figure out how to best upgrade aarch64-linux from GCC 9. I've collected everything here to make the problems clear, provide context for those who can help, and help the community decide on the right path forward. Some sources are linked where appropriate, but I have lots more available on request.
The Problem
Upgrading aarch64-linux past GCC 9 breaks large numbers of packages. As a result, there is a specific clause in `all-packages.nix` which keeps aarch64-linux at GCC 9, while allowing every other platform to use 11 (and soon 12).
GCC 9 is pretty old at this point and important packages, like KDE/Plasma, are demanding later versions to use modern C++ language features. We cannot practicably ship NixOS 23.05 with GCC 9. It must be upgraded before then, with sufficient time to test and fix up packages.
The two main breakages observed with more recent compilers are linker errors (e.g. with `pkgs.icu`) and random aborts (e.g. `pkgs.expect` during `pkgs.dejagnu`'s test phase).
The Reason
The nixpkgs bootstrap sequence, which builds the latest GCC and stdenv using prebuilt seed binaries, is a bit sleazy. It compiles glibc with the old GCC, then builds the latest GCC and other utilities using that glibc. This results in the stdenv not being completely compiled by the new GCC.
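One rough way to spot this kind of mixing on a built binary is to look at the compiler ident strings GCC records in the ELF `.comment` section. The sketch below is an illustration, not something nixpkgs ships: the helper name is made up, and it assumes the section has not been stripped and that the idents look like `GCC: (GNU) 9.3.0`.

```sh
#!/bin/sh
# gcc_idents
# Read the output of `readelf -p .comment FILE` on stdin and print each
# distinct "GCC: (...) X.Y.Z" ident string once. More than one line of output
# suggests objects from different GCC versions were linked together.
# (Hypothetical helper.)
gcc_idents() {
    sed -n 's/^.*\(GCC: ([^)]*) [0-9][0-9.]*\).*$/\1/p' | sort -u
}
```

Usage would be something like `readelf -p .comment /path/to/binary | gcc_idents`, with the path replaced by the store path of the binary under inspection.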
In addition, and more importantly, GCC's low-level runtime library, `libgcc_s.so`, ends up simply copied from the GCC used in the bootstrap (currently 9 for aarch64-linux) to the glibc used in the stdenv (which would ordinarily be using GCC 11). This causes programs built with the later version of GCC to use the library of an earlier version, instead of the library expected by that GCC.
The library is linked in automatically by GCC (and can be linked in manually using `-lgcc_s`) when it needs certain routines, e.g. SIMD math or atomics. This going wrong (e.g. more recent GCCs having additional functions) results in linker errors and packages failing to build. It's also loaded in certain circumstances at runtime by glibc, and failure here (e.g. not being available in rpath) results in runtime aborts, possibly with messages like `libgcc_s.so.1 must be installed for pthread_exit to work`.
This deficiency in the bootstrap happens to cause problems in a visible way for aarch64-linux and GCC 9->11, but copying `libgcc_s.so` around is unsafe and wrong for all architectures and GCC versions and needs to be fixed. However, it turns out to be a happy accident that `libgcc_s.so` is always available at runtime for glibc to use, and this needs to be preserved somehow too.
Possible Solutions
1. Ignore reason, upgrade bootstrap
Pros: Possible right now, pretty certain to actually fix the problem
Cons: Commits us to upgrade the bootstrap every time `libgcc_s.so` breaks compatibility on any architecture, does not solve the underlying reason. We continue to hope that this will never cause a subtle issue and always break visibly for most packages.
2. Remove hack which copies around bootstrap `libgcc_s.so`, add `-lgcc_s` to wrapper
Pros: Tested and seems to work now, relatively certain to actually fix the problem
Cons: Could break in the future if e.g. `dejagnu` is needed in the bootstrap sequence again, adds 7.1 megabytes of GCC's library output to everyone's runtime closure (though this is already the case for C++ programs), doesn't actually improve bootstrap
It might be possible to patch GCC to detect when glibc could need `libgcc_s.so` too (i.e. if pthread support is enabled or exceptions are used?) and then include it only in that case, but that is kind of risky due to the failure mode. Maybe `libgcc_s.so` could be split into a separate output to avoid the size penalty.
3. Add extra bootstrap stages to glue together a glibc that has the latest GCC's `libgcc_s.so` and a GCC which uses them
Pros: Should not add much overhead to the bootstrap process
Cons: Would likely require a lot of `patchelf`ing, doesn't actually improve bootstrap
4. Add extra bootstrap stages to recompile glibc with the latest GCC (and its `libgcc_s.so`), then possibly GCC with that glibc
Pros: Solves the issue properly, cleanest and most correct bootstrap approach
Cons: Would complicate the life of people who work on the stdenv as bootstrap would be slower, complex to implement
It might be possible to reduce the overhead of this last solution especially if we need to build another GCC, as GCC already builds itself several times. We might be able to build the first GCC just once and the second GCC fewer times to keep the total number of builds less than double. There is also rumored to be a combined mode that can build GCC and glibc together which might be faster and a shortcut for the first GCC. This must also be careful to preserve correct operation for cross-compilation.
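To put rough numbers on "less than double", here is a back-of-the-envelope count of how many times GCC's sources get compiled. It assumes a default GCC build bootstraps itself in 3 stages and that a build configured with `--disable-bootstrap` compiles once; whether the nixpkgs stages map onto this exactly is an assumption, and glibc build time is left out.

```sh
#!/bin/sh
# total_compiles FIRST SECOND
# Number of times GCC's sources are compiled when the first GCC is built with
# FIRST stages and the second GCC with SECOND stages. (Illustrative helper.)
total_compiles() {
    echo $(( $1 + $2 ))
}

current=3                          # today: one GCC, assumed 3-stage bootstrap
doubled=$(total_compiles 3 3)      # naive option 4: two full 3-stage builds
reduced=$(total_compiles 1 3)      # first GCC built once, second in 3 stages

echo "current=$current doubled=$doubled reduced=$reduced"
```

This prints `current=3 doubled=6 reduced=4`, so under these assumptions the reduced scheme costs one extra GCC compile over today rather than three.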
The Path Forward
We have the first solution essentially ready right now so that NixOS 23.05 is not held up, but it's the worst. The last solution is the correct one and needs to be done at some point for the benefit of nixpkgs as a whole. But it's also the most work and might cause problems for contributors if not done carefully.
cc: @K900, @trofi