Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stdlib code size increase in nightly-2024-09-01 for aarch64-linux-android #130320

Open
jrose-signal opened this issue Sep 13, 2024 · 17 comments
Open
Labels
C-bug Category: This is a bug. O-android Operating system: Android P-medium Medium priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@jrose-signal
Copy link

We keep track of our code size on Android to keep our overall download size down. In updating our pinned nightly compiler (from nightly-2024-06-21; we're due for it), our CI turned up a code size increase of ~120KB in a library that's overall about 4.6MB—not huge, but not expected either. Bisecting on Rust toolchains turned up that the regression was introduced in nightly-2024-09-01 (as in, nightly-2024-08-30 does not have this extra code size, and there was no nightly-2024-08-31), and moreover I discovered that using -Zbuild-std recovered the additional code size. So something changed in how the prebuilt standard library is built, in a way that may not have been intended.

Unfortunately, our Android build process does a whole bunch of work at once, so I don't have a minimal example for you. My reproduction has been

  1. Install the Android NDK (27.0.12077973 is the one we're currently using)
  2. Check out https://github.com/signalapp/libsignal/releases/tag/v0.57.1
  3. ANDROID_NDK_HOME=path/to/ndk/27.0.12077973 java/build_jni.sh android-arm64
  4. Save the resulting library, then repeat with RUSTUP_TOOLCHAIN=nightly-2024-09-01.
  5. Compare the libraries' post-stripped sizes, or use a tool like bloaty to measure VM size only

It's possible this change was expected, in which case please close the issue; I understand code sizes go up sometimes!

@rustbot modify labels: +regression-from-stable-to-nightly -regression-untriaged

@jrose-signal jrose-signal added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Sep 13, 2024
@rustbot rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. and removed regression-untriaged Untriaged performance or correctness regression. labels Sep 13, 2024
@RalfJung
Copy link
Member

Thanks for the report!

Any chance you could try to bisect this further using cargo-bisect-rustc? Since this is recent enough, that should be able to narrow it down to a single PR.

@jrose-signal
Copy link
Author

It took a bit to get it right, and I'm still not sure I got it entirely correctly, but here's the tail end of the output!

% cargo bisect-rustc --start=2024-08-30 --end=2024-09-01 --target aarch64-linux-android --by-commit --script ./test.sh
...
searched toolchains 0d634185dfddefe09047881175f35c65d68dcff1 through a7399ba69d37b019677a9c47fe89ceb8dd82db2d

********************************************************************************
Regression in 9649706eada1b2c68cf6504356efb058f68ad739
********************************************************************************

Attempting to search unrolled perf builds
Found commits ["c8a538a7", "c5f6a66b", "52b51b8a", "970b7b00", "32892549", "8c930915", "16f4069d", "cb679928", "fc177c6c", "d7b7d7de", "d101b3fa", "1b1ad99a", "702679cd", "2ab629b2", "19653419"]
installing c8a538a7032ba3d35c90642bc6c47d985282af8d
uninstalling c8a538a7032ba3d35c90642bc6c47d985282af8d
installing c5f6a66bd472f75e77a7d9ed1ee83454b63d5057
uninstalling c5f6a66bd472f75e77a7d9ed1ee83454b63d5057
installing 52b51b8a3e8716b491fdf82c075c303f79cce65e
uninstalling 52b51b8a3e8716b491fdf82c075c303f79cce65e
installing 970b7b00bc353065c23d1e2d8e22c0a2b8260da7
uninstalling 970b7b00bc353065c23d1e2d8e22c0a2b8260da7
installing 32892549aa22df7399226f3281b94b32cd2ec7c3
uninstalling 32892549aa22df7399226f3281b94b32cd2ec7c3
installing 8c930915fe4ca02780c9ff2fec9612723ca9c203
uninstalling 8c930915fe4ca02780c9ff2fec9612723ca9c203
installing 16f4069d874ebc66e07259afddae546ccb536ca1
uninstalling 16f4069d874ebc66e07259afddae546ccb536ca1
installing cb6799285d407dab26843816f9b98d515eecadc9
uninstalling cb6799285d407dab26843816f9b98d515eecadc9
installing fc177c6c435e97b84e3d7ec9ed19870c4dfbfd3b
uninstalling fc177c6c435e97b84e3d7ec9ed19870c4dfbfd3b
installing d7b7d7deaeb75804926dc0d2529f064a1ef0d1fb
uninstalling d7b7d7deaeb75804926dc0d2529f064a1ef0d1fb
installing d101b3fa61198f4a193e77477e77d95dd9889fec
uninstalling d101b3fa61198f4a193e77477e77d95dd9889fec
installing 1b1ad99aff96a560ae4405daf63996c023f43801
uninstalling 1b1ad99aff96a560ae4405daf63996c023f43801
installing 702679cd9f0236ce8997c93aa01ee8c57d7a30e6
uninstalling 702679cd9f0236ce8997c93aa01ee8c57d7a30e6
installing 2ab629b2fff152427b0f45f5ade6c44cefe50d50
uninstalling 2ab629b2fff152427b0f45f5ade6c44cefe50d50
installing 196534199a7aba5e0b5195084a5bf1ca72cca969
uninstalling 196534199a7aba5e0b5195084a5bf1ca72cca969
ERROR: none of the toolchains satisfied the predicate

Of the commits in 9649706, #129640 is my personal bet…

My detection script could also be off, I used test "$(du "$lib"| cut -f1)" -lt 9300 as the "good" case (the full jump was 9168 du-blocks to 9408).

@workingjubilee
Copy link
Member

@jrose-signal Does this affect the size of any completed binaries you use?

@jrose-signal
Copy link
Author

jrose-signal commented Sep 13, 2024

Yes, the .so being measured here is the aarch64 slice of libsignal, as shipped in our Android app.

@workingjubilee
Copy link
Member

workingjubilee commented Sep 13, 2024

Oh, okay!

@jrose-signal Actually, I think the commit at fault is likely to be #129642 because now Android is getting the backtrace code linked in All The Time.

It's either that, the PR you noticed, or #129366 which changes linkage.

@jrose-signal
Copy link
Author

Ah, backtraces. >< That might explain why -Zbuild-std helps so much, last time I measured the most expensive part of backtrace support is resolving symbol names, which is useless for us because we strip debug info for our release builds.

@workingjubilee workingjubilee added the O-android Operating system: Android label Sep 13, 2024
@workingjubilee
Copy link
Member

Yeah, we really ought to come up with a better way to support decent backtraces than stuffing a DWARF parser in every single binary.

@workingjubilee
Copy link
Member

That or at least make the DWARF parser a lot smaller. Really, I thought they were supposed to be shorter than both Men and Elves, but I've seen an ELF or two in my time that were massively outweighed by their DWARF...

@workingjubilee
Copy link
Member

workingjubilee commented Sep 13, 2024

Size

In the early Old Norse sources, dwarfs are typically described vaguely, with no reference to them being particularly small; in the legendary sagas and later folklore, however, they are often described as short.[3]

Ah, I see. Suppositions about size are actually antimythical? That explains it.

@RalfJung
Copy link
Member

RalfJung commented Sep 14, 2024

Of the commits in 9649706, #129640 is my personal bet…

FWIW the bisect tool can even test the parts of a rollup, but only if you are on a 64bit x86 Linux host (as we only have per-PR artifacts for that target).

But it seems like in this case that will not be needed.

@apiraino
Copy link
Contributor

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-medium

@rustbot rustbot added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Sep 16, 2024
@jrose-signal
Copy link
Author

FWIW the bisect tool can even test the parts of a rollup, but only if you are on a 64bit x86 Linux host (as we only have per-PR artifacts for that target).

Oh, whoops. TIL. (Even for Tier 2 targets like Android? Nice.)

@workingjubilee
Copy link
Member

Yeah, as long as you can run the test by cross-compiling.

@jieyouxu jieyouxu added T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Oct 19, 2024
@Amanieu
Copy link
Member

Amanieu commented Nov 6, 2024

Could this be related to rust-lang/backtrace-rs#656?

@cuviper
Copy link
Member

cuviper commented Nov 7, 2024

For a default cargo new --lib configured as a dylib, cargo-bloat gives me this:

$ cargo +nightly-2024-08-30 bloat --target aarch64-linux-android --release
[...]
 File  .text     Size     Crate Name
 0.2%   1.4%   7.8KiB       std miniz_oxide::inflate::core::decompress
 0.1%   1.1%   6.2KiB       std gimli::arch::Arm::name_to_register
 0.1%   1.0%   5.5KiB       std gimli::constants::DwAt::static_string
 0.1%   0.9%   5.0KiB       std std::thread::available_parallelism
 0.1%   0.9%   4.7KiB       std gimli::arch::AArch64::name_to_register
 0.1%   0.8%   4.3KiB       std gimli::arch::PowerPc64::name_to_register
 0.1%   0.8%   4.2KiB       std gimli::arch::X86_64::name_to_register
 0.1%   0.7%   4.0KiB       std gimli::arch::LoongArch::name_to_register
 0.1%   0.7%   3.8KiB       std gimli::arch::RiscV::name_to_register
 0.1%   0.6%   3.3KiB       std gimli::arch::MIPS::name_to_register
 0.1%   0.6%   3.2KiB       std core::num::flt2dec::strategy::dragon::format_shortest
 0.1%   0.6%   3.1KiB       std <std_detect::detect::os::aarch64::AtHwcap as core::convert::From<std_detect::detect::os::cpuinfo::CpuInfo>>::from
 0.1%   0.6%   3.1KiB [Unknown] _ZN9libunwind10CFI_ParserINS_17LocalAddressSpaceEE20parseFDEInstructionsERS1_RKNS2_8FDE_InfoERKNS2_8CIE_InfoEmiPNS2_10PrologInfoE
 0.1%   0.5%   2.9KiB       std <core::str::iter::EscapeDebug as core::fmt::Display>::fmt
 0.1%   0.5%   2.9KiB       std gimli::read::abbrev::Abbreviations::insert
 0.1%   0.5%   2.8KiB       std core::num::flt2dec::strategy::dragon::format_exact
 0.1%   0.5%   2.8KiB       std <rustc_demangle::legacy::Demangle as core::fmt::Display>::fmt
 0.1%   0.5%   2.8KiB       std rustc_demangle::demangle
 0.1%   0.5%   2.6KiB       std std::sys::pal::unix::process::process_common::Command::capture_env
 0.1%   0.5%   2.6KiB       std gimli::constants::DwTag::static_string
10.2%  86.0% 477.7KiB           And 2350 smaller methods. Use -n N to show more.
11.9% 100.0% 555.6KiB           .text section size, the file size is 4.6MiB
$ cargo +nightly-2024-09-01 bloat --target aarch64-linux-android --release
[...]
 File  .text     Size     Crate Name
 0.2%   1.5%   9.7KiB       std std::backtrace_rs::symbolize::gimli::resolve
 0.2%   1.4%   9.1KiB       std std::backtrace_rs::symbolize::gimli::Context::new
 0.1%   1.2%   7.8KiB       std miniz_oxide::inflate::core::decompress
 0.1%   1.1%   7.1KiB       std gimli::read::dwarf::Unit<R>::new
 0.1%   0.9%   6.2KiB       std gimli::arch::Arm::name_to_register
 0.1%   0.9%   5.9KiB       std addr2line::ResUnit<R>::find_function_or_location::{{closure}}
 0.1%   0.8%   5.5KiB       std gimli::constants::DwAt::static_string
 0.1%   0.8%   5.0KiB       std addr2line::Lines::parse
 0.1%   0.8%   5.0KiB       std std::thread::available_parallelism
 0.1%   0.7%   4.7KiB       std gimli::arch::AArch64::name_to_register
 0.1%   0.6%   4.3KiB       std gimli::arch::PowerPc64::name_to_register
 0.1%   0.6%   4.2KiB       std gimli::arch::X86_64::name_to_register
 0.1%   0.6%   4.0KiB       std gimli::arch::LoongArch::name_to_register
 0.1%   0.6%   4.0KiB       std addr2line::function::Function<R>::parse_children
 0.1%   0.6%   3.9KiB       std gimli::read::unit::parse_attribute
 0.1%   0.6%   3.8KiB       std gimli::arch::RiscV::name_to_register
 0.1%   0.5%   3.3KiB       std gimli::arch::MIPS::name_to_register
 0.1%   0.5%   3.2KiB       std core::num::flt2dec::strategy::dragon::format_shortest
 0.1%   0.5%   3.1KiB       std <std_detect::detect::os::aarch64::AtHwcap as core::convert::From<std_detect::detect::os::cpuinfo::CpuInfo>>::from
 0.1%   0.5%   3.1KiB [Unknown] _ZN9libunwind10CFI_ParserINS_17LocalAddressSpaceEE20parseFDEInstructionsERS1_RKNS2_8FDE_InfoERKNS2_8CIE_InfoEmiPNS2_10PrologInfoE
 9.7%  84.4% 559.0KiB           And 2451 smaller methods. Use -n N to show more.
11.5% 100.0% 662.3KiB           .text section size, the file size is 5.6MiB

(and current nightly is still similar to the latter)

That might explain why -Zbuild-std helps so much, last time I measured the most expensive part of backtrace support is resolving symbol names, which is useless for us because we strip debug info for our release builds.

So yes, symbolize tops the list, probably adding many of the extra smaller methods as well.

The gimli::arch stuff seems unfortunate in both cases, since we're only self-inspecting, and I'll bet there are associated .data strings for those as well. I wonder if we could get a host-only mode somehow?

@cuviper
Copy link
Member

cuviper commented Nov 7, 2024

Here's a comparison of cargo bloat --crates --split-std:

 File  .text     Size Crate
 3.8%  31.7% 176.3KiB std
 3.1%  26.0% 144.6KiB core
 1.4%  11.7%  65.0KiB gimli
 0.9%   7.8%  43.2KiB compiler_builtins
 0.8%   6.6%  36.7KiB alloc
 0.5%   4.3%  23.7KiB rustc_demangle
 0.4%   3.3%  18.2KiB [Unknown]
 0.3%   2.5%  13.7KiB object
 0.3%   2.2%  12.0KiB memchr
 0.2%   2.0%  11.2KiB miniz_oxide
 0.2%   1.5%   8.2KiB std_detect
 0.0%   0.2%     988B adler
 0.0%   0.2%     872B addr2line
 0.0%   0.0%     272B panic_abort
 0.0%   0.0%     268B hashbrown
 0.0%   0.0%      24B panic_unwind
11.9% 100.0% 555.6KiB .text section size, the file size is 4.6MiB
 File  .text     Size Crate
 3.6%  31.0% 205.6KiB std
 3.0%  25.9% 171.3KiB core
 1.6%  14.3%  94.6KiB gimli
 0.8%   6.5%  43.2KiB compiler_builtins
 0.7%   5.7%  38.0KiB alloc
 0.4%   3.6%  23.7KiB rustc_demangle
 0.4%   3.1%  20.7KiB addr2line
 0.3%   2.7%  18.2KiB [Unknown]
 0.2%   2.1%  13.7KiB object
 0.2%   1.8%  12.0KiB memchr
 0.2%   1.7%  11.2KiB miniz_oxide
 0.1%   1.2%   8.2KiB std_detect
 0.0%   0.1%     988B adler
 0.0%   0.0%     272B panic_abort
 0.0%   0.0%     268B hashbrown
 0.0%   0.0%      24B panic_unwind
11.5% 100.0% 662.3KiB .text section size, the file size is 5.6MiB

There's significant growth in std, core, gimli, and addr2line, and a little bit in alloc.

@cuviper
Copy link
Member

cuviper commented Nov 7, 2024

Below are numbers for the same library on aarch64-linux-unknown-gnu, which are pretty much the same as the "new" Android, but it was already like that in nightly-2024-08-30 too. So I'd say Android is just catching up, and IMO we don't need to worry about this as a regression -- although general size reductions would still be nice!

 File  .text     Size Crate Name
 0.2%   1.6%  10.6KiB   std std::backtrace_rs::symbolize::gimli::resolve
 0.2%   1.4%   9.1KiB   std std::backtrace_rs::symbolize::gimli::Context::new
 0.1%   1.2%   7.8KiB   std miniz_oxide::inflate::core::decompress
 0.1%   1.1%   7.2KiB   std gimli::read::dwarf::Unit<R>::new
 0.1%   0.9%   6.2KiB   std gimli::arch::Arm::name_to_register
 0.1%   0.9%   5.9KiB   std addr2line::ResUnit<R>::find_function_or_location::{{closure}}
 0.1%   0.8%   5.5KiB   std gimli::constants::DwAt::static_string
 0.1%   0.8%   5.1KiB   std std::thread::available_parallelism
 0.1%   0.8%   5.0KiB   std addr2line::Lines::parse
 0.1%   0.7%   4.7KiB   std gimli::arch::AArch64::name_to_register
 0.1%   0.6%   4.3KiB   std gimli::arch::PowerPc64::name_to_register
 0.1%   0.6%   4.2KiB   std gimli::arch::X86_64::name_to_register
 0.1%   0.6%   4.0KiB   std gimli::arch::LoongArch::name_to_register
 0.1%   0.6%   4.0KiB   std addr2line::function::Function<R>::parse_children
 0.1%   0.6%   3.9KiB   std gimli::read::unit::parse_attribute
 0.1%   0.6%   3.8KiB   std gimli::arch::RiscV::name_to_register
 0.1%   0.5%   3.5KiB   std std::sys::pal::unix::process::process_inner::<impl std::sys::pal::unix::process::process_common::Command>::spawn
 0.1%   0.5%   3.3KiB   std gimli::arch::MIPS::name_to_register
 0.1%   0.5%   3.2KiB   std core::num::flt2dec::strategy::dragon::format_shortest
 0.1%   0.5%   3.1KiB   std <std_detect::detect::os::aarch64::AtHwcap as core::convert::From<std_detect::detect::os::cpuinfo::CpuInfo>>::from
 9.6%  83.6% 557.0KiB       And 2603 smaller methods. Use -n N to show more.
11.5% 100.0% 666.1KiB       .text section size, the file size is 5.6MiB
 File  .text     Size Crate
 3.8%  32.6% 217.0KiB std
 3.0%  25.6% 170.7KiB core
 1.6%  14.2%  94.8KiB gimli
 0.9%   7.4%  49.2KiB compiler_builtins
 0.7%   5.7%  37.9KiB alloc
 0.4%   3.6%  23.7KiB rustc_demangle
 0.4%   3.1%  20.7KiB addr2line
 0.2%   2.1%  13.7KiB object
 0.2%   1.8%  12.0KiB memchr
 0.2%   1.7%  11.2KiB miniz_oxide
 0.1%   1.2%   7.9KiB std_detect
 0.0%   0.2%   1.0KiB [Unknown]
 0.0%   0.1%     988B adler
 0.0%   0.0%     272B panic_abort
 0.0%   0.0%     268B hashbrown
 0.0%   0.0%      24B panic_unwind
11.5% 100.0% 666.1KiB .text section size, the file size is 5.6MiB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. O-android Operating system: Android P-medium Medium priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants