Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary size regression on msp430 when using Cell with types from external crates. #67819

Closed
cr1901 opened this issue Jan 3, 2020 · 4 comments
Labels
C-bug Category: This is a bug. I-heavy Issue: Problems and improvements with respect to binary size of generated code. O-msp430 T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-embedded Working group: Embedded systems

Comments

@cr1901
Copy link
Contributor

cr1901 commented Jan 3, 2020

Regression found in the compiler

searched nightlies: from nightly-2019-03-27 to nightly-2019-09-04
regressed nightly: nightly-2019-07-27
searched commits: from 890881f to c43753f
regressed commit: c43753f
source code: https://github.com/cr1901/msp430-size

Instructions

  1. Make sure xargo, cargo-bisect-rustc, and optionally just are installed.
  2. git clone https://github.com/cr1901/msp430-size. Use commit c32d01b specifically.
  3. Invoke just bisect, or alternatively cargo-bisect-rustc --script=./test-regress.sh --preserve-target --preserve --start=2019-03-27 --end=2019-09-04 --with-src --with-cargo. The bisector will use test-regress.sh to look for size differences between two configurations of the min.rs example. Contents of test-regress.sh are reproduced below for convenience.

Error

The example min.rs in msp430-size can be built in two configurations:

  • One which uses an external crate called bare_metal to provide synchronization primitives.
  • The other which implements bare_metal inline in the min.rs example.

Because LTO is enabled, I would expect both these configurations to optimize down to the same final binary size. However, using an extern bare_metal crate, especially the CriticalSection type,
causes RefCell trait objects to be left behind for core::result::unwrap_failed().

Because the panic handler is an infinite loop, the formatting code left behind is in fact not used, and I believe the formatting code and RefCell panic strings should be optimized away as dead code. Because msp430 is size-sensitive, the dead code is a bigger deal than compared to most architectures, sometimes doubling small binary sizes! See the end of COLLAPSIBLE REGRESSION EXAMPLE.

COLLAPSIBLE REGRESSION EXAMPLE

william@xubuntu-dtrain:~/Projects/msp430-size$ rustup toolchain list
nightly-2019-03-27-x86_64-unknown-linux-gnu
nightly-2019-09-04-x86_64-unknown-linux-gnu
nightly-x86_64-unknown-linux-gnu (default)
bisector-nightly-2019-03-27-x86_64-unknown-linux-gnu
bisector-nightly-2019-06-15-x86_64-unknown-linux-gnu
bisector-nightly-2019-07-25-x86_64-unknown-linux-gnu
bisector-nightly-2019-07-26-x86_64-unknown-linux-gnu
bisector-nightly-2019-07-27-x86_64-unknown-linux-gnu
bisector-nightly-2019-07-29-x86_64-unknown-linux-gnu
bisector-nightly-2019-08-03-x86_64-unknown-linux-gnu
bisector-nightly-2019-08-13-x86_64-unknown-linux-gnu
bisector-nightly-2019-08-15-x86_64-unknown-linux-gnu
bisector-nightly-2019-09-04-x86_64-unknown-linux-gnu
ci-1a563362865e6051d4c350544131228e8eff5138-x86_64-unknown-linux-gnu
ci-4268e7ee22935f086b856ef0063a9e22b49aeddb-x86_64-unknown-linux-gnu
ci-890881f8f4c77e8670d4b32104c0325fcfefc90f-x86_64-unknown-linux-gnu
ci-c43753f910aae000f8bcb0a502407ea332afc74b-x86_64-unknown-linux-gnu
msp430-fix
msp430-test
william@xubuntu-dtrain:~/Projects/msp430-size$ rustup override set ci-c43753f910aae000f8bcb0a502407ea332afc74b-x86_64-unknown-linux-gnu
info: override toolchain for '/home/william/Projects/msp430-size' set to 'ci-c43753f910aae000f8bcb0a502407ea332afc74b-x86_64-unknown-linux-gnu'
william@xubuntu-dtrain:~/Projects/msp430-size$ xargo clean
william@xubuntu-dtrain:~/Projects/msp430-size$ xargo build --manifest-path=take-api/Cargo.toml --release --target=msp430-none-elf --example min --features use-bare-metal
   Compiling semver-parser v0.7.0
   Compiling autocfg v0.1.7
   Compiling proc-macro2 v0.4.30
   Compiling unicode-xid v0.1.0
   Compiling libc v0.2.66
   Compiling rand_core v0.4.2
   Compiling syn v0.15.44
   Compiling msp430-rt v0.2.0
   Compiling msp430g2553 v0.2.0 (https://github.com/cr1901/msp430g2553?branch=rt-up#345b518a)
   Compiling r0 v0.2.2
   Compiling vcell v0.1.2
   Compiling rand_core v0.3.1
   Compiling rand_jitter v0.1.4
   Compiling rand_xorshift v0.1.1
   Compiling rand_hc v0.1.0
   Compiling rand_isaac v0.1.1
   Compiling semver v0.9.0
   Compiling rand_chacha v0.1.1
   Compiling rand_pcg v0.1.2
   Compiling rand v0.6.5
   Compiling rustc_version v0.2.3
   Compiling bare-metal v0.2.5
   Compiling msp430 v0.2.0
   Compiling panic-msp430 v0.2.0
   Compiling quote v0.6.13
   Compiling msp430-rt-macros v0.2.0
   Compiling take-api v0.1.0 (/home/william/Projects/msp430-size/take-api)
    Finished release [optimized] target(s) in 16.89s
william@xubuntu-dtrain:~/Projects/msp430-size$ xargo build --manifest-path=take-api/Cargo.toml --release --target=msp430-none-elf --example min
   Compiling msp430-rt v0.2.0
   Compiling msp430g2553 v0.2.0 (https://github.com/cr1901/msp430g2553?branch=rt-up#345b518a)
   Compiling take-api v0.1.0 (/home/william/Projects/msp430-size/take-api)
    Finished release [optimized] target(s) in 1.99s
william@xubuntu-dtrain:~/Projects/msp430-size$ msp430-elf-size -A target/msp430-none-elf/release/examples/min-fe04a12c0ec047ac
target/msp430-none-elf/release/examples/min-fe04a12c0ec047ac  :
section              size    addr
.vector_table          32   65504
.text                 234   49152
.rodata                42   49386
.bss                    4     512
.data                   0     516
.MSP430.attributes     23       0
Total                 335


william@xubuntu-dtrain:~/Projects/msp430-size$ msp430-elf-size -A target/msp430-none-elf/release/examples/min-ff1850411f698246
target/msp430-none-elf/release/examples/min-ff1850411f698246  :
section              size    addr
.vector_table          32   65504
.text                 100   49152
.rodata                 0   49252
.bss                    4     512
.data                   0     516
.MSP430.attributes     23       0
Total                 159


william@xubuntu-dtrain:~/Projects/msp430-size$

Other Context

This bug is still present as of rustc 1.42.0-nightly (1ed41b072 2020-01-01).

Inline contents of test-regress.sh:

The test-regress.sh script compiles the min.rs example in both possible configurations- inline bare_metal and external bare_metal- and compares the output binary sizes. A difference in binary size between the two configurations is considered a failure.

#!/bin/sh
NOBM_FILE=`mktemp`
BM_FILE=`mktemp`

xargo build --manifest-path=take-api/Cargo.toml --release --target=msp430-none-elf --example min --features use-bare-metal
msp430-elf-size -A $CARGO_TARGET_DIR/msp430-none-elf/release/examples/min > $BM_FILE

xargo build --manifest-path=take-api/Cargo.toml --release --target=msp430-none-elf --example min
msp430-elf-size -A $CARGO_TARGET_DIR/msp430-none-elf/release/examples/min > $NOBM_FILE

diff -q $NOBM_FILE $BM_FILE
RV=$?

rm $NOBM_FILE $BM_FILE

exit $RV
@jonas-schievink jonas-schievink added C-bug Category: This is a bug. I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-embedded Working group: Embedded systems labels Jan 3, 2020
@cr1901
Copy link
Contributor Author

cr1901 commented Jun 5, 2021

While diagnosing another issue, I came back to this one. The script I used to bisect was buggy; the actual commit where dead code stopped being optimized away was 955979a.

As of recent (well as of me writing this) commit dbe459d, I can get the dead code elimination to work again using the following patch:

diff --git a/library/core/src/result.rs b/library/core/src/result.rs
index babd0a0b552..8207ec512c9 100644
--- a/library/core/src/result.rs
+++ b/library/core/src/result.rs
@@ -994,7 +994,7 @@ impl<T, E: fmt::Debug> Result<T, E> {
     pub fn expect(self, msg: &str) -> T {
         match self {
             Ok(t) => t,
-            Err(e) => unwrap_failed(msg, &e),
+            Err(e) => unwrap_failed(msg, e),
         }
     }
 
@@ -1034,7 +1034,7 @@ pub fn expect(self, msg: &str) -> T {
     pub fn unwrap(self) -> T {
         match self {
             Ok(t) => t,
-            Err(e) => unwrap_failed("called `Result::unwrap()` on an `Err` value", &e),
+            Err(e) => unwrap_failed("called `Result::unwrap()` on an `Err` value", e),
         }
     }
 }
@@ -1061,7 +1061,7 @@ impl<T: fmt::Debug, E> Result<T, E> {
     #[stable(feature = "result_expect_err", since = "1.17.0")]
     pub fn expect_err(self, msg: &str) -> E {
         match self {
-            Ok(t) => unwrap_failed(msg, &t),
+            Ok(t) => unwrap_failed(msg, t),
             Err(e) => e,
         }
     }
@@ -1089,7 +1089,7 @@ pub fn expect_err(self, msg: &str) -> E {
     #[stable(feature = "rust1", since = "1.0.0")]
     pub fn unwrap_err(self) -> E {
         match self {
-            Ok(t) => unwrap_failed("called `Result::unwrap_err()` on an `Ok` value", &t),
+            Ok(t) => unwrap_failed("called `Result::unwrap_err()` on an `Ok` value", t),
             Err(e) => e,
         }
     }
@@ -1351,7 +1351,7 @@ pub const fn into_ok_or_err(self) -> T {
 #[inline(never)]
 #[cold]
 #[track_caller]
-fn unwrap_failed(msg: &str, error: &dyn fmt::Debug) -> ! {
+fn unwrap_failed<E: fmt::Debug>(msg: &str, error: E) -> ! {
     panic!("{}: {:?}", msg, error)
 }

This isn't a public interface, so I think the patch can be used as-is. However, I imagine there was a reason that unrwap_failed began accepting trait objects that's a little more important than my use case to remove bloat/dead code from small applications.

I wonder if it's possible for Rust to optimize the trait objects out anyway if LTO is enabled? cc: @cuviper if you have the chance?

@cuviper
Copy link
Member

cuviper commented Jun 7, 2021

Hmm, that commit was an ad-hoc request in #62596 (comment) (cc @sfackler). Ironically, the idea was to reduce bloat, but I don't think we ever measure that. I'm guessing in your case that it's forcing the compiler to keep some full Debug vtable implementations in cases that might otherwise be optimized unreachable. (Which is basically the same as what you're saying about RefCell trait objects.) You could try cargo bloat to put real numbers on that, if it will work with your target.

The main point of #62596 was later removed in #82759, but I don't feel any attachment to the dyn Debug change. If it has measurable harm, we should remove it too.

@cr1901
Copy link
Contributor Author

cr1901 commented Jun 7, 2021

This isn't the only instance of "formatting code refuses to be optimized away"; this is just one that I noticed at that time that forced me to rewrite some firmware to avoid calls to unwrap_failed.

unwrap may be discouraged in a lot of code, but I feel it's a reasonably size efficient way to panic (in an infinite loop) in deeply embedded code where there's no real hope of recovery if assumptions are violated.

If it has measurable harm, we should remove it too.

Before dyn Debug gets removed, I'd rather understand why the trait objects are not getting optimized out in the first place; there's nowhere to echo the strings, so I don't understand why the compiler and/or LLVM doesn't treat dyn Debug trait objects as dead code. Although size is more critical on MSP430, the problem of formatting code not being optimized away is also felt on embedded ARM. It seems formatting just inherently takes a nice chunk of .text for itself :/.

@cr1901
Copy link
Contributor Author

cr1901 commented Sep 3, 2022

I'd rather understand why the trait objects are not getting optimized out in the first place.

After patching/accounting for inline asm syntax changes locally and bisecting, my issue as described is fixed by 599a03c.

This is an LLVM bump, so out of curiosity I bisected to find the relevant patch by @nikic which fixes the bloat (I can't find LLVM commit 4796b4a on Github). Even with sccache this took several hours- my build machine is probably not happy :P.

Closing as resolved/not Rust's problem. But if the issue comes back, I have a hint of where to look.

@cr1901 cr1901 closed this as completed Sep 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-heavy Issue: Problems and improvements with respect to binary size of generated code. O-msp430 T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-embedded Working group: Embedded systems
Projects
None yet
Development

No branches or pull requests

4 participants