The binary size - performance tradeoff #11

Closed · japaric opened this issue Aug 10, 2018 · 13 comments

japaric commented Aug 10, 2018

From @japaric on March 22, 2018 12:53

As of the latest LLVM upgrade (4.0 -> 6.0 on 2018-02-11), LLVM now seems to perform loop unrolling more aggressively; this increased the binary size of a minimal program that only zeroes .bss and initializes .data from 130 bytes of .text (nightly-2018-02-10) to 1114 bytes (nightly-2018-03-20) when using opt-level=3 + LTO -- FWIW, I highly doubt the loop unrolling actually improves performance at all. Original report: rust-lang/rust#49260

This puts us in a bad spot because, by default, we'll end up with large optimized (--release) binaries -- I can already foresee future comparisons between C and Rust pointing out that the smallest embedded C program is only a hundred bytes in size whereas the smallest embedded Rust program is 1 KB.

So we should clearly document why Rust programs are so large by default and how to make them small. Using opt-level=s + LTO on the minimal program mentioned above brings the size back down to 130 bytes of .text.
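
For concreteness, here is a minimal sketch of that configuration in a standard Cargo project (assuming the binary is built with the usual release profile; the exact values are the ones mentioned above, not a general recommendation):

```toml
# Cargo.toml -- sketch of the size-oriented settings discussed above
[profile.release]
opt-level = "s"   # optimize for size; "z" trades a bit more speed for size
lto = true        # enable link-time optimization
```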

cc @jamesmunns ^ that should be included in the book

There are other possibilities to explore here, like having something similar to C's / clang's #pragma nounroll to prevent LLVM from unrolling loops marked with that attribute, but I doubt we'll get any of that into the 2018 edition release -- it's too late, I think.

Copied from original issue: rust-embedded/wg#69

japaric added the docs label Aug 10, 2018

japaric commented Aug 10, 2018

From @therealprof on March 23, 2018 14:41

FWIW: I think funky stuff like unrolling (which is really worthless on embedded architectures) is to be expected at higher optimisation levels, and I've seen all kinds of funky size regressions. I'm always using (and recommending) -s. Potentially -z could also be tried, but -O3 is a big no-no...


japaric commented Aug 10, 2018

From @whitequark on March 23, 2018 14:44

-O3 is pretty much defined as "-O2 with optimizations that cause code bloat", so I'm not sure why you'd go higher than -O2 on embedded devices.


japaric commented Aug 10, 2018

From @therealprof on March 23, 2018 15:17

> the smallest embedded C program is only a hundred bytes in size whereas the smallest embedded Rust program is 1 KB.

NB: I highly doubt that. As soon as one uses some of the initialisation code from the typical SDKs, the code will be well into the kBs already. To even stay in Rust's range you'll have to manually bang the memory-mapped registers and write your own linker scripts.

Case in point, this is the smallest possible binary for a main { while(1) {} } loop for the STM32F051 I could achieve based on STM32Cube initialisation:

# arm-none-eabi-size .pioenvs/disco_f051r8/firmware.elf
   text	   data	    bss	    dec	    hex	filename
    892	   1080	   1600	   3572	    df4	.pioenvs/disco_f051r8/firmware.elf
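
For comparison, a minimal embedded Rust program of the kind measured at 130 bytes above typically looks like the following (a sketch; the cortex-m-rt and panic-halt crates are an assumption about the setup, not something spelled out in this thread):

```rust
// Sketch of a minimal embedded Rust program. The cortex-m-rt runtime crate
// supplies the vector table and the .bss zeroing / .data initialization
// loops whose unrolling is at issue in this thread.
#![no_std]
#![no_main]

use cortex_m_rt::entry;
use panic_halt as _; // some panic handler is required in a no_std binary

#[entry]
fn main() -> ! {
    loop {}
}
```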


japaric commented Aug 10, 2018

From @Emilgardis on March 23, 2018 15:19

Could we get a RFC for something like #[no_unroll]/#[unroll(disable)]?


japaric commented Aug 10, 2018

From @whitequark on March 23, 2018 15:20

@Emilgardis No need for an RFC, marking the function with the loop as #[cold] should suffice.
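
In code, the suggestion amounts to something like this (a sketch; the function is a made-up stand-in for whatever loop is getting unrolled):

```rust
// Sketch of the #[cold] suggestion above: the attribute marks the function
// as rarely called, which, per the suggestion, should steer LLVM away from
// speed-oriented transforms such as aggressive loop unrolling inside it.
#[cold]
fn zero_bss(bss: &mut [u8]) {
    for byte in bss.iter_mut() {
        *byte = 0; // kept as a simple byte loop
    }
}
```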


japaric commented Aug 10, 2018

From @jonas-schievink on March 23, 2018 15:22

@japaric also wrote in rust-lang/rust#49260:

> My experience with opt-level={s,z}, at least when LLVM 4 was around, is that they produce bigger binaries than opt-level=3

If this is still the case with LLVM 6, this definitely wants to be investigated and fixed on the LLVM side.

He also wrote:

> iirc, opt-level={s,z} also reduces the inlining threshold, which prevents LLVM from optimizing dead branches when using RTFM's claim mechanism.

This might be the cause of some amount of bloat due to unnecessary branches, but shouldn't #[inline] be a strong enough hint to LLVM to still inline the function?
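
As a sketch of the pattern in question (a hypothetical claim-like helper, not RTFM's actual API):

```rust
// Sketch: a claim-like helper that takes a closure. #[inline] is only a
// hint; whether LLVM still honors it at opt-level=s/z, where the inlining
// threshold is lower, is exactly the open question above.
// #[inline(always)] would be the stronger request.
#[inline]
fn claim<T, R>(resource: &mut T, f: impl FnOnce(&mut T) -> R) -> R {
    f(resource)
}
```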


japaric commented Aug 10, 2018

From @therealprof on March 23, 2018 15:23

Why not get the default flags changed instead? It'd be very annoying to put annotations in every source file just in case someone might accidentally not change the compiler flags...


japaric commented Aug 10, 2018

From @whitequark on March 23, 2018 15:24

> This might be the cause of some amount of bloat due to unnecessary branches, but shouldn't #[inline] be a strong enough hint to LLVM to still inline the function?

The inlining thresholds in LLVM are tailored for C, which produces functions with relatively compact IR, and likely aren't well suited for Rust. In our in-house language we had to raise them significantly to get decent reductions in code size.
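
For reference, the corresponding knob on the Rust side would be rustc's inline-threshold codegen option, for example set via .cargo/config (a sketch only; the option has since been deprecated, and the value shown is purely illustrative rather than a recommendation from this thread):

```toml
# .cargo/config -- sketch: raises the LLVM inlining threshold globally.
# 275 is an illustrative value, not a tuned recommendation.
[build]
rustflags = ["-C", "inline-threshold=275"]
```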


japaric commented Aug 10, 2018

From @Emilgardis on March 23, 2018 15:26

@whitequark I've never heard of that attribute; it seems like it should work, however.


japaric commented Aug 10, 2018

From @therealprof on March 23, 2018 15:28

@jonas-schievink I cannot confirm that it produces larger files with opt-level=s, at least not in general. This all has quite a bit of a premature-optimisation smell to it, same as with the #[inline(always)] we had sprinkled all over the map...


japaric commented Aug 10, 2018

From @durka on March 23, 2018 16:23

There has been a small amount of discussion about unrolling attributes:
rust-lang/rfcs#2219



japaric commented Aug 10, 2018

From @RandomInsano on March 24, 2018 13:42

The choice to unroll or not should be made at the LLVM layer. With the Linux HAL I could very well want loop unrolling if my x86_64 machine were using a driver over the SMBus.

Is it possible for the LLVM backends to automatically opt-out when appropriate?


japaric added a commit that referenced this issue Sep 18, 2018
bors bot added a commit that referenced this issue Sep 28, 2018
26: Optimizations: the speed size tradeoff r=therealprof a=japaric

cc #11

another unsorted topic

r? @rust-embedded/resources

Co-authored-by: Jorge Aparicio <jorge@japaric.io>

japaric commented Nov 6, 2018

This was documented in #26

japaric closed this as completed Nov 6, 2018