Optimize file size of release binaries #571

Closed
ace-dent opened this issue Nov 2, 2023 · 11 comments · Fixed by #592
Labels
E-Easy (Issues that someone new to the project could easily pick up), OS-Windows

Comments

@ace-dent

ace-dent commented Nov 2, 2023

Feature Request / Minor –

The Win32 executable is ~1,385 KB. While this seems small by contemporary standards, it is far larger than equivalent programs; e.g. optipng is ~209 KB (~15% of the size).

This is relevant when working with certain environments / VMs, as it's convenient to package multiple tools in a 'floppy' image <1.44 MB.

UPX compression reduces the binary to about a third(!) of its size, ~462 KB, but should be avoided due to false positives with anti-virus software.
Standard zip compression achieves a comparable result, ~610 KB. This suggests there is a reasonable amount of redundancy in the binary.

@AlexTMjugador - is it possible to look at compiler optimizations that don't (significantly) affect speed but produce a more compact executable? Perhaps there is extra fluff that is unneeded for Windows?

@ace-dent
Author

ace-dent commented Nov 2, 2023

BTW, looking through the binary with strings yielded these apparent embedded compile errors(?):

CLICOLOR_FORCETERMCLICOLORNO_COLORCIcalled `Option::unwrap()` on a `None` valueC:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\anstream-0.3.2\src\adapter\wincon.rs

assertion failed: mid <= self.len()C:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\anstream-0.3.2\src\adapter\strip.rs

C:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\anstream-0.3.2\src\strip.rs

and

[58;2;C:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\anstyle-1.0.1\src\color.rs
C:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\anstyle-1.0.1\src\effect.rsBOLD
[1mDIMMED
[2mITALIC
[3mUNDERLINE
[4mDOUBLE_UNDERLINE
[21mCURLY_UNDERLINE
[4:3mDOTTED_UNDERLINE
[4:4mDASHED_UNDERLINE
[4:5mBLINK
[5mINVERT
[7mHIDDEN
[8mSTRIKETHROUGH
C:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\anstyle-parse-0.2.1\src\params.rs
/rustc/bf9a1c8a193fc373897196321215794c8bebbeec\library\alloc\src\slice.rs
C:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\clap_builder-4.3.8\src\output\textwrap\word_separators.rs
ffffff
a Display implementation returned an error unexpectedly/rustc/bf9a1c8a193fc373897196321215794c8bebbeec\library\alloc\src\string.rs
a Display implementation returned an error unexpectedly/rustc/bf9a1c8a193fc373897196321215794c8bebbeec\library\alloc\src\string.rs
internal error: entered unreachable codeC:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\clap_builder-4.3.8\src\parser\parser.rs
internal error: entered unreachable code: `to_long` always has the flag specified
Fatal internal error. Please consider filing a bug report at https://github.com/clap-rs/clap/issues
called `Option::unwrap()` on a `None` value
C:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\clap_builder-4.3.8\src\builder\arg.rs
truefalse
a Display implementation returned an error unexpectedly/rustc/bf9a1c8a193fc373897196321215794c8bebbeec\library\alloc\src\string.rs
called `Result::unwrap()` on an `Err` valueC:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\clap_builder-4.3.8\src\builder\command.rs
 |--
-helpPrint help (see more with '--help')Print help (see a summary with '-h')versionPrint this message or the help of the given subcommand(s)subcommandCOMMAND
|Fatal internal error. Please consider filing a bug report at https://github.com/clap-rs/clap/issues
about-with-newline
hor-with-newlineauthor-with-newl        
a Display implementation returned an error unexpectedly/rustc/bf9a1c8a193fc373897196321215794c8bebbeec\library\alloc\src\string.rs
/rustc/bf9a1c8a193fc373897196321215794c8bebbeec\library\core\src\str\pattern.rs
{before-help}{about-with-newline}
{usage-heading} {usage}{after-help}{before-help}{about-with-newline}
{usage-heading} {usage}
{all-args}{after-help}author-sectionabout-sectionusage-headingoptionspositionalssubcommandstabafter-helpbefore-help{}
  Usage:l'R
{n}-
Commands:l'R
ArgumentsOptionsl'R
, --l'R
 Only called with possible valueC:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\clap_builder-4.3.8\src\output\help_template.rs
[default: ]
[aliases: 
[short aliases: 
[possible values: 
COLUMNSLINESfalsetrue0
Fatal internal error. Please consider filing a bug report at https://github.com/clap-rs/clap/issuesC:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\clap_builder-4.3.8\src\parser\arg_matcher.rs
C:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\clap_builder-4.3.8\src\util\flat_map.rs
called `Option::unwrap()` on a `None` value
C:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\clap_builder-4.3.8\src\util\graph.rs
Index out of bounds
/rustc/bf9a1c8a193fc373897196321215794c8bebbeec\library\core\src\slice\sort.rs
called `Option::unwrap()` on a `None` value
assertion failed: end >= start && end <= len
assertion failed: offset != 0 && offset <= len
called `Option::unwrap()` on a `None` valueC:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\anstyle-wincon-1.0.1\src\console.rs30R
failed to write whole buffer
formatter error
/rustc/bf9a1c8a193fc373897196321215794c8bebbeec\library\std\src\io\mod.rs
advancing io slices beyond their length
advancing IoSlice beyond its length
/rustc/bf9a1c8a193fc373897196321215794c8bebbeec\library\std\src\sys\windows\io.rs
/rustc/bf9a1c8a193fc373897196321215794c8bebbeec\library\std\src\io\stdio.rs
C:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\clap_builder-4.3.8\src\error\mod.rs
to pass '' as a value, use ' -- '
 --' exists
subcommand '' exists; to use it, remove the '' before it
TERM
formatter error
/rustc/bf9a1c8a193fc373897196321215794c8bebbeec\library\std\src\io\stdio.rs
 Fatal internal error. Please consider filing a bug report at https://github.com/clap-rs/clap/issuesC:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\clap_builder-4.3.8\src\builder\arg.rs--
 [=[=
called `Option::unwrap()` on a `None` value[
C:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\clap_builder-4.3.8\src\builder\ext.rs

There are lots more errors too... There also seems to be way too much build log text embedded in a static binary, perhaps 50-100 KB unnecessarily...

strings.txt

@ace-dent
Author

ace-dent commented Nov 2, 2023

This resource may be useful?
https://github.com/johnthagen/min-sized-rust

@TPS

TPS commented Nov 4, 2023

Is https://github.com/google/bloaty useful to analyse this?

@AlexTMjugador
Collaborator

AlexTMjugador commented Nov 5, 2023

> @AlexTMjugador - is it possible to look at compiler optimizations that don't (significantly) affect speed but produce a more compact executable? Perhaps there is extra fluff that is unneeded for Windows?

Yeah, the Rust compiler by default adds some mostly unnecessary fluff to the release binaries it produces.

Not generating PIE is a little-known trick I've seen greatly reduce the size of Linux and macOS (ELF and Mach-O) executables without compromising runtime properties, other than potentially making ASLR harder. That said, the min-sized-rust resource you pointed to is a pretty good summary of what can be done to reduce binary sizes.

In particular, as you saw with strings, the panic unwinding and message formatting infrastructure takes up a lot of space, and getting rid of it should have a significant effect, at the cost of worse diagnostic output when OxiPNG panics (which should not happen anyway, since panics signal implementation errors).
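
As a rough illustration of where those embedded paths come from: `Option::unwrap()` and similar standard-library functions are marked `#[track_caller]`, so each call site carries a static `Location` holding its file, line and column. The sketch below uses a hypothetical helper, `caller_location`, to show the same mechanism in isolation; it is not code from OxiPNG.

```rust
use std::panic::Location;

/// Hypothetical helper: returns the source location of whoever called it.
/// `#[track_caller]` is the same attribute used by `Option::unwrap()`.
#[track_caller]
fn caller_location() -> &'static Location<'static> {
    Location::caller()
}

fn main() {
    let loc = caller_location();
    // The file name is a string constant stored in the executable, which is
    // why a tool like `strings` can find source paths in a release binary.
    println!("{}:{}:{}", loc.file(), loc.line(), loc.column());
}
```

Switching to `panic = "abort"` on its own does not remove these constants, which fits with `location-detail=none` appearing as a separate step in the measurements later in the thread.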

Also, I'd try to avoid using UPX on a project like OxiPNG because, in addition to the reasons you mentioned, the runtime decompression it requires prevents the operating system from sharing executable code memory pages among multiple OxiPNG processes, which is less efficient overall in cases where users are running OxiPNG concurrently. It also requires a small amount of time and in-memory scratch space for initial decompression, which has a negative impact on latency.

AlexTMjugador added the E-Easy and OS-Windows labels on Nov 5, 2023
@ace-dent
Author

Thanks @AlexTMjugador. Will you have time to experiment with this?
It seems it might benefit all OS platforms.

@andrews05
Collaborator

andrews05 commented Nov 14, 2023

I followed that link you provided and ran some experiments on macOS x86_64, including execution times for some:

  Bytes Build settings                                    Time
1512288 default                                           14.37s
1267360 opt-level=z                                       20.98s
1245656 opt-level=s                                       15.93s
1027512 opt-level=s lto=fat                               15.41s
1027272 opt-level=s lto=fat codegen-units=1
 944816 opt-level=s lto=fat codegen-units=1 panic=abort   15.40s
 911808 opt-level=s lto=fat codegen-units=1 panic=abort location-detail=none
 845264 opt-level=s lto=fat codegen-units=1 panic=abort location-detail=none build-std
 663592 opt-level=s lto=fat codegen-units=1 panic=abort location-detail=none build-std panic_immediate_abort
 547520 opt-level=s lto=fat codegen-units=1 panic=abort location-detail=none build-std panic_immediate_abort features=binary

Note the last line is without zopfli, parallel, or filetime features.

We definitely wouldn't want to change the default opt-level as that has a noticeable speed impact. I would be reluctant to change lto as well, as this was explicitly set to thin to improve compile time, but we could consider it. Panic we could change, though that has a smaller size impact.
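
To put the size column in relative terms, the sketch below recomputes a few rows as percentage savings against the default build. The byte counts are copied from the table above; the helper name `saving_percent` and the shortened row labels are illustrative only.

```rust
/// Percentage by which `size` is smaller than `baseline`.
fn saving_percent(baseline: u64, size: u64) -> f64 {
    (baseline as f64 - size as f64) / baseline as f64 * 100.0
}

fn main() {
    let baseline: u64 = 1_512_288; // "default" row
    // Each row adds its settings to the row before it, as in the table.
    let rows: [(&str, u64); 5] = [
        ("opt-level=s", 1_245_656),
        ("+ lto=fat", 1_027_512),
        ("+ codegen-units=1 panic=abort", 944_816),
        ("+ location-detail=none", 911_808),
        ("+ build-std panic_immediate_abort", 663_592),
    ];
    for (label, size) in rows {
        println!(
            "{label:<36} {size:>8} bytes  -{:.1}%",
            saving_percent(baseline, size)
        );
    }
}
```

Note that every measured row uses opt-level=s, so these percentages include the effect of leaving the default opt-level and do not isolate what lto or panic would save on their own.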

Output from cargo-bloat:

 File  .text     Size Crate
21.3%  34.9% 423.6KiB std
14.0%  22.9% 278.1KiB clap_builder
11.2%  18.3% 222.3KiB oxipng
 4.8%   7.9%  95.6KiB [Unknown]
 3.1%   5.1%  61.3KiB zopfli
 2.1%   3.4%  41.0KiB rayon_core
 1.4%   2.3%  27.4KiB env_logger
 0.9%   1.4%  17.0KiB hashbrown
 0.8%   1.3%  16.2KiB crossbeam_channel
 0.5%   0.9%  10.5KiB crossbeam_epoch
 0.2%   0.3%   3.9KiB anstream
 0.2%   0.3%   3.8KiB crossbeam_deque
 0.2%   0.3%   3.7KiB indexmap
 0.2%   0.3%   3.7KiB libdeflate_sys
 0.2%   0.3%   3.3KiB clap_lex
 0.1%   0.2%   2.7KiB strsim
 0.1%   0.2%   2.4KiB rayon
 0.1%   0.2%   2.2KiB filetime
 0.1%   0.2%   2.0KiB anstyle
 0.1%   0.2%   2.0KiB typed_arena
 0.3%   0.4%   5.4KiB And 7 more crates. Use -n N to show more.
61.0% 100.0%   1.2MiB .text section size, the file size is 1.9MiB

Note: numbers above are a result of guesswork. They are not 100% correct and never will be.

@AlexTMjugador
Collaborator

AlexTMjugador commented Nov 14, 2023

> Panic we could change, though that has a smaller size impact.

If we want to seriously save space by changing panic to abort, we should also use the companion panic_immediate_abort feature together with -Z build-std, as otherwise the final executable still contains panic formatting code and messages. Other than that, thanks for running some numbers, they look great!

ace-dent changed the title from "Optimize file size of win32 binary" to "Optimize file size of release binaries" on Nov 14, 2023
@ace-dent
Author

Thanks @andrews05 for testing!

I don't suggest any changes that reduce performance, just trying to eliminate the low-hanging fluff.
As your cargo-bloat data suggests, ~61% of the binary is text! Hopefully @AlexTMjugador's suggestions might reduce this.

> We definitely wouldn't want to change the default opt-level as that has a noticeable speed impact.

I presume the default level is 0? There may be an interesting avenue for tweaking a little more performance. Should I raise this in a separate issue?

> I would be reluctant to change... ...compile time, but we could consider it.

For releases, is the extra compile time on GH's infrastructure an issue? It also seems lto=fat may improve performance a little.

Lots of great stuff. Seems a binary < 900 KiB is achievable.
As a next step, could we try adding this to the build workflow?

@andrews05
Collaborator

> If we want to seriously save space by changing panic to abort, we should also use the companion panic_immediate_abort feature together with -Z build-std, as otherwise the final executable still contains panic formatting code and messages. Other than that, thanks for running some numbers, they look great!

I did try this but it seemed to give identical output. Maybe I was missing something...

@ace-dent Default opt-level for release is 3, which is the highest performance level. PGO could indeed provide further improvements, but it's not something I've looked into.

@andrews05
Collaborator

andrews05 commented Nov 15, 2023

Ah, found build-std was building to a different path. I've updated my post above to show further savings from that.

@ace-dent Is there a particular target size you're hoping to achieve? We might be able to do a couple of these things by default, but some are a bit unorthodox, requiring use of unstable build tools and parameters. I would suggest if size is particularly important, users can build it themselves according to their needs.

@ace-dent
Author

ace-dent commented Nov 16, 2023

@andrews05 - Thanks for all your work on this.

> Default opt-level for release is 3 ... PGO could indeed provide further improvements ...

Great to know. I feel we should keep the current speed optimization o3, as the binary size is secondary. Should I create an issue as a reminder about future profiling?


> Is there a particular target size you're hoping to achieve?

While 800 KiB still seems relatively big, it appears to be reasonable. Are there any issues with:

opt-level=3 lto=fat codegen-units=1 panic=abort location-detail=none build-std panic_immediate_abort

If you check the optimized binary with strings, are full file paths still embedded? May be worth trimming.


> I would suggest if size is particularly important, users can build it themselves according to their needs.

Agreed (for more extreme options). For my use case, I might remove parallel processing, etc. But we should make a good effort for Release builds.

AlexTMjugador pushed a commit that referenced this issue Feb 20, 2024
This PR makes 3 changes that together reduce binary size by around 25%:
- Sets lto="fat" in Cargo.toml
- Sets panic="abort" in Cargo.toml
- Sets location-detail=none in RUSTFLAGS

Closes #571

An unrelated change: I've replaced the zopfli test file with a smaller
one that runs much faster, as well as removing the slow test for
issue-133 which was related to an older alpha optimisation that is no
longer relevant.
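
For readers who want to reproduce the commit's settings, the first two bullets map onto Cargo's `[profile.release]` keys `lto` and `panic`. The fragment in the comment below is an assumed spelling, not copied from the repository. The Rust code itself is only a small check, using a hypothetical `panic_strategy` helper, of which panic strategy a build actually ended up with.

```rust
// Assumed Cargo.toml fragment matching the first two bullets of the commit
// message (a sketch, not the repository's actual file):
//
//   [profile.release]
//   lto = "fat"
//   panic = "abort"
//
// The third bullet is a compiler flag rather than a profile key. It is an
// unstable option, so it needs a nightly toolchain, where it is spelled
//   RUSTFLAGS="-Zlocation-detail=none"

/// Hypothetical helper: reports the panic strategy this crate was built with.
fn panic_strategy() -> &'static str {
    if cfg!(panic = "abort") {
        "abort"
    } else {
        "unwind"
    }
}

fn main() {
    println!("panic strategy: {}", panic_strategy());
}
```

Run from a release build with the profile above, this should report `abort`; a default build reports `unwind`.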

4 participants