Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary size optimization experiment #569

Closed
wants to merge 2 commits into from

Conversation

EFanZh
Copy link
Contributor

@EFanZh EFanZh commented Jul 9, 2023

Experiment for some ideas I have on binary size optimization.

src/macros.rs Outdated
let lvl = $lvl;
if lvl <= $crate::STATIC_MAX_LEVEL && lvl <= $crate::max_level() {
match (::std::format_args!($($arg)+), $target, $kvs) {
(args, target, kvs) => if let ::std::option::Option::Some(literal) = ::std::fmt::Arguments::as_str(&args) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary, since it is turned into an fmt::Argument<'_> again when constructing Record<'_>.

I think this actually prevents the Log implementation to take advantage of Argument::as_str.

Copy link
Contributor Author

@EFanZh EFanZh Jul 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that passing an Arguments value is more costly than passing an str value: https://godbolt.org/z/bzbnfq1TM. I expect an application may contain a lot of logs that only contains string literals, so if we can reduce the cost of each log call, the total binary size could been reduced. Essentially, this implementation tries to bring back an old optimization that was reverted by #446.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can pass a reference to Arguments, then dereference it when building Records.
It implements Copy, so you can avoid copying it in each function while preserving the ability to take advantages of Arguments::as_str

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from copying, building an Argument object also has costs: https://godbolt.org/z/a1eqr4bq5.

Copy link
Contributor Author

@EFanZh EFanZh Jul 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that passing Arguments and passing &Arguments have the same cost: https://godbolt.org/z/Wz35xYrsr.

@NobodyXu
Copy link

NobodyXu commented Jul 9, 2023

Is this PR trying to take advantages of rust-lang/rust#109999 ?

@EFanZh
Copy link
Contributor Author

EFanZh commented Jul 9, 2023

Is this PR trying to take advantages of rust-lang/rust#109999 ?

Not exactly. This PR intends to minimize code generated by macro expansion. This is done by moving as much code logic as possible into functions that can be shared by each macro call.

@NobodyXu
Copy link

NobodyXu commented Jul 9, 2023

Is this PR trying to take advantages of rust-lang/rust#109999 ?

Not exactly. This PR intends to minimize code generated by macro expansion. This is done by moving as much code logic as possible into functions that can be shared by each macro call.

Thanks for explanation.

Copy link
Collaborator

@Thomasdezeeuw Thomasdezeeuw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we use to have a special case for literals: #366, but it was removed in 2d1a477.

@EFanZh EFanZh force-pushed the optimize-binary-size branch 2 times, most recently from 1ba35e5 to 93eed72 Compare July 10, 2023 17:19
@EFanZh
Copy link
Contributor Author

EFanZh commented Jul 10, 2023

Note that we use to have a special case for literals: #366, but it was removed in 2d1a477.

Yes, I’m aware of that, I am trying to bring that optimization back.

@EFanZh
Copy link
Contributor Author

EFanZh commented Jul 11, 2023

It seems that using "opt-level=z", the compiler won’t inline Arguments::as_str if there are multiple calls to Arguments::as_str: https://godbolt.org/z/rzsG3fYEK. So both None and Some branches are kept in the final binary, which will cause binary size to grow.

Maybe marking Arguments::as_str as #[inline(always)] could help, but for now, I think I’ll have to give up specializing literal arguments.

@NobodyXu
Copy link

Maybe marking Arguments::as_str as #[inline(always)] could help, but for now, I think I’ll have to give up specializing literal arguments.

You could just pass &fmt::Argument<'_> instead of pass it by-value, it should remove the generated code for copying it.

@EFanZh
Copy link
Contributor Author

EFanZh commented Jul 13, 2023

@NobodyXu: I have tested the effect of passing &Arguments, and the result is not much different than passing Arguments directly. Maybe it’s because Rust already passes large object as references: https://godbolt.org/z/Wz35xYrsr. You can see in the example, passing Arguments and passing &Arguments generate the same assembly code.

@NobodyXu
Copy link

@NobodyXu: I have tested the effect of passing &Arguments, and the result is not much different than passing Arguments directly. Maybe it’s because Rust already passes large object as references: https://godbolt.org/z/Wz35xYrsr. You can see in the example, passing Arguments and passing &Arguments generate the same assembly code.

Your linked code does not actually use the Arguments, so the compiler might omits the copy requires to happen.

In your PR, you passes the Arguments to log2, log1, log0 by value, that would require copying Arguments between stacks unless inlined.

@EFanZh
Copy link
Contributor Author

EFanZh commented Jul 13, 2023

@NobodyXu: I have tested the effect of passing &Arguments, and the result is not much different than passing Arguments directly. Maybe it’s because Rust already passes large object as references: https://godbolt.org/z/Wz35xYrsr. You can see in the example, passing Arguments and passing &Arguments generate the same assembly code.

Your linked code does not actually use the Arguments, so the compiler might omits the copy requires to happen.

In your PR, you passes the Arguments to log2, log1, log0 by value, that would require copying Arguments between stacks unless inlined.

log0, log1, log can only have a fixed number of instances, which means they can only contribute to at most a constant size to the final binary, while the majority binary size comes from codes generated by macro calls, since there could be a lot of log calls for a large project. So optimizing function implementations should not matter too much. But still, I will compare different ways to pass arguments, and choose the best one.

@EFanZh EFanZh force-pushed the optimize-binary-size branch 2 times, most recently from ac32187 to 14c70e5 Compare July 16, 2023 12:23
@EFanZh
Copy link
Contributor Author

EFanZh commented Jul 17, 2023

OK, I think this is ready for the next step. This PR is just a proof of concept of some ideas I have, and should not be merged as is. I have tested this PR on my personal project and gained about 2.4% binary size reduction, which is significant to me. Now I want to implement these ideas in the log crate. Since this PR contains several optimization ideas, and I am not certain that all of them have positive effect on the binary size, so the best way maybe to discuss each idea, and decide whether it is reasonable. Finally, we may implement and merge each acceptable idea individually.

  1. Pass &str instead of Arguments if the message is a string literal.
    • I used a const function to check whether the format string can be treated as a string literal. And the condition is whether it contains '{' or '}' character, see the is_literal function.
  2. If log level is known at compile time, pass it statically.
  3. If no key-value pair is provided, pass the information statically.
  4. If target is the same as module path, pass the information statically.
  5. Pass module path and file as &'static (&'static str, &'static str), so that logs from a single file can share the same argument.
  6. If the kvs and format arguments do not have side effects, move the level filter check inside the function. (Needs further investigation, not implemented in this PR)
  7. If the final implementation is done using generic functions like this PR, make sure all generic function instances are instantiated in this crate (the unused function in this PR), so that downstream crates can utilize them by enabling share-generics.

Also, is there a known open source project that uses log extensively, so I can test the effect of my optimization on it? I only have a private project to test. I think my optimization will need a reproducible result.

@NobodyXu
Copy link

Also, is there a known open source project that uses log extensively, so I can test the effect of my optimization on it? I only have a private project to test. I think my optimization will need a reproducible result.

In cargo-binstall, log is a dependency that is heavily used:

log v0.4.19
├── async_zip v0.0.15
│   └── binstalk-downloader v0.6.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk-downloader)
│       └── binstalk v0.13.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk)
│           └── cargo-binstall v1.0.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/bin)
├── cargo-binstall v1.0.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/bin)
├── gix v0.48.0
│   └── binstalk v0.13.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk) (*)
├── gix-attributes v0.14.1
│   ├── gix v0.48.0 (*)
│   └── gix-worktree v0.21.1
│       └── gix v0.48.0 (*)
├── gix-config v0.25.1
│   └── gix v0.48.0 (*)
├── guess_host_triple v0.1.3
│   └── detect-targets v0.1.8 (/Users/nobodyxu/Dev/cargo-binstall/crates/detect-targets)
│       └── binstalk v0.13.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk) (*)
│       [dev-dependencies]
│       └── binstalk-manifests v0.6.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk-manifests)
│           └── cargo-binstall v1.0.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/bin)
├── reqwest v0.11.18
│   ├── binstalk-downloader v0.6.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk-downloader) (*)
│   ├── gix v0.48.0 (*)
│   └── gix-transport v0.33.1
│       ├── gix v0.48.0 (*)
│       └── gix-protocol v0.35.0
│           └── gix v0.48.0 (*)
├── rustls v0.20.8
│   ├── quinn v0.8.5
│   │   └── trust-dns-proto v0.22.0
│   │       └── trust-dns-resolver v0.22.0
│   │           ├── binstalk-downloader v0.6.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk-downloader) (*)
│   │           └── reqwest v0.11.18 (*)
│   ├── quinn-proto v0.8.4
│   │   ├── quinn v0.8.5 (*)
│   │   └── quinn-udp v0.1.4
│   │       └── quinn v0.8.5 (*)
│   ├── tokio-rustls v0.23.4
│   │   ├── trust-dns-proto v0.22.0 (*)
│   │   └── trust-dns-resolver v0.22.0 (*)
│   ├── trust-dns-proto v0.22.0 (*)
│   └── trust-dns-resolver v0.22.0 (*)
├── rustls v0.21.5
│   ├── hyper-rustls v0.24.1
│   │   └── reqwest v0.11.18 (*)
│   ├── reqwest v0.11.18 (*)
│   └── tokio-rustls v0.24.1
│       ├── hyper-rustls v0.24.1 (*)
│       └── reqwest v0.11.18 (*)
├── tracing v0.1.37
│   ├── binstalk v0.13.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk) (*)
│   ├── binstalk-downloader v0.6.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk-downloader) (*)
│   ├── cargo-binstall v1.0.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/bin)
│   ├── h2 v0.3.20
│   │   ├── hyper v0.14.27
│   │   │   ├── hyper-rustls v0.24.1 (*)
│   │   │   └── reqwest v0.11.18 (*)
│   │   ├── reqwest v0.11.18 (*)
│   │   └── trust-dns-proto v0.22.0 (*)
│   ├── hyper v0.14.27 (*)
│   ├── quinn v0.8.5 (*)
│   ├── quinn-proto v0.8.4 (*)
│   ├── quinn-udp v0.1.4 (*)
│   ├── tokio-util v0.7.8
│   │   ├── async_zip v0.0.15 (*)
│   │   ├── binstalk-downloader v0.6.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk-downloader) (*)
│   │   ├── h2 v0.3.20 (*)
│   │   ├── reqwest v0.11.18 (*)
│   │   └── tower v0.4.13
│   │       └── binstalk-downloader v0.6.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk-downloader) (*)
│   ├── tower v0.4.13 (*)
│   ├── trust-dns-proto v0.22.0 (*)
│   └── trust-dns-resolver v0.22.0 (*)
├── tracing-log v0.1.3
│   └── cargo-binstall v1.0.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/bin)
└── wasm-bindgen-backend v0.2.87
    └── wasm-bindgen-macro-support v0.2.87
        └── wasm-bindgen-macro v0.2.87 (proc-macro)
            └── wasm-bindgen v0.2.87
                ├── js-sys v0.3.64
                │   ├── reqwest v0.11.18 (*)
                │   ├── wasm-bindgen-futures v0.4.37
                │   │   ├── reqwest v0.11.18 (*)
                │   │   └── wasm-streams v0.2.3
                │   │       └── reqwest v0.11.18 (*)
                │   ├── wasm-streams v0.2.3 (*)
                │   └── web-sys v0.3.64
                │       ├── reqwest v0.11.18 (*)
                │       ├── ring v0.16.20
                │       │   ├── quinn-proto v0.8.4 (*)
                │       │   ├── rustls v0.20.8 (*)
                │       │   ├── rustls v0.21.5 (*)
                │       │   ├── rustls-webpki v0.101.1
                │       │   │   └── rustls v0.21.5 (*)
                │       │   ├── sct v0.7.0
                │       │   │   ├── rustls v0.20.8 (*)
                │       │   │   └── rustls v0.21.5 (*)
                │       │   ├── trust-dns-proto v0.22.0 (*)
                │       │   └── webpki v0.22.0
                │       │       ├── quinn v0.8.5 (*)
                │       │       ├── quinn-proto v0.8.4 (*)
                │       │       ├── rustls v0.20.8 (*)
                │       │       ├── tokio-rustls v0.23.4 (*)
                │       │       ├── trust-dns-proto v0.22.0 (*)
                │       │       └── webpki-roots v0.22.6
                │       │           ├── reqwest v0.11.18 (*)
                │       │           ├── trust-dns-proto v0.22.0 (*)
                │       │           └── trust-dns-resolver v0.22.0 (*)
                │       ├── wasm-bindgen-futures v0.4.37 (*)
                │       └── wasm-streams v0.2.3 (*)
                ├── reqwest v0.11.18 (*)
                ├── wasm-bindgen-futures v0.4.37 (*)
                ├── wasm-streams v0.2.3 (*)
                └── web-sys v0.3.64 (*)

@EFanZh
Copy link
Contributor Author

EFanZh commented Jul 18, 2023

In cargo-binstall, log is a dependency that is heavily used

It seems that cargo-binstall mostly uses logging macros from tracing, not log. Even though it depends on the log crate, not using its macros means it may not benefit from these optimizations.

@EFanZh EFanZh marked this pull request as ready for review July 18, 2023 02:32
@NobodyXu
Copy link

In cargo-binstall, log is a dependency that is heavily used

It seems that cargo-binstall mostly uses logging macros from tracing, not log. Even though it depends on the log crate, not using its macros means it may not benefit from these optimizations.

It uses many dependencies which depends on log, like gix, async-zip, etc.

If you check the "details" which contained the dependency tree, you would find it actually heavily depends on log.

module_path_and_file: &'static (&'static str, &'static str),
line: u32,
level: L,
args: A,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could try passing &Arguments<'_> here and in all log_* instead of passing generics here.

I still think losing Arguments::as_str as an optimization is bad and passing it by reference could make up for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a const function to check whether the format argument is a string literal as a not perfect replacement for Arguments::as_str, that’s why I used a generic argument here. Sometimes I need to pass &str as args.

As for &Arguments, I think I should do some test, then decide which one to use.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a const function to check whether the format argument is a string literal as a not perfect replacement for Arguments::as_str

I was actually referring to using Argument::as_str in Log::log, the implementation might be able to avoid heap alloc using Argument::as_str, passing &str here breaks that optimization.

Copy link
Contributor Author

@EFanZh EFanZh Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand. Log::log is not implemented by this crate, so how do I utilize Argument::as_str?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand. Log::log is not implemented by this crate, so how do I utilize Argument::as_str?

Implementator of Log::log could use Argument::as_str to avoid heap allocation.

Now that you decided to pass arg as &str if possible, they can no longer utilize this optimization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. Maybe this can be optimized by format_args! as a special case?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. Maybe this can be optimized by format_args! as a special case?

Ideally yes, but that will require the args to be &'static str, which involves lifetime and is a bit hard to do.

I guess we can ask @m-ou-se for this?

Copy link
Contributor Author

@EFanZh EFanZh Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simpler approach could be impl<'a> From<&'a &'static str> for Arguments<'a>, utilizing Arguments::new_const.


impl LogKVs for &[(&str, &LogKvValue<'_>)] {
#[inline]
fn with(self, f: impl FnOnce(&[(&str, &LogKvValue)])) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why every Log* trait has to accept a closure here?
Can't they just return the value directly?

Returning the closure would only add more overhead and generate more code.

Copy link
Contributor Author

@EFanZh EFanZh Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For LogArgs, I can’t return a Arguments value from a &str value:

fn f(s: &str) -> Arguments<'_> {
    format_args!("{s}") // Does not work because a local variable reference is captured.
}

So I have to use a callback function.

For other ones, since they need to be converted into references, the trait method may need to take a &self argument, and I don’t feel like taking references from &str or a zero sized type, since the will get a &&str type and &ZST type. But I am not very insistent on this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I have to use a callback function.

Yeah for LogArgs you have to use a callback due to limitation of Arguments<'_>, although I wonder

For other ones, since they need to be converted into references, the trait method may need to take a &self argument, and I don’t feed like taking references from &str or a zero sized type, since the will get a &&str type and &ZST type. But I am not very insistent on this.

I don't think there's much issue with this since it will most likely be inlined.

Copy link
Contributor Author

@EFanZh EFanZh Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, now I have removed callback functions from LogTarget and LogKVs.

@EFanZh
Copy link
Contributor Author

EFanZh commented Jul 23, 2023

I have tests this PR on cargo-binstall, and here is the binary size comparison:

opt-level lto codegen-units original size optimized size diff
3 off 1 14211752 14203560 -8192
3 off 16 16739048 16726760 -12288
3 thin 1 15170120 15170120 0
3 thin 16 17566376 17574568 8192
3 fat 1 13859336 13859336 0
3 fat 16 15231496 15227400 -4096
"z" off 1 11451048 11430568 -20480
"z" off 16 13531848 13503176 -28672
"z" thin 1 12307048 12286568 -20480
"z" thin 16 13699720 13736584 36864
"z" fat 1 9951752 9931272 -20480
"z" fat 16 10476040 10447368 -28672

You can reproduce the result using this program.

@EFanZh EFanZh marked this pull request as draft August 12, 2023 06:39
@Thomasdezeeuw Thomasdezeeuw mentioned this pull request Aug 27, 2023
@KodrAus
Copy link
Contributor

KodrAus commented Jun 25, 2024

Thanks for investigating this @EFanZh! These were a useful point-in-time reference that helped make some improvements, but I think we should close now that things have moved along. Please feel free to open any fresh PRs that optimize binary size based on the current state of the code though.

@KodrAus KodrAus closed this Jun 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants