-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize format_args placeholders without options: Display::simple_fmt #104525
Conversation
This basic test program shows great results with this change: #![feature(rustc_private)]
#![feature(lang_items)]
#![feature(start)]
#![no_std]
extern crate libc;
use core::fmt::{self, Write};
struct Stdout;
impl Write for Stdout {
fn write_str(&mut self, s: &str) -> fmt::Result {
unsafe { libc::write(1, s.as_ptr().cast(), s.len()) };
Ok(())
}
}
#[start]
fn main(_: isize, _: *const *const u8) -> isize {
let s = "world";
writeln!(Stdout, "Hello, {}!", s).is_err() as isize
}
#[lang = "eh_personality"]
extern "C" fn eh_personality() {}
#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
unsafe { libc::abort() };
} Before: A .text section of 6175 bytes After: A .text section of 2666 bytes (A 57% reduction! ✨) Most of the difference is that the binary no longer includes |
For binaries that use string formatting with options anywhere, there won't be any benefit in binary size. There might still be a tiny tiny performance gain, although that's also unlikely. But for small (e.g. embedded) programs, this can make a great difference, as the test program above shows. |
Wouldn't it be simpler to have a default method on |
Yes, definitely. (Depending on your definition of "simpler".) Doing it as a default method on the traits doesn't require specialization and is a better solution in various ways, but does require a bit more work to implement, as it requires more changes to the format_args!() macro and ArgumentV1 type, so I didn't do that in my draft implementation. Now that we've verified this PR can provide a big improvement in some cases, I'll update it to do it the better way. :) |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
⌛ Trying commit 3d68c2139e8fb9a37960638feae986f02c4721a7 with merge c85ca3b2e89f3b73daa36b9976c48d69adbdd0cf... |
It's now implemented for all format traits (Display, Debug, Binary, Octal, and so on), but that does increase the size of the vtable for those traits. Let's wait for the test results to see if that has any significant impact. |
This comment has been minimized.
This comment has been minimized.
@aDotInTheVoid this PR is failing on
What is that test for? |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
TLDR: The test is fragile, and assumes
The test is checking that when rustdoc-json adds a foreign trait to the index, it also adds the methods of the trait. The fixme is because long term we don't want to add foreign traits to the index, and have the index only have local items. The test is failing because it trys to get the |
Finished benchmarking commit (c85ca3b2e89f3b73daa36b9976c48d69adbdd0cf): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
|
Thanks! |
There's quite a few ~1% regressions in compilation time. They're not that significant, but it would be nice to fix them. My theory is that it takes LLVM a somewhat significant amount of time to basically optimize the default It would be nice if we had a feature for this in the language, so we can literally write Maybe someone from the compiler team has some good ideas here. :) |
I think this could be done as a MIR optimization when one function forwards directly to another. In that case we could just emit an LLVM symbol alias instead of emitting an inline function. |
…, r=notriddle Rustdoc Json Tests: Don't assume that core::fmt::Debug will always have one item. See rust-lang#104525 (comment) and rust-lang#104525 (comment) for motivation. This still assumes that `fmt` is the first method, but thats alot less brittle than assuming it will be the only method. Sadly, we can't use a aux crate to insulate the tests from core changes, because core is special, so all we can do is try not to depend on things that may change.
The binary size results show lots of 1-2% regressions, which is unfortunate for this PR which is all about reducing binary sizes :( |
Much better perf results now :) Still some sub-1% regressions in binary size. Not a showstopper, but a brief investigation would be good to see if they can be easily avoided. |
It's a bit unnecessary that the vtable used by Putting it in a separate trait (so, |
This comment was marked as off-topic.
This comment was marked as off-topic.
r? compiler |
How much complexity we are talking about? The optimization slightly pessimizes the common case to make a difference in very specific cases like #104525 (comment). |
Are there any situations where I wonder if we could instead add With The lowering would then have to match on |
That will make binary bloat even worse as the if condition can't be optimized away due to the formatting machinery using dynamic dispatch everywhere. |
Not just strings, but also e.g. bools or enums with only fieldless variants - because the returned lifetime of But even if this was only applicable to strings, I think that it is an important case to optimize. I have seen many situations where strings are formatted, and now that's sadly suboptimal. The flattening/inlining of
Yeah, optimizing the branch would be problematic with dynamic dispatch. But I guess that this could be solved with calling a function through virtual dispatch that would do the condition inside, so that the condition is only codegened once. It would mean double indirection, but that would also happen for types having the default implementation of Something like fn simple_fmt(obj: &Display, fmt: &mut Formatter) {
match obj.as_str() {
Some(s) => fmt.write_str(s),
None => obj.fmt(fmt)
}
} |
That still doesn't allow eliminating the big |
Probably not, and that is also not the point of the proposed Regarding binary size, I don't have much experience with Rust embedded, so this is just an uninformed opinion, but I think that the code size wins (although they are great!) here might not translate that well into real world programs. If having fmt in the binary or not having fmt in the binary is a fundamental difference for an embedded program, then it seems quite brittle to base this on an optimization that basically breaks once you use |
Maybe add a size_hint method for that? |
Nice, that would also work, and could be even more general, e.g. integers could return their log10 number of digits. |
@m-ou-se |
Closing this as inactive. Feel free to reöpen this pr or create a new pr if you get the time to work on this. Thanks |
This is part of #99012
format_args!("{}", "hello")
pulls in the entireDisplay for str
implementation, which includes the code necessary to support formatting options like padding (e.g."{:50^}"
, etc.), which is quite unnecessary for the basic case of formatting with no options ("{}"
).This adds a new method to the format trait:
Display::simple_fmt
. It is implemented by forwarding to the regularDisplay::fmt
method, but is overridden in theDisplay
impl ofString
andstr
to just callwrite_str
directly, avoiding pulling in any code related to padding.