- 
                Notifications
    You must be signed in to change notification settings 
- Fork 13.9k
Minimize the cost of write_fmt without arguments #75358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Can I have a perf run? cc @Mark-Simulacrum | 
| @bors try @rust-timer queue | 
| Awaiting bors try build completion | 
| ⌛ Trying commit 23fdb55 with merge 01463991259f5ad8ff520b94d9c9c2e72cdf1a98... | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I didn't know this can be fixed without changing the compiler built-in format_args. @lzutao if I want to allow format_args to accept u8 or char rather than &str do I need to change the compiler code?
| ☀️ Try build successful - checks-actions, checks-azure | 
| Queued 01463991259f5ad8ff520b94d9c9c2e72cdf1a98 with parent 13290e8, future comparison URL. | 
| 
 | 
| Finished benchmarking try commit (01463991259f5ad8ff520b94d9c9c2e72cdf1a98): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying  Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never | 
| I don't know how to interpret this perf result.  It seems that most are regressed. | 
| I cannot reproduce your assembly with the try build's artifacts; I suspect that the difference you're seeing is likely an effect of local compilation options for std allowing more aggressive inlining or so than what we can do with dist-produced artifacts. use std::fmt::{self, Write as FmtWrite};
#[no_mangle]
pub fn bar(buf: &mut String) -> fmt::Result {
    write!(buf, " ")
}gets me: before this PR: ; rustc +13290e83a6e20f3b408d177a9d64d8cf98fe4615 -Copt-level=3 --crate-type=cdylib fmt.rs
; objdump -Mintel --disassemble=bar ./libfmt.so | rustfilt
    51e0:       48 83 ec 38             sub    rsp,0x38
    51e4:       48 89 3c 24             mov    QWORD PTR [rsp],rdi
    51e8:       48 8d 05 69 a8 03 00    lea    rax,[rip+0x3a869]        # 3fa58 <__do_global_dtors_aux_fini_array_entry+0x8>
    51ef:       48 89 44 24 08          mov    QWORD PTR [rsp+0x8],rax
    51f4:       48 c7 44 24 10 01 00    mov    QWORD PTR [rsp+0x10],0x1
    51fb:       00 00
    51fd:       48 c7 44 24 18 00 00    mov    QWORD PTR [rsp+0x18],0x0
    5204:       00 00
    5206:       48 8d 05 fb dd 02 00    lea    rax,[rip+0x2ddfb]        # 33008 <_fini+0x454>
    520d:       48 89 44 24 28          mov    QWORD PTR [rsp+0x28],rax
    5212:       48 c7 44 24 30 00 00    mov    QWORD PTR [rsp+0x30],0x0
    5219:       00 00
    521b:       48 8d 35 9e a8 03 00    lea    rsi,[rip+0x3a89e]        # 3fac0 <anon.4574bf75806fea7c4a1d662dd29266f6.0.llvm.133549432485651853>
    5222:       48 89 e7                mov    rdi,rsp
    5225:       48 8d 54 24 08          lea    rdx,[rsp+0x8]
    522a:       ff 15 b0 c8 03 00       call   QWORD PTR [rip+0x3c8b0]        # 41ae0 <_GLOBAL_OFFSET_TABLE_+0x70>
    5230:       48 83 c4 38             add    rsp,0x38
    5234:       c3                      ret
with this PR: ; rustc +01463991259f5ad8ff520b94d9c9c2e72cdf1a98 -Copt-level=3 --crate-type=cdylib fmt.rs
; objdump -Mintel --disassemble=bar ./libfmt.so | rustfilt
    51e0:       41 56                   push   r14
    51e2:       53                      push   rbx
    51e3:       50                      push   rax
    51e4:       49 89 fe                mov    r14,rdi
    51e7:       48 8b 77 10             mov    rsi,QWORD PTR [rdi+0x10]
    51eb:       ba 01 00 00 00          mov    edx,0x1
    51f0:       e8 3b ff ff ff          call   5130 <alloc::raw_vec::RawVec<T,A>::reserve>
    51f5:       49 8b 5e 10             mov    rbx,QWORD PTR [r14+0x10]
    51f9:       49 8b 3e                mov    rdi,QWORD PTR [r14]
    51fc:       48 01 df                add    rdi,rbx
    51ff:       48 8d 15 fa dd 02 00    lea    rdx,[rip+0x2ddfa]        # 33000 <_fini+0x42c>
    5206:       be 01 00 00 00          mov    esi,0x1
    520b:       b9 01 00 00 00          mov    ecx,0x1
    5210:       e8 1b 00 00 00          call   5230 <core::slice::<impl [T]>::copy_from_slice>
    5215:       48 83 c3 01             add    rbx,0x1
    5219:       49 89 5e 10             mov    QWORD PTR [r14+0x10],rbx
    521d:       31 c0                   xor    eax,eax
    521f:       48 83 c4 08             add    rsp,0x8
    5223:       5b                      pop    rbx
    5224:       41 5e                   pop    r14
    5226:       c3                      ret | 
| @Mark-Simulacrum that seems to be even more aggressively inlined, i.e,  | 
| Hm, I tried "try" build and master artifact by using rustup-toolchain-install-master: rustc -C debuginfo=1 -o ./foo-stage1.s --emit asm -Cllvm-args=--x86-asm-syntax=intel --crate-type rlib --color=always -C opt-level=3 -Z symbol-mangling-version=v0 ./foo.rsThe source files are the same as in godbolt link. No no_mangle. | 
| write(&mut self, args) | ||
| match args.as_str() { | ||
| Some(s) => self.write_str(s), | ||
| None => write(&mut self, args), | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This special casing should probably be done inside write to cover more cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it may already be done there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that moving this logic to write doesn't help inline calls.
| Hi, I created a zulip topic to discuss about this change: https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Discuss.20changes.20in.20.20.2375358 | 
| Probably try again when inline const implemented. | 
| This change could be very beneficial for embedded/no_std projects. Not necessarily for performance, but for code size.  The discussion above was only about performance, where it seems to make little difference. However, code size, especially for no_std projects, is also very important. #78122 (comment) contains a simple benchmark program that could be used to check the difference this PR makes. | 
| Another data point: There are quite a few macros in the wild that use a special-case to avoid calling  This change would make the special case there identical to the generic case, simplifying things. | 
Thanks to
Arguments::as_str, we should knowwhen we don't have any arguments at all.
Before: https://rust.godbolt.org/z/zKGEMP
After:
Closes #75301
Zulip discussion: https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Discuss.20changes.20in.20.20.2375358