-
Notifications
You must be signed in to change notification settings - Fork 14k
New format_args!() and fmt::Arguments implementation #148789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Experiment: New fmt::Arguments implementation (another one)
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (6e6ba94): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -1.5%, secondary -0.6%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -0.5%, secondary -4.3%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.7%, secondary -1.3%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 476.631s -> 471.922s (-0.99%) |
|
Ooh that's pretty good :D |
|
Pretty much everything looks like a great improvement. Not only number of instructions executed, but also memory usage and binary size. 🎉 Only two significant negative results: 1. "image-0.25.6 opt incr-patched:println" with almost +6% instructions:u.Looking at the detailed results, it looks like that's all LLVM. Probably because llvm got more optimization opportunities. That's not necessarily a bad thing. 2. The
|
9f41692 to
349d2b5
Compare
349d2b5 to
5b58c66
Compare
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Experiment: New fmt::Arguments implementation (another one)
|
just curious - why is there a trailing |
This comment has been minimized.
This comment has been minimized.
See the documentation in library/core/src/fmt/mod.rs. Without an end marker, we wouldn't know where to stop. We don't store the length of the template, to not waste extra space in fmt::Arguments. |
This comment was marked as outdated.
This comment was marked as outdated.
| fmt: Option<&'a [rt::Placeholder]>, | ||
| /// Used by the format_args!() macro to create a fmt::Arguments object. | ||
| #[doc(hidden)] | ||
| #[rustc_diagnostic_item = "FmtArgumentsNew"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a bootstrap thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why "New" then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not new. I just moved it from rt.rs to mod.rs, where the rest of fmt::Arguments lives.
|
@rust-timer build 155c5d4 |
This comment has been minimized.
This comment has been minimized.
| // | ||
| // The template byte sequence is the concatenation of parts of the following types: | ||
| // | ||
| // - Literal string piece (1-127 bytes): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use a varint encoding? We lose a bit (hehe) of efficiency on strings 64-127 bytes long as you'd now only be able to represent strings of len 1-63 with a single byte. But I'm imagining cases where somebody might have a very long string like say the GNU license that just slots in a name or something and it's probably sad to be required to splice up that string hundreds of times.
The varint encoding would be:
0x00xx_xxxx: single byte length, 1-63 range0x010x_xxxx-xxxx_xxxx: 2 bytes, 2^13-1 range- ...
0x0111_1110_...: 7 bytes, 2^(6*8)-1 range0x0111_1111_...: 9 bytes, special in that it adds two bytes from the previous encoding to represent the full u64 range, dunno if that's actually needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option could be to upgrade each long string piece to a placeholder with a string argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding a variable length encoding like you propose adds too much complexity for too little benefit. If 127 byte chunks are a problem, we could consider adding a single encoding for a 16 bit length. Then things get split up into chunks of 64KiB at most, which seems perfectly fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both options seem reasonable, maybe the first one is more robust? It even allows codegen to choose when to put things in the placeholder (e.g. it could decide 4 mini strings is ok but more than that and you go in the placeholder). Or honestly these aren't mutually exclusive, you could do both but I doubt that's necessary.
It'd be interesting to see if large formatted strings are even a thing. Like do we have data to know if this is even a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both options seem reasonable
I'll check which one is more efficient to implement, and will benchmark if it makes any significant difference.
It'd be interesting to see if large formatted strings are even a thing. Like do we have data to know if this is even a problem?
Good question. We could do a crater run with a rustc that throws an error when it sees a large format string. But I suspect that if it is common, it's more likely in binaries than in libraries, so a crater run will probably not be very representative.
|
The job Click to see the possible cause of the failure (guessed by this bot) |
|
Finished benchmarking commit (155c5d4): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -1.9%, secondary -0.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -1.3%, secondary -4.9%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.8%, secondary -1.3%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 477.625s -> 473.059s (-0.96%) |
|
☔ The latest upstream changes (presumably #148851) made this pull request unmergeable. Please resolve the merge conflicts. |

Part of #99012
This is a new implementation of
fmt::Arguments. In this implementation,fmt::Argumentsis only two pointers in size. (Instead of six, before.) This makes it the same size as a&strand makes it fit in a register pair.This
fmt::Argumentscan store a&'static strwithout any indirection or additional storage. This means that simple cases likeprint_fmt(format_args!("hello"))are now just as efficient for the caller asprint_str("hello"), as shown by this example:(
panic!("Hello, world!");shows a similar change.)This implementation stores all static information as just a single (byte) string, without any indirection:
This saves a ton of pointers and simplifies the expansion significantly, but does mean that individual pieces (e.g.
"Hello, "and"!\n") cannot be reused. (Those pieces are often smaller than a pointer to them, though, in which case reusing them is useless.)The details of the new representation are documented in library/core/src/fmt/mod.rs.