-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
i8 and u8::to_string() specialisation (far less asm). #82576
Conversation
(rust-highfive has picked a reviewer for you, use r? to override) |
This comment has been minimized.
This comment has been minimized.
-1 isn’t a u8 is it? This failing test is a bit of a curveball. |
I guess it's some shenanigans with type inference. Before it was being inferred as |
Type inference breaking changes are in principle allowed by the stability rules, but you could try routing the default implementation ( |
Ok, well if the easiest thing is to make the rest faster I'm gaim with that. Double or quits. That would completely close the associated ticket. |
If you prefer splitting the PR then you could first specialize for |
r? @KodrAus since it sounds like there's inference breakage here so likely needs T-libs decision The new assembly looks pretty unfortunate to me FWIW, even though it does seem better than what was there before. |
I don't want to break anything. Will update this PR with a more comprehensive solution. |
"The new assembly looks pretty unfortunate to me" - really interested in this. I'm just getting into understanding the asm side of things. I've found a construction that allows the String with_capacity to be set to the exact length without the vec growth asm being included but it does add an extra 20+ instructions. I guess memory versus instructions is a tradeoff. At the moment |
Oh, I was mainly thinking that there's several multiplications in the generated code - it feels like a u8 should be convertible to decimal without going through that overhead, but I'm not really familiar with the state of the art here. |
I tried lots of things and couldn't get it below 65ns (The original is around 85ns for me). Then I removed everything and just returned |
This comment has been minimized.
This comment has been minimized.
Sometimes it looks like the compiler can figure out when there's no possibility of growth in a vec and avoid generating the instructions but quite often it can't tell: both these approches trigger it including the grow: https://godbolt.org/z/zxG7hK This is probably the best example of catching it in the act: Digging into this MIR seems the same as well as LLIR so alas this seems to be in llvm that it is deciding to do something very different. (Edit: Am now wondering if the above differences are just godbolt having an off day or if it's really that different.) |
After much tweaking found a way to get similar asm size as the u8 to_string implementation.
Well godbolt ( https://rust.godbolt.org/z/Keb5ao ) at least seems happy with the i8 impl. It's about as good as I can get it and it's pretty fast - the compulsary alloc consumes the vast amount of the time, but with jemalloc it's over twice as fast as the old way and with the system allocator it seems to shave off 25%. If anyone has any other sugestions of things I've not tried yet please let me know, but if not I think its ready for another review. |
Using a vec for at most 4 bytes (since it's all ascii) seems quite complicated. Have you tried operating on a local array instead and only allocating the output string once the length is known? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a bit like we're solving the wrong problem by hacking around compiler deficiencies. I would expect the compiler to generate decent assembly for something like this:
pub fn to_string(elf: &i8) -> String {
let mut s = String::with_capacity(4);
if elf.is_negative() {
s.push('-');
}
let mut n = elf.unsigned_abs();
if n >= 10 {
if n >= 100 {
n -= 100;
s.push('1');
}
s.push((b'0' + n/10) as char);
n %= 10;
}
s.push((b'0' + n) as char);
s.shrink_to_fit();
s
}
However, as you already noticed in #82576 (comment), it can no longer tell that the Vec won't grow once you add the fourth push
. Could you open an issue for that if there isn't one already?
The question is now: Do we want to push a simpler, safe version, in the hopes that the compiler will better optimize it in the future? Or do we want to push the more complex, unsafe version and hopefully remember to replace it once the compiler can deal with the simpler one?
In any case I've left some minor comments on the current code.
Looks like the String-based version optimizes fine when MIR optimizations are disabled, which suggests that it's the same problem as #82801. |
Just to make sure there's no misunderstanding: It's not my approval you need to land this. I'm just a random dude without any power in that regard. That said, it would be cool if we could do this and have the compiler optimize it well: impl ToString for i8 {
#[inline]
fn to_string(&self) -> String {
let mut buf = String::with_capacity(4);
if self.is_negative() {
buf.push('-');
}
buf.push_str(&self.unsigned_abs().to_string());
buf
}
} Alas, it can't do that (yet?). |
@LingMan thank you for all your help on this PR, it's been really appreciated! Frankly I think to_string() is technically a little broken. Ideally something like |
@Mark-Simulacrum this one's ready for review at your leasure. |
r? @Amanieu |
Knowing what I have recently learned I can probably shave a bit off these timings by pulling the adds a little earlier in the function. Give me an evening to revisit this. |
No maybe not. I think that's as good as it gets. |
That's great! Do you think you could implement this for other integer sizes too? |
Idk, the way the larger integer types are currently handled makes sense - the two at a time digits makes sense - it's just at the short end that the general implemention seems overkill. Mostly I'm depressed that anything done here is dwarfed by the alloc that we can't ever avoid due to the signature. Rather I'm focusing on the other end - parsing ints which isn't bounded by allocation times. |
@bors r+ |
📌 Commit 05330aa has been approved by |
☀️ Test successful - checks-actions |
@@ -2224,6 +2224,47 @@ impl ToString for char { | |||
} | |||
} | |||
|
|||
#[stable(feature = "u8_to_string_specialization", since = "1.999.0")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is with the since tag? Why 1.999?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already fixed on master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, wasn't sure what to put for that. Yes 1.54 makes sense, but at the time I didn't want to presume so put something a little further out.
Take 2. Around 1/6th of the assembly to without specialisation.
https://godbolt.org/z/bzz8Mq
(partially fixes #73533 )