i8 and u8::to_string() specialisation (far less asm). #82576

gilescope · 2021-02-27T01:22:42Z

Take 2. Around 1/6th of the assembly compared to the version without specialisation.

https://godbolt.org/z/bzz8Mq

(partially fixes #73533 )

rust-highfive · 2021-02-27T01:22:44Z

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

gilescope · 2021-02-27T01:34:43Z

-1 isn't a u8, is it? This failing test is a bit of a curveball.

SkiFire13 · 2021-02-27T10:41:15Z

I guess it's some shenanigans with type inference. Before, it was being inferred as i32, but now it's being inferred as u8. Consider this example (playground): without the specialized impl of Foo for u8 it compiles fine; with it, it throws an error.
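The inference effect described here can be approximated on stable Rust without specialization. The following is only a sketch of the analogous behaviour, not the original playground example: the traits `OnlyU8` and `ManyInts` and both helper functions are hypothetical names introduced for illustration. When a trait has exactly one integer impl, an unsuffixed literal is inferred from that impl; when several integer impls exist, the literal falls back to i32.

```rust
// Hypothetical trait with a single integer impl (not from the PR).
trait OnlyU8 {
    fn only_name(&self) -> &'static str;
}

impl OnlyU8 for u8 {
    fn only_name(&self) -> &'static str {
        "u8"
    }
}

// Hypothetical trait with more than one integer impl (not from the PR).
trait ManyInts {
    fn many_name(&self) -> &'static str;
}

impl ManyInts for u8 {
    fn many_name(&self) -> &'static str {
        "u8"
    }
}

impl ManyInts for i32 {
    fn many_name(&self) -> &'static str {
        "i32"
    }
}

fn literal_with_single_impl() -> &'static str {
    // The only candidate impl is for u8, so the literal is inferred as u8.
    // Writing `(-1).only_name()` here would fail to compile for that reason.
    1.only_name()
}

fn literal_with_many_impls() -> &'static str {
    // Several impls match, so inference stays ambiguous and the literal
    // falls back to i32, which also makes a negative literal acceptable.
    (-1).many_name()
}
```

Under these assumptions, `literal_with_single_impl()` selects the u8 impl and `literal_with_many_impls()` selects the i32 impl, which is why covering all integer types (or at least i32) would avoid the breakage.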

the8472 · 2021-02-27T14:20:17Z

Type inference breaking changes are in principle allowed by the stability rules, but you could try routing the default implementation (impl<T: fmt::Display + ?Sized> ToString for T) through a private specialization trait instead. Or try implementing it for all integer types.

gilescope · 2021-02-27T16:57:23Z

Ok, well if the easiest thing is to make the rest faster, I'm game with that. Double or quits. That would completely close the associated ticket.

SkiFire13 · 2021-02-27T17:09:38Z

If you prefer splitting the PR then you could first specialize for i32 since that's picked over the other integer types anyway.

Mark-Simulacrum · 2021-03-01T02:52:08Z

r? @KodrAus since it sounds like there's inference breakage here so likely needs T-libs decision

The new assembly looks pretty unfortunate to me FWIW, even though it does seem better than what was there before.

gilescope · 2021-03-02T03:16:45Z

I don't want to break anything. Will update this PR with a more comprehensive solution.

gilescope · 2021-03-02T03:28:16Z

"The new assembly looks pretty unfortunate to me" - really interested in this. I'm just getting into understanding the asm side of things. I've found a construction that allows the String with_capacity to be set to the exact length without the vec growth asm being included but it does add an extra 20+ instructions. I guess memory versus instructions is a tradeoff. At the moment b'a'.to_string().capacity() returns 8 so whether it's 3 or the exact size it's a lot smaller in memory?

Mark-Simulacrum · 2021-03-02T13:42:56Z

Oh, I was mainly thinking that there's several multiplications in the generated code - it feels like a u8 should be convertible to decimal without going through that overhead, but I'm not really familiar with the state of the art here.

gilescope · 2021-03-03T08:14:41Z

I tried lots of things and couldn't get it below 65ns (the original is around 85ns for me). Then I removed everything and just returned String::with_capacity(3), which also took 65ns. Alas, there's no way that I can avoid that allocation, so I guess we should optimise for minimum code size and try to match the capacity closely.
(edit) I switched to jemalloc and it's now about 22ns. That makes it a bit easier to tell if changes are making a perf difference.

library/alloc/src/string.rs

gilescope · 2021-03-04T09:01:10Z

Sometimes it looks like the compiler can figure out when there's no possibility of growth in a vec and avoid generating the instructions, but quite often it can't tell: both these approaches trigger it including the grow: https://godbolt.org/z/zxG7hK
Is that just something that's fine because it gets amortised over a big program, or will it dump out those instructions every time it's inlined?

This is probably the best example of catching it in the act:

https://godbolt.org/z/ePvYrs

Digging into this, the MIR seems the same, as does the LLVM IR, so alas it seems to be LLVM that is deciding to do something very different. (Edit: Am now wondering if the above differences are just godbolt having an off day or if it's really that different.)

library/alloc/src/string.rs

After much tweaking found a way to get similar asm size as the u8 to_string implementation.

gilescope · 2021-03-07T22:26:01Z

Well, godbolt ( https://rust.godbolt.org/z/Keb5ao ) at least seems happy with the i8 impl. It's about as good as I can get it and it's pretty fast: the compulsory alloc consumes the vast majority of the time, but with jemalloc it's over twice as fast as the old way, and with the system allocator it seems to shave off 25%.

If anyone has any other suggestions of things I've not tried yet please let me know, but if not I think it's ready for another review.

the8472 · 2021-03-07T22:33:11Z

Using a vec for at most 4 bytes (since it's all ascii) seems quite complicated. Have you tried operating on a local array instead and only allocating the output string once the length is known?
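The local-array approach suggested here could look roughly like the following. This is a minimal sketch, assuming a free function named `i8_to_string_via_array` (a hypothetical name that does not appear in the PR); it is not the code that was merged, and its generated assembly has not been compared against the PR's version.

```rust
/// Hypothetical helper illustrating the suggestion; not the merged code.
fn i8_to_string_via_array(value: i8) -> String {
    // "-128" is the longest possible output, so four ASCII bytes suffice.
    let mut buf = [0u8; 4];
    let mut len = 0;

    if value.is_negative() {
        buf[len] = b'-';
        len += 1;
    }

    // unsigned_abs() avoids overflow for i8::MIN, whose magnitude is 128.
    let mut n = value.unsigned_abs();
    if n >= 10 {
        if n >= 100 {
            // An i8 magnitude is at most 128, so the hundreds digit is 1.
            buf[len] = b'1';
            len += 1;
            n -= 100;
        }
        buf[len] = b'0' + n / 10;
        len += 1;
        n %= 10;
    }
    buf[len] = b'0' + n;
    len += 1;

    // Allocate only once, now that the final length is known.
    std::str::from_utf8(&buf[..len])
        .expect("buffer holds only ASCII bytes")
        .to_owned()
}
```

The stack buffer never grows, so no Vec growth path is needed while the digits are produced; the single allocation happens in the final `to_owned()` call.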

LingMan

It feels a bit like we're solving the wrong problem by hacking around compiler deficiencies. I would expect the compiler to generate decent assembly for something like this:

pub fn to_string(elf: &i8) -> String {
    let mut s = String::with_capacity(4);
    if elf.is_negative() {
        s.push('-');
    }
    let mut n = elf.unsigned_abs();
    if n >= 10 {
        if n >= 100 {
            n -= 100;
            s.push('1');
        }
        s.push((b'0' + n/10) as char);
        n %= 10;
    }
    s.push((b'0' + n) as char);
    s.shrink_to_fit();
    s
}

However, as you already noticed in #82576 (comment), it can no longer tell that the Vec won't grow once you add the fourth push. Could you open an issue for that if there isn't one already?

The question is now: Do we want to push a simpler, safe version, in the hopes that the compiler will better optimize it in the future? Or do we want to push the more complex, unsafe version and hopefully remember to replace it once the compiler can deal with the simpler one?

In any case I've left some minor comments on the current code.

library/alloc/src/string.rs

LingMan · 2021-03-08T12:56:45Z

Looks like the String-based version optimizes fine when MIR optimizations are disabled, which suggests that it's the same problem as #82801.
(Dropped the shrink_to_fit, since it generates quite a bit of assembly and wasn't a fair comparison.)

LingMan · 2021-03-11T18:52:47Z

Just to make sure there's no misunderstanding: It's not my approval you need to land this. I'm just a random dude without any power in that regard.
Maybe update the PR's title now that it covers i8 in addition to u8. Can't comment on the stability attributes. The code looks fine to me.

That said, it would be cool if we could do this and have the compiler optimize it well:

impl ToString for i8 {
    #[inline]
    fn to_string(&self) -> String {
        let mut buf = String::with_capacity(4);
        if self.is_negative() {
            buf.push('-');
        }
        buf.push_str(&self.unsigned_abs().to_string());
        buf
    }
}

Alas, it can't do that (yet?).

gilescope · 2021-03-12T06:58:11Z

@LingMan thank you for all your help on this PR, it's been really appreciated! Frankly I think to_string() is technically a little broken. Ideally something like write_to_string(&mut String) would be better. That way you could do something like you suggest above nicely but as it is that would be two allocations.
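The write_to_string(&mut String) idea can be sketched as below. The trait `WriteToString` and the helper `i8_to_string_single_alloc` are hypothetical names used only for illustration; nothing like this exists in the standard library, and this is a sketch under that assumption rather than a proposed API.

```rust
/// Hypothetical trait sketching the idea; not part of the standard library.
trait WriteToString {
    /// Appends the decimal representation of `self` to `out`.
    fn write_to_string(&self, out: &mut String);
}

impl WriteToString for u8 {
    fn write_to_string(&self, out: &mut String) {
        let mut n = *self;
        if n >= 10 {
            if n >= 100 {
                out.push((b'0' + n / 100) as char);
                n %= 100;
            }
            out.push((b'0' + n / 10) as char);
            n %= 10;
        }
        out.push((b'0' + n) as char);
    }
}

impl WriteToString for i8 {
    fn write_to_string(&self, out: &mut String) {
        if self.is_negative() {
            out.push('-');
        }
        // Reuses the u8 impl and writes into the same buffer,
        // so no second String is allocated for the magnitude.
        self.unsigned_abs().write_to_string(out);
    }
}

/// Hypothetical wrapper showing how a to_string-style function could
/// be built on top with a single allocation.
fn i8_to_string_single_alloc(value: i8) -> String {
    let mut out = String::with_capacity(4);
    value.write_to_string(&mut out);
    out
}
```

With this shape the i8 implementation delegates to the u8 one, much like the impl LingMan wished for, while the caller decides where the buffer comes from.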

gilescope · 2021-03-20T18:24:24Z

@Mark-Simulacrum this one's ready for review at your leisure.

JohnTitor · 2021-04-26T10:29:00Z

r? @Amanieu

gilescope · 2021-04-26T13:51:46Z

Knowing what I have recently learned I can probably shave a bit off these timings by pulling the adds a little earlier in the function. Give me an evening to revisit this.

gilescope · 2021-04-26T22:40:31Z

No, maybe not. I think that's as good as it gets.

Amanieu · 2021-04-27T13:39:10Z

That's great! Do you think you could implement this for other integer sizes too?

gilescope · 2021-04-30T06:21:05Z

Idk, the way the larger integer types are currently handled makes sense (processing two digits at a time is reasonable); it's just at the short end that the general implementation seems overkill. Mostly I'm depressed that anything done here is dwarfed by the alloc that we can't ever avoid due to the signature. Instead I'm focusing on the other end: parsing ints, which isn't bounded by allocation times.
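For context, the "two digits at a time" technique mentioned here can be sketched as follows. This is an illustration of the general idea only, not the standard library's code: the names `PAIR_LUT`, `build_pair_lut` and `u32_to_string_pairs` are hypothetical, and the table is built in a const fn purely to avoid typing out 200 characters by hand.

```rust
/// Builds a 200-byte table holding "00", "01", ..., "99" back to back.
const fn build_pair_lut() -> [u8; 200] {
    let mut lut = [0u8; 200];
    let mut i = 0;
    while i < 100 {
        lut[2 * i] = b'0' + (i / 10) as u8;
        lut[2 * i + 1] = b'0' + (i % 10) as u8;
        i += 1;
    }
    lut
}

/// Hypothetical lookup table of two-digit pairs.
const PAIR_LUT: [u8; 200] = build_pair_lut();

/// Hypothetical helper: formats a u32 by emitting two digits per step.
fn u32_to_string_pairs(mut n: u32) -> String {
    // u32::MAX (4294967295) has 10 decimal digits.
    let mut buf = [0u8; 10];
    let mut pos = buf.len();

    // Peel off two digits per iteration, filling the buffer from the end.
    while n >= 100 {
        let pair = (n % 100) as usize;
        n /= 100;
        pos -= 2;
        buf[pos] = PAIR_LUT[2 * pair];
        buf[pos + 1] = PAIR_LUT[2 * pair + 1];
    }

    // One or two digits remain.
    if n >= 10 {
        let pair = n as usize;
        pos -= 2;
        buf[pos] = PAIR_LUT[2 * pair];
        buf[pos + 1] = PAIR_LUT[2 * pair + 1];
    } else {
        pos -= 1;
        buf[pos] = b'0' + n as u8;
    }

    std::str::from_utf8(&buf[pos..])
        .expect("buffer holds only ASCII digits")
        .to_owned()
}
```

Each loop iteration performs one division and one remainder for two output digits, which pays off for wide integers but is more machinery than a value with at most three digits needs.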

Amanieu · 2021-05-02T20:11:23Z

@bors r+

bors · 2021-05-02T20:11:24Z

📌 Commit 05330aa has been approved by Amanieu

bors · 2021-05-02T22:02:01Z

⌛ Testing commit 05330aa with merge 8a8ed07...

bors · 2021-05-03T00:17:10Z

☀️ Test successful - checks-actions
Approved by: Amanieu
Pushing 8a8ed07 to master...

pickfire · 2021-05-12T10:38:54Z

library/alloc/src/string.rs

@@ -2224,6 +2224,47 @@ impl ToString for char {
    }
 }

+#[stable(feature = "u8_to_string_specialization", since = "1.999.0")]


What is with the since tag? Why 1.999?

Already fixed on master.

Ah sorry, wasn't sure what to put for that. Yes 1.54 makes sense, but at the time I didn't want to presume so put something a little further out.

u8::to_string() specialisation (far less asm).

a69960a

rust-highfive assigned Mark-Simulacrum Feb 27, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 27, 2021

rust-highfive assigned KodrAus and unassigned Mark-Simulacrum Mar 1, 2021

gilescope force-pushed the to_string branch from 91c614a to 92de76f Compare March 4, 2021 08:16

Alternative LUT rather than dividing.

d07c43a

gilescope force-pushed the to_string branch from 92de76f to d07c43a Compare March 4, 2021 08:36

LingMan reviewed Mar 4, 2021

less uB in i8

a678b9a

LingMan suggested changes Mar 5, 2021

vec![0;4] is a fast path.

e83378b

After much tweaking found a way to get similar asm size as the u8 to_string implementation.

LingMan suggested changes Mar 8, 2021

gilescope changed the title ~~u8::to_string() specialisation (far less asm).~~ i8 and u8::to_string() specialisation (far less asm). Mar 12, 2021

JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 26, 2021

rust-highfive assigned Amanieu and unassigned KodrAus Apr 26, 2021

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 2, 2021

bors added the merged-by-bors This PR was explicitly merged by bors. label May 3, 2021

bors merged commit 8a8ed07 into rust-lang:master May 3, 2021

rustbot added this to the 1.54.0 milestone May 3, 2021

JohnTitor mentioned this pull request May 3, 2021

Fix stability attributes of byte-to-string specialization #84858

Merged

pickfire reviewed May 12, 2021

DaniPopes mentioned this pull request May 6, 2023

Specialize ToString implementation for fmt::Arguments #111168

Merged
