Int parsing optimisations (part 2) #96071

gilescope · 2022-04-15T09:35:11Z

Extension to #95399

We can combine the `src.is_empty()` check with the `is_positive` check so that the first element is fetched only once.

Previously we started with `let mut result = T::from_u32(0);` and then ran `result = result * T::from_u32(radix);` on it, which we already know yields 0 on the first iteration. If we instead parse the first digit and store it in `result`, the multiplication is productive the first time around the loop; we just need to drop the first element from the digits slice that we hand to the loop.
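The idea in the two paragraphs above can be illustrated with a minimal sketch. It is not the libcore implementation: `ParseError` and `parse_u32` are hypothetical names, the parser is fixed to `u32` rather than generic over `T`, and it uses checked arithmetic throughout for simplicity.

```rust
/// Hypothetical error type standing in for libcore's `ParseIntError` kinds.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    Empty,
    InvalidDigit,
    Overflow,
}

/// Illustrative `u32`-only parser; the real code is generic over `T`.
pub fn parse_u32(src: &[u8], radix: u32) -> Result<u32, ParseError> {
    // A single `split_first` both rejects empty input and yields the first byte.
    let (&first, mut digits) = src.split_first().ok_or(ParseError::Empty)?;

    // Skip an optional `+`; a lone sign is reported as an invalid digit.
    let first = if first == b'+' {
        let (&next, rest) = digits.split_first().ok_or(ParseError::InvalidDigit)?;
        digits = rest;
        next
    } else {
        first
    };

    // Seed the accumulator with the first digit instead of 0, so the first
    // multiplication in the loop is never `0 * radix`.
    let mut result = (first as char).to_digit(radix).ok_or(ParseError::InvalidDigit)?;

    for &c in digits {
        let digit = (c as char).to_digit(radix).ok_or(ParseError::InvalidDigit)?;
        result = result.checked_mul(radix).ok_or(ParseError::Overflow)?;
        result = result.checked_add(digit).ok_or(ParseError::Overflow)?;
    }
    Ok(result)
}
```

For a single-digit input the loop body never runs, which is where the saved multiplication matters most.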

Given that the loop now only goes round twice for u8, it's not worth attempting any further optimisations there. Let's only do that for types of u32 size and above, where we could be iterating several times (`if mem::size_of::<T>() > 2 {`).

The final observation is that we can use the unchecked path even for strings that are long enough to overflow: we only need the checked path for the trailing digits that could overflow the type.
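A rough sketch of that two-phase approach, under stated assumptions: `parse_u32_two_phase` is a hypothetical name, the type is fixed to `u32`, sign handling is omitted, and the safe prefix length of eight digits relies only on the fact that `ffffffff` equals `u32::MAX`.

```rust
/// Illustrative `u32`-only digit parser (no sign handling); returns `None`
/// on empty input, an invalid digit, or overflow.
pub fn parse_u32_two_phase(digits: &[u8], radix: u32) -> Option<u32> {
    if digits.is_empty() {
        return None;
    }
    // For radix <= 16, eight digits can never exceed `u32::MAX`
    // (`ffffffff` is exactly `u32::MAX`). Larger radices get no safe prefix.
    let safe_len = if radix <= 16 { digits.len().min(8) } else { 0 };

    let mut result = 0u32;
    // Phase 1: plain arithmetic on the prefix that provably cannot overflow.
    for &c in &digits[..safe_len] {
        result = result * radix + (c as char).to_digit(radix)?;
    }
    // Phase 2: checked arithmetic only for the digits that could overflow.
    for &c in &digits[safe_len..] {
        let digit = (c as char).to_digit(radix)?;
        result = result.checked_mul(radix)?.checked_add(digit)?;
    }
    Some(result)
}
```

With this shape a long input such as `0000000001` pays for checked arithmetic on only its last two digits, rather than falling back to the checked path for the whole string.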

I've included u128/i128 in the benchmarks. Checked arithmetic on i128 is particularly slow and really gains from using unchecked arithmetic where possible (not that the current benchmarks show this, as the numbers they parse are too small).

rust-highfive · 2022-04-15T09:35:13Z

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with r? rust-lang/libs-api @rustbot label +T-libs-api to request review from a libs-api team reviewer. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

a stabilization of a library feature
introducing new or changing existing unstable library APIs
changes to public documentation in ways that create new stability guarantees

rust-highfive · 2022-04-15T09:35:15Z

r? @kennytm

(rust-highfive has picked a reviewer for you, use r? to override)

gilescope · 2022-04-15T11:52:32Z

r? @scottmcm

rust-log-analyzer · 2022-04-15T13:06:42Z

The job mingw-check failed! Check out the build log: (web) (plain)

Possible cause of the failure (guessed by this bot):

configure: rust.debug-assertions := True
configure: rust.overflow-checks := True
configure: llvm.assertions      := True
configure: dist.missing-tools   := True
configure: build.configure-args := ['--enable-sccache', '--disable-manage-submodu ...
configure: writing `config.toml` in current directory
configure: 
configure: run `python /checkout/x.py --help`
configure:

library/core/src/num/mod.rs

scottmcm · 2022-04-25T01:41:44Z

library/core/src/num/mod.rs

-            return Err(PIE { kind: InvalidDigit });
+    let (first, mut digits) = (*src.get(0).ok_or_else(|| PIE { kind: Empty })?, &src[1..]);
+
+    let (is_positive, mut result) = match first {


It looks like there's a bit more repetition going on here than there needs to be.

Maybe try this with slice patterns, or something? I'm imagining something like

    let (is_positive, digits) = match src {
        [b'-', d @ ..] => (false, d),
        [b'+', d @ ..] => (true, d),
        d => (true, d),
    };

To hopefully simplify a bit of the first/digits/result dance that's currently happening.
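A slightly fuller, runnable sketch of the slice-pattern idea, which also covers the empty-input and lone-sign cases. `split_sign` and `SignError` are hypothetical names used only for illustration; the rest-binding syntax `digits @ ..` is what lets a pattern capture the remainder of the slice.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum SignError {
    Empty,
    InvalidDigit,
}

/// Splits an optional leading sign off `src`, returning `(is_positive, digits)`.
pub fn split_sign(src: &[u8]) -> Result<(bool, &[u8]), SignError> {
    match src {
        // No bytes at all.
        [] => Err(SignError::Empty),
        // A sign with nothing after it is not a number.
        [b'+'] | [b'-'] => Err(SignError::InvalidDigit),
        // `digits @ ..` binds the rest of the slice after the sign byte.
        [b'-', digits @ ..] => Ok((false, digits)),
        [b'+', digits @ ..] => Ok((true, digits)),
        // No sign: the whole input is digits.
        digits => Ok((true, digits)),
    }
}
```

Because every arm either fails or yields a non-empty digit slice, the caller can take the first digit without a further emptiness check.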

scottmcm · 2022-05-06T01:28:58Z

library/core/src/num/mod.rs

        // If the len of the str is short compared to the range of the type
        // we are parsing into, then we can be certain that an overflow will not occur.
        // This bound is when `radix.pow(digits.len()) - 1 <= T::MAX` but the condition
-        // above is a faster (conservative) approximation of this.
+        // in `safe_width` is a faster (conservative) approximation of this.
        //
        // Consider radix 16 as it has the highest information density per digit and will thus overflow the earliest:


suggestion: this comment is useful information, but it doesn't seem to belong here, since the computation it describes isn't here. Maybe put it on (or in) safe_width instead?

Put another way, this code will be correct as long as safe_width is correct, so the choice of approach -- faster or tighter -- doesn't really matter here.
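For illustration, here is what a `safe_width` helper carrying that comment might look like. The body is an assumption, not the code from this PR: it uses the conservative bound of two hex digits per byte, minus one digit for signed types, which agrees with the `u8` and `i8` examples in the quoted comment, and it treats radices above 16 as having no safe prefix.

```rust
use std::mem::size_of;

/// Number of leading digits that can be parsed without any overflow check.
///
/// The exact bound is the largest `n` with `radix.pow(n) - 1 <= T::MAX`; this
/// is a faster, conservative approximation of it. Radix 16 has the highest
/// information density per digit among the radices handled here, so it
/// overflows the earliest:
/// `u8::MAX` is `ff` - any str of len 2 is guaranteed to not overflow.
/// `i8::MAX` is `7f` - only a str of len 1 is guaranteed to not overflow.
pub fn safe_width<T>(radix: u32, is_signed_ty: bool) -> usize {
    if radix <= 16 {
        // Two hex digits per byte; the sign bit costs a signed type one digit.
        size_of::<T>() * 2 - is_signed_ty as usize
    } else {
        // Assumption of this sketch: no digits are treated as safe.
        0
    }
}
```

Keeping the derivation next to the formula means the call site only has to trust the function's contract.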

scottmcm · 2022-05-06T01:35:20Z

library/core/src/num/mod.rs

        //
        // Consider radix 16 as it has the highest information density per digit and will thus overflow the earliest:
        // `u8::MAX` is `ff` - any str of len 2 is guaranteed to not overflow.
        // `i8::MAX` is `7f` - only a str of len 1 is guaranteed to not overflow.
+        let safe_width = safe_width::<T>(radix, is_signed_ty);
+


suggestion: the .take in one spot not coupled with a corresponding .skip in the other makes this read a bit strangely to me. Perhaps the splitting could just be put here, with no need to ever look at the length again later? As a first thought, something like this, with appropriate updates to the for loops?

Suggested change

    let (safe_digits, risky_digits) = if safe_width > digits.len() {
        (digits, &[])
    } else {
        digits.split_at(safe_width)
    };
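A small standalone sketch of the same split, written as a hypothetical helper named `split_safe`. It clamps the split point with `min` rather than branching, which should behave the same as the suggested `if`/`else` and keeps both halves typed as `&[u8]`.

```rust
/// Splits `digits` into a prefix that needs no overflow checks and the rest.
/// `safe_width` is assumed to come from a helper like the one discussed above.
pub fn split_safe(digits: &[u8], safe_width: usize) -> (&[u8], &[u8]) {
    // `split_at` panics if the index exceeds the length, so clamp it first.
    // When `safe_width >= digits.len()` the second half is simply empty.
    digits.split_at(safe_width.min(digits.len()))
}
```

The two loops can then iterate over the two halves directly, with no later look at the length.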

scottmcm · 2022-05-06T01:36:47Z

library/core/tests/num/mod.rs

@@ -126,15 +126,15 @@ fn test_can_not_overflow() {
    where
        T: std::convert::TryFrom<i8>,
    {
-        !can_not_overflow::<T>(radix, T::try_from(-1_i8).is_ok(), input.as_bytes())
+        safe_width::<T>(radix, T::try_from(-1_i8).is_ok()) < input.len()
    }

    // Positive tests:


suggestion: how about testing the output of safe_width directly? Just seeing can_overflow returning true doesn't mean that it's correct -- it could be returning usize::MAX.
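One way to test the returned width directly is to compare it with the exact bound computed by brute force. The sketch below is hypothetical: `exact_safe_width` and `assert_conservative` are illustrative names, and `candidate` stands in for `safe_width::<T>` with the type and signedness already fixed.

```rust
/// Largest digit count `n` such that every `n`-digit string in `radix` is
/// at most `max`, i.e. `radix.pow(n) - 1 <= max`, found by brute force.
pub fn exact_safe_width(radix: u32, max: u128) -> usize {
    let r = radix as u128;
    let mut n = 0;
    // Largest value expressible with `n` digits, i.e. `radix.pow(n) - 1`.
    let mut largest = 0u128;
    loop {
        // Appending the top digit gives the largest (n + 1)-digit value.
        match largest.checked_mul(r).and_then(|v| v.checked_add(r - 1)) {
            Some(next) if next <= max => {
                largest = next;
                n += 1;
            }
            // Either `max` is exceeded or `u128` itself would overflow.
            _ => return n,
        }
    }
}

/// Asserts that `candidate(radix)` never claims more safe digits than exist.
pub fn assert_conservative(candidate: impl Fn(u32) -> usize, max: u128) {
    for radix in 2..=36 {
        let claimed = candidate(radix);
        let exact = exact_safe_width(radix, max);
        assert!(claimed <= exact, "radix {}: claimed {} > exact {}", radix, claimed, exact);
    }
}
```

A check like this fails if the function under test returns something implausibly large, which a boolean wrapper around it might not reveal.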

scottmcm

I like the idea here -- avoiding the cliff once the string gets longer than the threshold -- but I have a bunch of implementation thoughts.

Feel free to push back if some of them turn out to be bad ideas.

JohnCSimon · 2022-11-27T04:18:18Z

Ping from triage:
@gilescope what is the status of this PR? Looks like it hasn't been touched in a while.

gilescope · 2022-11-27T05:46:31Z

Still on the todo list. Will ship it xmas.

Dylan-DPC · 2023-05-13T06:36:07Z

Closing this as inactive. Feel free to reöpen this PR or create a new PR if you get the time to work on this. Thanks

Always use unchecked path (for larger types)

cb85940

rust-highfive assigned kennytm Apr 15, 2022

rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Apr 15, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 15, 2022

rust-highfive assigned scottmcm and unassigned kennytm Apr 15, 2022

Merge branch 'master' into plan_c

01fd512

pickfire reviewed Apr 16, 2022

library/core/src/num/mod.rs (resolved)

scottmcm reviewed Apr 25, 2022

scottmcm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 6, 2022

scottmcm reviewed May 6, 2022

scottmcm requested changes May 6, 2022

Dylan-DPC closed this May 13, 2023

Dylan-DPC added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 13, 2023
