Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Int parsing optimisations (part 2) #96071

Closed
wants to merge 2 commits into from
Closed

Conversation

gilescope
Copy link
Contributor

Extension to #95399

We can combine the src.is_empty() check with the is_positive check so that we get the first element once.

Previously we started with let mut result = T::from_u32(0); which we would then call result = result * T::from_u32(radix); on which we already know will be 0. Instead if we parse the first digit and put that in the result then the first time around the loop the mul will be productive - we just need to shave the first element from the digits slice that we hand to the loop.

Give that the loop is now only going round twice for u8 it's not worth trying to do any further optimisations - let's only do that for u32 size and above where we could be iterating a few times (if mem::size_of::<T>() > 2 {).

The final observation is that we can use the unchecked path even for strings that are large enough to overflow - we just use the checked path for parsing the digits that could breach the type.

I've included u128/i128 in the benchmarks. The checked arithmetic of i128 is particularly slow and really gains from using the unchecked arithmetic where possible (not that the current benchmarks show this as they are parsing too small numbers).

@rust-highfive
Copy link
Collaborator

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with r? rust-lang/libs-api @rustbot label +T-libs-api to request review from a libs-api team reviewer. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • a stabilization of a library feature
  • introducing new or changes existing unstable library APIs
  • changes to public documentation in ways that create new stability guarantees

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Apr 15, 2022
@rust-highfive
Copy link
Collaborator

r? @kennytm

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 15, 2022
@gilescope
Copy link
Contributor Author

r? @scottmcm

@rust-highfive rust-highfive assigned scottmcm and unassigned kennytm Apr 15, 2022
@rust-log-analyzer
Copy link
Collaborator

The job mingw-check failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
configure: rust.debug-assertions := True
configure: rust.overflow-checks := True
configure: llvm.assertions      := True
configure: dist.missing-tools   := True
configure: build.configure-args := ['--enable-sccache', '--disable-manage-submodu ...
configure: writing `config.toml` in current directory
configure: 
configure: run `python /checkout/x.py --help`
configure: 

return Err(PIE { kind: InvalidDigit });
let (first, mut digits) = (*src.get(0).ok_or_else(|| PIE { kind: Empty })?, &src[1..]);

let (is_positive, mut result) = match first {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there's a bit more repetition going on here than there needs to be.

Maybe try this with slice patterns, or something? I'm imagining something like

let (is_positive, digits) = match src {
    [b'-', d] => (false, d),
    [b'+', d] => (true, d),
    d => (true, d),
};

To hopefully simplify a bit of the first/digits/result dance that's currently happening.

@scottmcm scottmcm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 6, 2022
// If the len of the str is short compared to the range of the type
// we are parsing into, then we can be certain that an overflow will not occur.
// This bound is when `radix.pow(digits.len()) - 1 <= T::MAX` but the condition
// above is a faster (conservative) approximation of this.
// in `safe_width` is a faster (conservative) approximation of this.
//
// Consider radix 16 as it has the highest information density per digit and will thus overflow the earliest:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: this comment is useful information, but doesn't seem like it belongs here, since the computation it's talking about here isn't here. Maybe put it in on/in safe_width instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Said otherwise, this code will be correct as long as safe_width is correct, so the details of which approach -- faster or tighter -- doesn't really matter here.

//
// Consider radix 16 as it has the highest information density per digit and will thus overflow the earliest:
// `u8::MAX` is `ff` - any str of len 2 is guaranteed to not overflow.
// `i8::MAX` is `7f` - only a str of len 1 is guaranteed to not overflow.
let safe_width = safe_width::<T>(radix, is_signed_ty);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: the .take in one spot not coupled with a corresponding .skip in the other makes this read a bit strangely to me. Perhaps the splitting could just be put here, with no need to ever look at the length again later? As a first thought, something like this, with appropriate updates to the for loops?

Suggested change
let (safe_digits, risky_digits) = if safe_width > digits.len() { (digits, &[]) } else { digits.split_at(safe_width) };

@@ -126,15 +126,15 @@ fn test_can_not_overflow() {
where
T: std::convert::TryFrom<i8>,
{
!can_not_overflow::<T>(radix, T::try_from(-1_i8).is_ok(), input.as_bytes())
safe_width::<T>(radix, T::try_from(-1_i8).is_ok()) < input.len()
}

// Positive tests:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: how about testing the output of safe_width directly? Just seeing can_overflow returning true doesn't mean that it's correct -- it could be returning usize::MAX.

Copy link
Member

@scottmcm scottmcm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea here -- avoiding the cliff once the string gets longer than the threshold -- but I have a bunch of implementation thoughts.

Feel free to push back if some of them turn out to be bad ideas.

@JohnCSimon
Copy link
Member

Ping from triage:
@gilescope what is the status of this PR? Looks like it hasn't been touched in a while.

@gilescope
Copy link
Contributor Author

gilescope commented Nov 27, 2022 via email

@Dylan-DPC
Copy link
Member

Closing this as inactive. Feel free to reöpen this pr or create a new pr if you get the time to work on this. Thanks

@Dylan-DPC Dylan-DPC closed this May 13, 2023
@Dylan-DPC Dylan-DPC added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants