Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: ~15% Faster str to int parsing (when base 10) #83088

Closed
wants to merge 3 commits into from

Conversation

gilescope
Copy link
Contributor

I saw a c++ talk on numbers and wondered if the same trick would work here.

Rather than multiplying the same number each time, if we're multiplying different numbers then the ALU can reorder them (as they have fewer data dependencies and add is commutative) and do more of them in parallel.

If radix is called with a constant (which it is 99% of the time) I'm pretty sure that llvm evaluates the if radix == 10 at compile time - I noticed that when radix == 16 llvm can figure out that it can shift rather than multiply! Pretty impressive in my book that it can spot that!

Speedup wise I'm seeing it go from 1.4ns to 1.2ns. Not earth shattering, but not to be sniffed at either.

… here.

Rather than multiplying the same number each time, if we're multiplying different numbers then the ALU can reorder them (as they have fewer data dependencies) and do more of them in parallel.

If radix is called with a constant (which it is 99% of the time) I'm pretty sure that llvm evaluates the if at compile time as to which algorythm to pick. I noticed that when radix == 16 it can figure out that it can shift rather than multiply which is pretty impressive in my book that it can spot that!

Speedup wise I'm seeing it go from 1.4ns to 1.2ns. Not earth shattering, but not to be sniffed at either.
@rust-highfive
Copy link
Collaborator

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 13, 2021
@rust-log-analyzer

This comment has been minimized.

@jyn514 jyn514 added I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Mar 13, 2021
@gilescope gilescope changed the title ~15% Faster str to int parsing (when base 10) WIP: ~15% Faster str to int parsing (when base 10) Mar 15, 2021
@rust-log-analyzer

This comment has been minimized.

@gilescope
Copy link
Contributor Author

If it's too specific being constrained to u32 sized and below for the std lib I understand. I can always do this as a separate crate.

@DutchGhost
Copy link
Contributor

DutchGhost commented Mar 15, 2021

This comes from a talk from Andrei Alexandrescu, fastware (timestamp important)

In the past I played around with it, and implemented this parsing technique for each integer type with macros as seen here.

and even made a const fn version of it: https://gist.github.com/DutchGhost/d8604a3c796479777fe9f5e25d855cfd

I dont remember my bench results exactly, but the speed improvement was not all that much, but maybe that was just my implementation. It's a neat parsing technique :)

@gilescope
Copy link
Contributor Author

gilescope commented Mar 15, 2021 via email

@joshtriplett
Copy link
Member

This seems like a reasonable optimization, if it ends up providing an improvement, and passes all tests.

Could you please address the test failures, rename the multiplier array to something like DECIMAL_MULTIPLIERS, and expand it to work up to u64?

@rust-log-analyzer
Copy link
Collaborator

The job x86_64-gnu-llvm-9 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
Suite("src/test/assembly") not skipped for "bootstrap::test::Assembly" -- not in ["src/tools/tidy"]
Check compiletest suite=assembly mode=assembly (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)

running 31 tests
iiiiiiiiiiiiiiiiiiiiiiiiiiiii..
Suite("src/test/incremental") not skipped for "bootstrap::test::Incremental" -- not in ["src/tools/tidy"]

 finished in 0.103 seconds
Check compiletest suite=incremental mode=incremental (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
---
Suite("src/test/debuginfo") not skipped for "bootstrap::test::Debuginfo" -- not in ["src/tools/tidy"]
Check compiletest suite=debuginfo mode=debuginfo (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)

running 116 tests
iiiiiiiiii.i.i...i.i..ii....i.i....ii..........iiii.........i.....i...i.......ii.i.ii......iiii....i 100/116
test result: ok. 78 passed; 0 failed; 38 ignored; 0 measured; 0 filtered out; finished in 2.22s

 finished in 2.286 seconds
Suite("src/test/ui-fulldeps") not skipped for "bootstrap::test::UiFullDeps" -- not in ["src/tools/tidy"]
---
.................................................................................................... 1200/1248
................................................
failures:

---- num::from_str_issue7588 stdout ----
thread 'num::from_str_issue7588' panicked at 'assertion failed: `(left == right)`
  left: `Some(232)`,
 right: `None`', library/core/tests/num/mod.rs:87:5
---- num::test_invalid stdout ----
---- num::test_invalid stdout ----
thread 'num::test_invalid' panicked at 'assertion failed: `(left == right)`
  left: `Err(PosOverflow)`,
 right: `Err(InvalidDigit)`', library/core/tests/num/mod.rs:81:5

failures:
error: test failed, to rerun pass '-p core --test coretests'
    num::from_str_issue7588
    num::from_str_issue7588
    num::test_invalid

test result: FAILED. 1244 passed; 2 failed; 2 ignored; 0 measured; 0 filtered out; finished in 5.51s



command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "test" "--target" "x86_64-unknown-linux-gnu" "-Zbinary-dep-depinfo" "-j" "16" "--release" "--locked" "--color" "always" "--features" "panic-unwind backtrace compiler-builtins-c" "--manifest-path" "/checkout/library/test/Cargo.toml" "-p" "core" "--" "--quiet"


failed to run: /checkout/obj/build/bootstrap/debug/bootstrap --stage 2 test --exclude src/tools/tidy
Build completed unsuccessfully in 0:16:47

@gilescope
Copy link
Contributor Author

I think the other PR has more effect and is broader in scope than this PR.

Comment on lines +857 to +890
if is_positive {
// The number is positive
for &c in digits {
let x = match (c as char).to_digit(radix) {
Some(x) => x,
None => return Err(PIE { kind: InvalidDigit }),
};
let x = match factors[idx].checked_mul(x) {
Some(result) => result,
None => return Err(PIE { kind: PosOverflow }),
};
result = match result.checked_add(x) {
Some(result) => result,
None => return Err(PIE { kind: PosOverflow }),
};
idx += 1;
}
} else {
// The number is negative
for &c in digits {
let x = match (c as char).to_digit(radix) {
Some(x) => x,
None => return Err(PIE { kind: InvalidDigit }),
};
let x = match factors[idx].checked_mul(x) {
Some(result) => result,
None => return Err(PIE { kind: NegOverflow }),
};
result = match result.checked_sub(x) {
Some(result) => result,
None => return Err(PIE { kind: NegOverflow }),
};
idx += 1;
}
Copy link
Contributor

@pickfire pickfire Mar 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to have so many duplicated code here? Also, why not put the extra condition branch in the error flow rather the default flow?

Maybe we can use iterator for idx here?

Suggested change
if is_positive {
// The number is positive
for &c in digits {
let x = match (c as char).to_digit(radix) {
Some(x) => x,
None => return Err(PIE { kind: InvalidDigit }),
};
let x = match factors[idx].checked_mul(x) {
Some(result) => result,
None => return Err(PIE { kind: PosOverflow }),
};
result = match result.checked_add(x) {
Some(result) => result,
None => return Err(PIE { kind: PosOverflow }),
};
idx += 1;
}
} else {
// The number is negative
for &c in digits {
let x = match (c as char).to_digit(radix) {
Some(x) => x,
None => return Err(PIE { kind: InvalidDigit }),
};
let x = match factors[idx].checked_mul(x) {
Some(result) => result,
None => return Err(PIE { kind: NegOverflow }),
};
result = match result.checked_sub(x) {
Some(result) => result,
None => return Err(PIE { kind: NegOverflow }),
};
idx += 1;
}
for &c in digits {
let x = match (c as char).to_digit(radix) {
Some(x) => x,
None => return Err(PIE { kind: InvalidDigit }),
};
let x = match factors[idx].checked_mul(x) {
Some(result) => result,
None => return Err(PIE { kind: if is_positive { PosOverflow } else { NegOverflow } }),
};
result = match result.checked_add(x) {
Some(result) => result,
None => return Err(PIE { kind: if is_positive { PosOverflow } else { NegOverflow } }),
};
idx += 1;

Copy link
Contributor Author

@gilescope gilescope Mar 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you missed there was a checked_sub in there? Having an if in the loop to choose between checked_add and checked_sub kills performance - I can attest to that! I did wonder if we could do better too. You can't add and then negate as the negative ranges are slightly bigger. So you could always subtract and then !+1 to make positive, which would work great for signed types (with a check to see if val == T::Min) but I couldn't get my head around what would happen if I was doing subtractions to unsigned types - if I have to do unsigned types a different way then I've not made anything any simpler. So I can't think of a way to simplify the code, but am willing to try anything if you have any other ideas.

Copy link
Contributor

@pickfire pickfire Mar 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I did missed that.

But what if we put the if branch only for checked_add and checked_sub but keep the to_digit and checked_mul?

Another idea, can you also try using if it's negative after the for loop to do checked_neg there? Remove the for loop and add a branch to convert the positive number to negative using checked_neg and return NegOverflow if there are issues so we don't need another loop for negative, a single branch.

While trying that, maybe can also try a hand crafted negation instead of checked_add? Like this

    for &c in digits {
        ... // checked_add ...
    }
    if !is_positive {
        const UNSIGNED_BITS: i8 = (i8::MAX as u8 * 2) as i8 + 1; // don't just use i8, can use whatever type it is
        result = (i ^ UNSIGNED_BITS).checked_add(1);
    }

One more thing, I don't think we need the whole bunch of negative calculation in unsigned code do we?

If would be better if you have a repo to try out stuff on the benchmarking, like the one I did https://github.com/pickfire/parseint so that I can also try to help out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For i8 if we're parsing "-128" we run into representation trouble if we start positive and then negate, that's why one would want to do subtractions and then neg. But we'd need a different code path for unsigned...

I don't think we need the whole bunch of negative calculation in unsigned code do we?

Hmm, we could do if is_positive || !is_signed_ty { rather than if is_positive - that way the compiler should remove the else block for unsigned types (I don't think currently the compiler can realise that is_positive is always true for unsigned types, but it's pretty crafty so it might have figured that out anyhows).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fork your parseint and add some benchmarks. (The trick is very cool you've got there btw.)

I need to double check my benchmarks anyhow because I reaslised that my blackbox was not quite in the right place and the compiler was optimising for the data (rookie mistake!).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you link your godbolt experiments?

Copy link
Contributor Author

@gilescope gilescope Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code in the godbolt is an exploration of expanding to larger type sizes than u32 but it's slow due to an if in the loop. Nevertheless it demostrates the is_signed_ty evaluation nicely.

Sure, on the left it's a const that evaluates to false and on the right it's an expression. https://godbolt.org/z/od3ajTh3v
Interestingly on the left side if I replace
const is_signed_ty: bool = false; with
let is_signed_ty: bool = false;
it can't tell that it should drop the else expression which is surprising. I would have thought it should have proporgated the constant into the is_signed_ty given it was assigned by a constant and is not mutable.

Copy link
Contributor

@LingMan LingMan Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That godbolt link shows unoptimized assembly. You forgot to set -Copt-level=3. With that the assembly is identical between the two.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or -O

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pickfire I took a stab at building in all the safeguards to your trick as an alternative route for radix 10. Have submitted a PR for us to talk around: pickfire/parseint#2

Comment on lines +860 to +863
let x = match (c as char).to_digit(radix) {
Some(x) => x,
None => return Err(PIE { kind: InvalidDigit }),
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These matches can all be written a bit shorter like this:

Suggested change
let x = match (c as char).to_digit(radix) {
Some(x) => x,
None => return Err(PIE { kind: InvalidDigit }),
};
let x = (c as char).to_digit(radix).ok_or(PIE { kind: InvalidDigit })?;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall we used to do these but in performance sensitive parts using ok_or is slower than match counterparts, from what I recall.

Copy link
Contributor

@LingMan LingMan Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really? The optimizer should have all information it needs and in my quick tests the generated assembly is identical. Is there an open issue or do you have a test case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's what I recall but not sure if the same goes for this case, if the assembly is the same then I guess it's good to have it.

Some(result) => result,
None => return Err(PIE { kind: PosOverflow }),
};
idx += 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to manage an index manually if you loop over the factors directly:

for (&c, &factor) in digits.zip(factors) { ... }

Assuming zip doesn't inhibit any optimizations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both zip and enumerate inhibited optimisations - I like that euphamism.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen some weird interactions between zip and rev, which currently(?) cause suboptimal assembly, but never with enumerate. Is this another case of missing -Copt-level=3 on godbolt?

PS: No euphemism intended.

Comment on lines +857 to +890
if is_positive {
// The number is positive
for &c in digits {
let x = match (c as char).to_digit(radix) {
Some(x) => x,
None => return Err(PIE { kind: InvalidDigit }),
};
let x = match factors[idx].checked_mul(x) {
Some(result) => result,
None => return Err(PIE { kind: PosOverflow }),
};
result = match result.checked_add(x) {
Some(result) => result,
None => return Err(PIE { kind: PosOverflow }),
};
idx += 1;
}
} else {
// The number is negative
for &c in digits {
let x = match (c as char).to_digit(radix) {
Some(x) => x,
None => return Err(PIE { kind: InvalidDigit }),
};
let x = match factors[idx].checked_mul(x) {
Some(result) => result,
None => return Err(PIE { kind: NegOverflow }),
};
result = match result.checked_sub(x) {
Some(result) => result,
None => return Err(PIE { kind: NegOverflow }),
};
idx += 1;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could move that for loop into a closure or macro or something that gets PosOverflow/NegOverflow and checked_add/checked_sub as parameters.

@Mark-Simulacrum
Copy link
Member

IIRC @dtolnay has done some work around this in the ecosystem, do you think you'd be interested in reviewing this PR?

Otherwise I'll try to take a look soon.

@gilescope
Copy link
Contributor Author

I am thinking of closing down this PR in favour of #83371

I am not sure that the technique in this PR has benefits scailed up to u64. I am looking to see if I can safely weaponise pickire's tricks at radix 10 parsing, but for now I think that should live in a separate atoi_radix10 crate where it can get proved out.

@pickfire
Copy link
Contributor

pickfire commented Apr 6, 2021

I am not sure that the technique in this PR has benefits scailed up to u64. I am looking to see if I can safely weaponise pickire's tricks at radix 10 parsing, but for now I think that should live in a separate atoi_radix10 crate where it can get proved out.

Try not to publish a crate (so we don't occupy a name, like what I did to ve), also can just ping me for review, I would be happy to do that.

@dtolnay
Copy link
Member

dtolnay commented Apr 13, 2021

IIRC @dtolnay has done some work around this in the ecosystem, do you think you'd be interested in reviewing this PR?

I have done float parsing, and I have done int and float printing, but so far not int parsing. Would love if someone else were able to spearhead the review on this one.

@pickfire
Copy link
Contributor

pickfire commented Apr 14, 2021

I have done porting a verify specific version of int parsing from c++. I believe @gilescope is currently experimenting to get a faster parser by having another repository to test his ideas out. @gilescope you may want to change this pull request to draft in the meantime. I will also take a look at it after that since we could have a faster way of testing out the new changes.

@gilescope
Copy link
Contributor Author

gilescope commented Apr 14, 2021 via email

@gilescope gilescope marked this pull request as draft April 15, 2021 08:52
@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 26, 2021
@gilescope
Copy link
Contributor Author

I'm going to close this for now to reduce confusion. I am working on base10, but I don't know when I'm going to be done done (there always seems to be some new idea to try). It's also a way off being generic so not that suitable for inclusion into std lib yet (if ever as there's quite a bit more code) but I do want to answer the question of what's the fastest way we can parse ints.

Meanwhile I think we should move forward with #83371 - there's no downside except a little more code and it speeds up all radix.

@gilescope gilescope closed this May 3, 2021
@gilescope
Copy link
Contributor Author

I did finally get to a generic way of parsing base 10 strings fast and published it as atoi_radix10 crate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants