Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast path for parsing decimal literals #3288

Closed
overlookmotel opened this issue May 15, 2024 · 9 comments · Fixed by #4257
Closed

Fast path for parsing decimal literals #3288

overlookmotel opened this issue May 15, 2024 · 9 comments · Fixed by #4257
Assignees
Labels
A-parser Area - Parser C-performance Category - Solution not expected to change functional behavior, only performance

Comments

@overlookmotel
Copy link
Contributor

overlookmotel commented May 15, 2024

#3283 made an optimization to the parser to avoid checking for _s in numeric literals twice, and got a nice 1% speed-up on parser benchmarks.

It occurs to me that we could repeat the trick for the common case of simple decimal literals (e.g. 0, 1, 123). str::parse::<f64>() is quite a complicated function which checks for all kinds of things (exponent, decimal point, valid input etc). All of this is repeated work, since we've already determined in the lexer what the input is.

Perhaps we could have a flag which lexer sets if the input is a simple case, and then use a hand-rolled parse_simple_decimal function to convert to f64, like the ones we have for binary and octal?

fn parse_octal(s: &str) -> f64 {
debug_assert!(!s.is_empty());
let mut result = 0_f64;
for c in s.as_bytes().iter().filter(|s| s != &&b'_') {
#[allow(clippy::cast_lossless)]
let value = (c - b'0') as f64;
result = result.mul_add(8.0, value);
}
result
}

(we should base implementation on std's str::parse::<f64> as it's probably very optimized already, but just cut out all the extraneous checks)

Given that #3283 got a 1% speed-up, maybe this can get the same again, or maybe more?

@DonIsaac Would you be interested in investigating?

@overlookmotel overlookmotel added the C-performance Category - Solution not expected to change functional behavior, only performance label May 15, 2024
@Boshen Boshen removed their assignment May 15, 2024
@overlookmotel
Copy link
Contributor Author

overlookmotel commented May 15, 2024

As I mentioned in #3289, it's mysterious why #3283 got a 5% speed bump on lexer benchmarks.

Before we test out what I'm suggesting in this issue, we should investigate that further. If it turns out that converting the padding bytes in Token from u8 to bool is a speed-gain in itself, we should do that first, so then we can measure the impact of the change described here in isolation (as it will also involve using one of those padding bytes as a bool).

@overlookmotel overlookmotel added the A-parser Area - Parser label May 15, 2024
@DonIsaac DonIsaac self-assigned this May 15, 2024
@DonIsaac
Copy link
Contributor

Just did something similar for non-decimals in #3296. Decimals are up next

@DonIsaac
Copy link
Contributor

@overlookmotel I discovered that Kind::Decimals are never floating point numbers. However, when I tried parsing them as an i64 then casting the result to an f64, I found no performance improvements. However, my local benchmarking is not always accurate, so I'll make a small PR so we can see bench results from CI

@Boshen Boshen assigned DonIsaac and unassigned DonIsaac Jun 26, 2024
@Boshen
Copy link
Member

Boshen commented Jul 14, 2024

Is this resolved?

@overlookmotel
Copy link
Contributor Author

I don't believe so.

However, when I tried parsing them as an i64 then casting the result to an f64, I found no performance improvements.

@DonIsaac Did you try parsing to a u64 (not i64)? Or maybe i64 was a typo?

@DonIsaac
Copy link
Contributor

I don't believe so.

However, when I tried parsing them as an i64 then casting the result to an f64, I found no performance improvements.

@DonIsaac Did you try parsing to a u64 (not i64)? Or maybe i64 was a typo?

I did, but I saw no performance gains.

@overlookmotel
Copy link
Contributor Author

overlookmotel commented Jul 14, 2024

Did you try implementing it manually like parse_binary and parse_octal (rather than using s.parse::<u64>())?

If you're in parse_int_without_underscores and kind is Kind::Decimal, then we know all the digits are 0-9. This means parsing is infallible and no need to handle all the other oddities which s.parse::<u64>() does (e, +, -, illegal chars, zero length etc). I'd imagine a hand-written version without all these complications would be faster.

@overlookmotel
Copy link
Contributor Author

I gave it a go (#4257). Yes, hardly any effect. Surprising!

Boshen pushed a commit that referenced this issue Jul 15, 2024
Closes #3288.

Speed up parsing decimal literals (e.g. `123`), using same techniques as `parse_octal` etc.
Boshen pushed a commit that referenced this issue Jul 15, 2024
Closes #3288.

Speed up parsing decimal literals (e.g. `123`), using same techniques as `parse_octal` etc.
Boshen pushed a commit that referenced this issue Jul 15, 2024
Closes #3288.

Speed up parsing decimal literals (e.g. `123`), using same techniques as `parse_octal` etc.
Boshen pushed a commit that referenced this issue Jul 15, 2024
Closes #3288.

Speed up parsing decimal literals (e.g. `123`), using same techniques as `parse_octal` etc.
Boshen pushed a commit that referenced this issue Jul 15, 2024
Closes #3288.

Speed up parsing decimal literals (e.g. `123`), using same techniques as `parse_octal` etc.
@overlookmotel
Copy link
Contributor Author

overlookmotel commented Jul 15, 2024

@DonIsaac I'm frustrated how little effect this change had, though it in retrospect it makes sense - multiplying by 10 is just inherently a (relatively) slow operation. Why oh why don't humans have 8 fingers and then decimal would never have existed? 😃

Do you think there'd be any gain in adding a token kind for single digit number, with a fast path? 0 is likely the most common number used in JS code e.g. for (let i = 0; i < arr.length; i++) and other single-digit numbers are probably common too. Or do you think that's too much of a micro-opt to be unlikely to produce any real gain?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-parser Area - Parser C-performance Category - Solution not expected to change functional behavior, only performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants