Fast path for parsing decimal literals #3288

overlookmotel · 2024-05-15T08:11:46Z

#3283 made an optimization to the parser to avoid checking for _s in numeric literals twice, and got a nice 1% speed-up on parser benchmarks.

It occurs to me that we could repeat the trick for the common case of simple decimal literals (e.g. 0, 1, 123). str::parse::<f64>() is quite a complicated function which checks for all kinds of things (exponent, decimal point, valid input etc). All of this is repeated work, since we've already determined in the lexer what the input is.

Perhaps we could have a flag which lexer sets if the input is a simple case, and then use a hand-rolled parse_simple_decimal function to convert to f64, like the ones we have for binary and octal?

oxc/crates/oxc_parser/src/lexer/number.rs

Lines 76 to 88 in f38d138

    
           fn parse_octal(s: &str) -> f64 { 
        
               debug_assert!(!s.is_empty()); 
        
               let mut result = 0_f64; 
        
               for c in s.as_bytes().iter().filter(|s| s != &&b'_') { 
        
                   #[allow(clippy::cast_lossless)] 
        
                   let value = (c - b'0') as f64; 
        
                   result = result.mul_add(8.0, value); 
        
               } 
        
               result 
        
           }

(we should base implementation on std's str::parse::<f64> as it's probably very optimized already, but just cut out all the extraneous checks)

Given that #3283 got a 1% speed-up, maybe this can get the same again, or maybe more?

@DonIsaac Would you be interested in investigating?

The text was updated successfully, but these errors were encountered:

overlookmotel · 2024-05-15T08:30:59Z

As I mentioned in #3289, it's mysterious why #3283 got a 5% speed bump on lexer benchmarks.

Before we test out what I'm suggesting in this issue, we should investigate that further. If it turns out that converting the padding bytes in Token from u8 to bool is a speed-gain in itself, we should do that first, so then we can measure the impact of the change described here in isolation (as it will also involve using one of those padding bytes as a bool).

DonIsaac · 2024-05-15T18:12:28Z

Just did something similar for non-decimals in #3296. Decimals are up next

DonIsaac · 2024-05-16T15:01:49Z

@overlookmotel I discovered that Kind::Decimals are never floating point numbers. However, when I tried parsing them as an i64 then casting the result to an f64, I found no performance improvements. However, my local benchmarking is not always accurate, so I'll make a small PR so we can see bench results from CI

Boshen · 2024-07-14T04:48:48Z

Is this resolved?

overlookmotel · 2024-07-14T10:18:05Z

I don't believe so.

However, when I tried parsing them as an i64 then casting the result to an f64, I found no performance improvements.

@DonIsaac Did you try parsing to a u64 (not i64)? Or maybe i64 was a typo?

DonIsaac · 2024-07-14T14:27:40Z

I don't believe so.

However, when I tried parsing them as an i64 then casting the result to an f64, I found no performance improvements.

@DonIsaac Did you try parsing to a u64 (not i64)? Or maybe i64 was a typo?

I did, but I saw no performance gains.

overlookmotel · 2024-07-14T15:08:12Z

Did you try implementing it manually like parse_binary and parse_octal (rather than using s.parse::<u64>())?

If you're in parse_int_without_underscores and kind is Kind::Decimal, then we know all the digits are 0-9. This means parsing is infallible and no need to handle all the other oddities which s.parse::<u64>() does (e, +, -, illegal chars, zero length etc). I'd imagine a hand-written version without all these complications would be faster.

overlookmotel · 2024-07-14T16:38:44Z

I gave it a go (#4257). Yes, hardly any effect. Surprising!

Closes #3288. Speed up parsing decimal literals (e.g. `123`), using same techniques as `parse_octal` etc.

overlookmotel · 2024-07-15T11:18:02Z

@DonIsaac I'm frustrated how little effect this change had, though it in retrospect it makes sense - multiplying by 10 is just inherently a (relatively) slow operation. Why oh why don't humans have 8 fingers and then decimal would never have existed? 😃

Do you think there'd be any gain in adding a token kind for single digit number, with a fast path? 0 is likely the most common number used in JS code e.g. for (let i = 0; i < arr.length; i++) and other single-digit numbers are probably common too. Or do you think that's too much of a micro-opt to be unlikely to produce any real gain?

overlookmotel added the C-performance Category - Solution not expected to change functional behavior, only performance label May 15, 2024

overlookmotel assigned Boshen May 15, 2024

Boshen removed their assignment May 15, 2024

overlookmotel added the A-parser Area - Parser label May 15, 2024

DonIsaac self-assigned this May 15, 2024

Boshen assigned DonIsaac and unassigned DonIsaac Jun 26, 2024

overlookmotel mentioned this issue Jul 14, 2024

perf(parser): faster parsing decimal numbers #4257

Merged

DonIsaac closed this as completed Jul 15, 2024

Boshen pushed a commit that referenced this issue Jul 15, 2024

perf(parser): faster parsing decimal numbers (#4257)

e8d9961

Closes #3288. Speed up parsing decimal literals (e.g. `123`), using same techniques as `parse_octal` etc.

Boshen pushed a commit that referenced this issue Jul 15, 2024

perf(parser): faster parsing decimal numbers (#4257)

50be167

Closes #3288. Speed up parsing decimal literals (e.g. `123`), using same techniques as `parse_octal` etc.

Boshen pushed a commit that referenced this issue Jul 15, 2024

perf(parser): faster parsing decimal numbers (#4257)

f402ef3

Closes #3288. Speed up parsing decimal literals (e.g. `123`), using same techniques as `parse_octal` etc.

Boshen pushed a commit that referenced this issue Jul 15, 2024

perf(parser): faster parsing decimal numbers (#4257)

226ecd7

Closes #3288. Speed up parsing decimal literals (e.g. `123`), using same techniques as `parse_octal` etc.

Boshen pushed a commit that referenced this issue Jul 15, 2024

perf(parser): faster parsing decimal numbers (#4257)

a7b328c

Closes #3288. Speed up parsing decimal literals (e.g. `123`), using same techniques as `parse_octal` etc.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fast path for parsing decimal literals #3288

Fast path for parsing decimal literals #3288

overlookmotel commented May 15, 2024 •

edited

Loading

overlookmotel commented May 15, 2024 •

edited

Loading

DonIsaac commented May 15, 2024

DonIsaac commented May 16, 2024

Boshen commented Jul 14, 2024

overlookmotel commented Jul 14, 2024

DonIsaac commented Jul 14, 2024

overlookmotel commented Jul 14, 2024 •

edited

Loading

overlookmotel commented Jul 14, 2024

overlookmotel commented Jul 15, 2024 •

edited

Loading

Fast path for parsing decimal literals #3288

Fast path for parsing decimal literals #3288

Comments

overlookmotel commented May 15, 2024 • edited Loading

overlookmotel commented May 15, 2024 • edited Loading

DonIsaac commented May 15, 2024

DonIsaac commented May 16, 2024

Boshen commented Jul 14, 2024

overlookmotel commented Jul 14, 2024

DonIsaac commented Jul 14, 2024

overlookmotel commented Jul 14, 2024 • edited Loading

overlookmotel commented Jul 14, 2024

overlookmotel commented Jul 15, 2024 • edited Loading

overlookmotel commented May 15, 2024 •

edited

Loading

overlookmotel commented May 15, 2024 •

edited

Loading

overlookmotel commented Jul 14, 2024 •

edited

Loading

overlookmotel commented Jul 15, 2024 •

edited

Loading