
Conversation

@overlookmotel
Member

@overlookmotel overlookmotel commented Nov 9, 2025

It's very common for tokens to be separated by a single space. e.g. `const x = 1`, `x === y`.

Previously a single space resulted in calling the `SPS` byte handler, which consumes the space, and then going round the loop again in `Lexer::read_next_token`.

Instead, branchlessly consume a single space (if there is one) before calling the byte handler.

Gives between 2% and 7% perf improvement on parser benchmarks.
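
To illustrate the idea, here's a minimal standalone sketch (not the actual oxc lexer code; the function name and byte-slice interface are made up for illustration):

```rust
/// Minimal sketch, not the actual oxc lexer code.
/// Returns how many bytes to skip before dispatching to the byte handler:
/// 1 if the next byte is a space, otherwise 0. `(cond) as usize` compiles to
/// a compare + zero-extend, so there is no conditional branch to mispredict.
fn single_space_skip(rest: &[u8]) -> usize {
    (rest.first() == Some(&b' ')) as usize
}

fn main() {
    // The byte handler would then be invoked at `pos + skip`.
    assert_eq!(single_space_skip(b" === y"), 1);
    assert_eq!(single_space_skip(b"=== y"), 0);
    assert_eq!(single_space_skip(b""), 0);
}
```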


This also enables a further optimization (not yet implemented).

Now the handler for whitespace (`SPS`) no longer has a hot path for single spaces - it's now only called for a tab, or a 2nd space in a row. In both those cases, it's quite likely there'll be more whitespace following it, so it can now be optimized for that case, and continue consuming bytes until it finds one that *isn't* whitespace.

If handlers for whitespace, line breaks, and comments all continue consuming bytes until they find a "real" token, then we can get rid of `Kind::Skip`, and remove the loop from `read_next_token`. This would remove another unpredictable branch.
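
As a rough sketch of what such a handler could do (hypothetical, ASCII-only; real JS whitespace includes more characters, which the actual handler would need to cover):

```rust
/// Hypothetical sketch: consume a whole run of spaces/tabs in one go, so the
/// caller never sees a `Kind::Skip`-style result and never has to loop.
/// Only handles ASCII space and tab; Unicode whitespace is ignored here.
fn whitespace_run_len(rest: &[u8]) -> usize {
    rest.iter().take_while(|&&b| b == b' ' || b == b'\t').count()
}

fn main() {
    assert_eq!(whitespace_run_len(b"\t  x = 1"), 3);
    assert_eq!(whitespace_run_len(b"x"), 0);
}
```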

@github-actions github-actions bot added the A-parser Area - Parser label Nov 9, 2025
Member Author

overlookmotel commented Nov 9, 2025


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@github-actions github-actions bot added the C-performance Category - Solution not expected to change functional behavior, only performance label Nov 9, 2025
@overlookmotel overlookmotel marked this pull request as ready for review November 9, 2025 11:29
@overlookmotel overlookmotel marked this pull request as draft November 9, 2025 11:29
@codspeed-hq

codspeed-hq bot commented Nov 9, 2025

CodSpeed Performance Report

Merging #15513 will improve performance by 6.57%

Comparing 11-09-perf_lexer_skip_single_space_in_read_next_token_ (6dba827) with 11-09-perf_lexer_inline_handle_byte_into_read_next_token_ (ff4461f)

Summary

⚡ 2 improvements
✅ 35 untouched

Benchmarks breakdown

| Mode | Benchmark | BASE | HEAD | Change |
| --- | --- | --- | --- | --- |
| Simulation | parser[RadixUIAdoptionSection.jsx] | 87 µs | 81.6 µs | +6.57% |
| Simulation | parser[react.development.js] | 1.3 ms | 1.3 ms | +3.83% |

@overlookmotel overlookmotel self-assigned this Nov 9, 2025
@overlookmotel overlookmotel force-pushed the 11-09-perf_lexer_skip_single_space_in_read_next_token_ branch from 020e6c0 to 1bb8b9f November 9, 2025 12:04
@graphite-app graphite-app bot changed the base branch from 11-09-perf_lexer_hint_to_compiler_that_eof_only_happens_once to graphite-base/15513 November 9, 2025 12:40
@graphite-app graphite-app bot force-pushed the graphite-base/15513 branch from bef04c1 to 2f0518d November 9, 2025 12:47
@graphite-app graphite-app bot force-pushed the 11-09-perf_lexer_skip_single_space_in_read_next_token_ branch from 1bb8b9f to e54a659 November 9, 2025 12:47
@graphite-app graphite-app bot changed the base branch from graphite-base/15513 to main November 9, 2025 12:47
@graphite-app graphite-app bot force-pushed the 11-09-perf_lexer_skip_single_space_in_read_next_token_ branch from e54a659 to 10fbede November 9, 2025 12:47
@overlookmotel overlookmotel force-pushed the 11-09-perf_lexer_skip_single_space_in_read_next_token_ branch 2 times, most recently from 017c2da to 482bd16 November 9, 2025 13:47
@overlookmotel overlookmotel changed the base branch from main to graphite-base/15513 November 9, 2025 16:12
@overlookmotel overlookmotel force-pushed the 11-09-perf_lexer_skip_single_space_in_read_next_token_ branch from 482bd16 to 6dba827 November 9, 2025 16:12
@overlookmotel overlookmotel changed the base branch from graphite-base/15513 to 11-09-perf_lexer_inline_handle_byte_into_read_next_token_ November 9, 2025 16:12
@overlookmotel
Member Author

The first version of this PR showed very weird benchmark results: a massive regression on the lexer benchmarks, but an improvement on all parser benchmarks (between 2% and 7%).

Turns out that this was an oddity of the benchmark code. The increase in size of `next_token` led to it no longer getting inlined into the lexer benchmark, which was a massive perf hit.

#15519 and #15520 fix that, so now we can see the effects of this change in isolation. And they're quite good!

@overlookmotel overlookmotel requested a review from Boshen November 9, 2025 16:24
@overlookmotel overlookmotel marked this pull request as ready for review November 9, 2025 16:24
Member

@Boshen Boshen left a comment


Claude studied the assembly code and discovered the jump table has a huge branch mis-prediction problem, but I couldn't figure out where the branch is.

Question: can we use SIMD to find the next position that is not whitespace?

Thought: We should study v8 at some point.

@Boshen Boshen added the 0-merge Merge with Graphite Merge Queue label Nov 9, 2025
Member

Boshen commented Nov 9, 2025

Merge activity

graphite-app bot pushed a commit that referenced this pull request Nov 9, 2025
…mark (#15519)

Preparatory step for #15513. That PR was showing a massive slowdown on lexer benchmarks, but it was only due to the change in that PR resulting in `next_token` not being inlined into the lexer benchmark.

Add a separate function `next_token_for_benchmarks` which has identical content to `next_token`, but is marked `#[inline(always)]`, and use it in the lexer benchmark instead. This fixes the problem with the benchmark in #15513.
graphite-app bot pushed a commit that referenced this pull request Nov 9, 2025
Preparatory step for #15513. That PR adds a 2nd callsite for `handle_byte`. Mark it as `#[inline(always)]` to make sure it gets inlined in both places.

This was originally part of #15513, but I've split it out into a separate PR so that CodSpeed's results on #15513 measure the actual substantive change in isolation - to check that the change is having the effect I think it is, and that the gain wasn't actually coming from adding `#[inline(always)]` here. This PR has no effect on performance, so the gain *is* in #15513.
@graphite-app graphite-app bot force-pushed the 11-09-perf_lexer_inline_handle_byte_into_read_next_token_ branch from ff4461f to b310c28 November 9, 2025 16:46
@graphite-app graphite-app bot force-pushed the 11-09-perf_lexer_skip_single_space_in_read_next_token_ branch from 6dba827 to f1efc63 November 9, 2025 16:47
@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Nov 9, 2025
Base automatically changed from 11-09-perf_lexer_inline_handle_byte_into_read_next_token_ to main November 9, 2025 16:53
@graphite-app graphite-app bot merged commit f1efc63 into main Nov 9, 2025
22 checks passed
@graphite-app graphite-app bot deleted the 11-09-perf_lexer_skip_single_space_in_read_next_token_ branch November 9, 2025 16:55
@overlookmotel
Member Author

> Claude studied the assembly code and discovered the jump table has a huge branch mis-prediction problem, but I couldn't figure out where the branch is.

Interesting! Can you post what he said?

I don't think it's surprising though. The branch is the jump table. The CPU cannot predict which way it'll jump.

I've opened an issue in backlog about this: oxc-project/backlog#192

> Question: can we use SIMD to find the next position that is not whitespace?

Yes, and Rust stabilized AVX-512 support recently. But I don't think the problem of poor branch prediction is specific to whitespace. It affects all tokens.
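
For the common ASCII case, a scan along those lines could look something like this (just an illustrative sketch, not oxc code; it only treats space and tab as whitespace, and only covers x86_64):

```rust
// Illustrative sketch only, not oxc code. Scans 16 bytes at a time with SSE2
// (baseline on x86_64) for the first byte that is not a space or tab.
// Real JS whitespace includes more characters, and other targets need a fallback.
#[cfg(target_arch = "x86_64")]
fn first_non_space_or_tab(bytes: &[u8]) -> usize {
    use std::arch::x86_64::*;
    let mut i = 0;
    unsafe {
        let space = _mm_set1_epi8(b' ' as i8);
        let tab = _mm_set1_epi8(b'\t' as i8);
        while i + 16 <= bytes.len() {
            let chunk = _mm_loadu_si128(bytes.as_ptr().add(i).cast());
            let is_ws = _mm_or_si128(_mm_cmpeq_epi8(chunk, space), _mm_cmpeq_epi8(chunk, tab));
            // One mask bit per byte: set if that byte is a space or tab.
            let mask = _mm_movemask_epi8(is_ws) as u32;
            if mask != 0xFFFF {
                // First clear bit = first non-whitespace byte in this chunk.
                return i + (!mask).trailing_zeros() as usize;
            }
            i += 16;
        }
    }
    // Scalar tail for the final < 16 bytes.
    while i < bytes.len() && (bytes[i] == b' ' || bytes[i] == b'\t') {
        i += 1;
    }
    i
}

#[cfg(target_arch = "x86_64")]
fn main() {
    assert_eq!(first_non_space_or_tab(b"    \t  const x = 1;"), 7);
    assert_eq!(first_non_space_or_tab(b"let y = 2;"), 0);
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```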

> Thought: We should study v8 at some point.

Yes probably!

