
Reduce branch misprediction in parser/lexer #192

@overlookmotel

Description


Boshen said in oxc-project/oxc#15513 (review):

Claude studied assembly code and discovered the jump table has a huge branch mis-prediction problem, but I couldn't figure out where the branch is.

My assumption is that the mispredicted branch is the jump table itself. The CPU can't predict which byte handler will get called.

This isn't so surprising, as the pattern of tokens is completely unpredictable.

I can see 3 possible ways to try to improve this situation:

1. Spread out jump table calls

The whole process of getting a token is quite lengthy, starting with ParserImpl::bump_any (or one of the similar methods) and ending up in Lexer::next_token, Lexer::read_next_token, and Lexer::finish_next. All of this is too large to inline into all the call sites, so there's probably only one instance of Lexer::handle_byte (which calls into the jump table) in the binary. So the branch predictor has no context at all - every token in every position goes through the same jump instruction.

If we could slim down all the functions on this path enough that they can be inlined into all the call sites, then there'd be many jump table instructions in the binary, and the branch predictor would treat each one separately.

It may then be able to make much better predictions. For instance, when parsing an object literal { foo: 123 }, and you've just received the token for the identifier foo, it's very predictable indeed that the next token is going to be :.

However, slimming down the "get next token" process is going to be pretty tricky - a lot of logic has built up in there.
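To illustrate the mechanism (this is a toy sketch, not oxc's actual code - the handler table, Kind stand-ins, and function names here are all invented), the point is that if the dispatch function is small enough to inline, each call site gets its own copy of the indirect jump, and modern branch predictors can then learn a separate token pattern per site:

```rust
// Hypothetical sketch: a tiny 256-entry "jump table" of byte handlers.
// `u8` return values stand in for a real `Kind` enum.
type ByteHandler = fn(&mut usize, &[u8]) -> u8;

static BYTE_HANDLERS: [ByteHandler; 256] = {
    fn default_handler(pos: &mut usize, _src: &[u8]) -> u8 {
        *pos += 1;
        0 // stand-in for `Kind::Unknown`
    }
    fn colon_handler(pos: &mut usize, _src: &[u8]) -> u8 {
        *pos += 1;
        1 // stand-in for `Kind::Colon`
    }
    let mut table: [ByteHandler; 256] = [default_handler; 256];
    table[b':' as usize] = colon_handler;
    table
};

// If this is inlined into every caller, the indirect call below is
// duplicated once per call site rather than shared by the whole parser,
// so the predictor tracks each site separately. (`#[inline(always)]` is
// a strong hint, not a guarantee.)
#[inline(always)]
fn handle_byte(pos: &mut usize, src: &[u8]) -> u8 {
    BYTE_HANDLERS[src[*pos] as usize](pos, src)
}

fn main() {
    let src: &[u8] = b"a:b";
    let mut pos = 0;
    let k1 = handle_byte(&mut pos, src); // call site 1: identifier byte
    let k2 = handle_byte(&mut pos, src); // call site 2: predictably `:` here
    assert_eq!((k1, k2), (0, 1));
}
```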

2. State machine

Here's an article I came across a while ago about writing a lexer as a state machine: https://nothings.org/computer/lexing.html

This is a completely different approach from the jump table we have now.

The basic idea is that you have a set of states and a series of tables which translate current_state + next_byte -> next_state.

e.g. When lexing !x ..., it goes:

  • Start: State::None
  • Byte ! -> State::ExclamationPartial (! but may not be complete)
  • Byte x -> State::Exclamation (! complete token)

Whereas when lexing !== ...:

  • Start: State::None
  • Byte ! -> State::ExclamationPartial (! but may not be complete)
  • Byte = -> State::ExclamationEqualPartial (!= but may not be complete)
  • Byte = -> State::ExclamationEqualEqual (!== complete token)

Lexing most tokens becomes a simple loop:

pub fn next_token(iter: &mut Iter<u8>) -> Kind {
    let mut state = State::None;
    loop {
        // A `Category` is a group of bytes e.g. `Category::Ident` is A-Z, a-z, _, $.
        // Convert byte to `Category` to reduce the size of the `NEXT_STATE_MAP` tables.
        let category: Category = if let Some(&byte) = iter.next() {
            BYTE_TO_CATEGORY[byte as usize]
        } else {
            Category::Eof
        };

        state = NEXT_STATE_MAP[state as usize][category as usize];

        if state.is_final() { // `state as u8 <= FINAL_STATE_MAX`
            break;
        }
    }

    if state.is_special() { // `state as u8 >= SPECIAL_STATE_MIN`
        // TODO: Handle special `State`s which need further processing e.g. `Ident`
        todo!()
    } else {
        // If we align discriminants of `Kind` and `State`, then we can skip this mapping
        STATE_TO_KIND[state as usize]
    }
}
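A minimal runnable version of that loop, with hand-built tables covering only the ! family from the example above (all names, discriminants, and table layouts here are illustrative, and it returns the final State rather than mapping to a Kind):

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Category { Exclamation = 0, Equal = 1, Other = 2, Eof = 3 }

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum State {
    // Final states: discriminants <= FINAL_STATE_MAX
    Exclamation = 0,           // `!` complete
    ExclamationEqual = 1,      // `!=` complete
    ExclamationEqualEqual = 2, // `!==` complete
    Error = 3,
    // Non-final states
    None = 4,
    ExclamationPartial = 5,      // `!` seen, may not be complete
    ExclamationEqualPartial = 6, // `!=` seen, may not be complete
}

const FINAL_STATE_MAX: u8 = 3;

impl State {
    fn is_final(self) -> bool {
        (self as u8) <= FINAL_STATE_MAX
    }
}

// `NEXT_STATE_MAP[state][category]` -> next state.
// Rows 0-3 belong to final states, which the loop never looks up.
const NEXT_STATE_MAP: [[State; 4]; 7] = [
    [State::Error; 4],
    [State::Error; 4],
    [State::Error; 4],
    [State::Error; 4],
    // State::None: only `!` starts a token in this toy grammar
    [State::ExclamationPartial, State::Error, State::Error, State::Error],
    // State::ExclamationPartial: `=` extends to `!=`, anything else completes `!`
    [State::Exclamation, State::ExclamationEqualPartial, State::Exclamation, State::Exclamation],
    // State::ExclamationEqualPartial: `=` extends to `!==`, anything else completes `!=`
    [State::ExclamationEqual, State::ExclamationEqualEqual, State::ExclamationEqual, State::ExclamationEqual],
];

fn next_token(iter: &mut std::slice::Iter<u8>) -> State {
    let mut state = State::None;
    loop {
        // Stand-in for the `BYTE_TO_CATEGORY` table
        let category = match iter.next() {
            Some(b'!') => Category::Exclamation,
            Some(b'=') => Category::Equal,
            Some(_) => Category::Other,
            None => Category::Eof,
        };
        state = NEXT_STATE_MAP[state as usize][category as usize];
        if state.is_final() {
            break;
        }
    }
    // A real lexer would map `State` to `Kind` here, and would also need
    // to put back the byte that terminated the token.
    state
}

fn main() {
    assert_eq!(next_token(&mut b"!x".iter()), State::Exclamation);
    assert_eq!(next_token(&mut b"!==".iter()), State::ExclamationEqualEqual);
}
```

Note the loop body contains no token-specific branches at all - just two table loads and the is_final() check.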

This does still incur branch mispredictions - state.is_final() is unpredictable, and state.is_special() is somewhat unpredictable too. But quite possibly that's fewer mispredictions than we have at present, once mispredictions in byte handlers are taken into account.

I tried implementing this earlier this year, but stalled as creating all the state tables was very laborious. As far as I got:

https://github.com/oxc-project/oxc/blob/b8b61fa65272960b44a256295dc34379cdd6c819/crates/oxc_parser/src/lexer/state_machine.rs

It'd probably be easier to codegen it.

I'm not actually sure this would be an improvement, but it's probably worth a go.

3. Hard-code branches at call sites

In places where the next token is very predictable (e.g. : after an object key), hard code "if next byte is : then...", and skip calling into lexer at all.

In the case of : in { foo: 123 }, most of the work the lexer does is wasted. We never use the span of that token, for example.

This would likely be effective, but could be very laborious and error-prone.
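As a sketch of what such a hard-coded fast path might look like (hypothetical names; remaining stands in for the lexer's unconsumed source, and a real version would also have to deal with trivia like whitespace and comments before taking the shortcut):

```rust
// Hypothetical fast path at a call site where `:` is highly likely.
// Returns true if the colon was consumed without touching the lexer.
fn expect_colon_fast(remaining: &mut &[u8]) -> bool {
    if let [b':', rest @ ..] = *remaining {
        // Fast path: no token kind, no span, no jump table dispatch at all.
        *remaining = rest;
        true
    } else {
        // Slow path: caller falls back to the full lexer,
        // e.g. `lexer.next_token()`.
        false
    }
}

fn main() {
    let mut src: &[u8] = b": 123 }";
    assert!(expect_colon_fast(&mut src));
    assert_eq!(src, b" 123 }");
}
```

The win is that on the fast path nothing is computed that the parser doesn't need - in the { foo: 123 } case, not even the token's span.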

Other ideas

Combination of 1 + 2

If the state machine was inlined into call sites, then its branches would become much more predictable.

Remove bounds checks

#20

This would help with either of the above (in the state machine, it'd remove a branch from the loop, though that branch is predictable).
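For the table lookups specifically, one safe-Rust way to get rid of the check (a sketch under the assumption that the tables are indexed by a byte; the table contents here are invented): make the table exactly 256 entries and index it with a u8 widened to usize, which the compiler can prove is always in range.

```rust
// Sketch: a 256-entry category table indexed by `byte as usize`.
// Since a `u8` can never exceed 255, the compiler can elide the
// bounds check on this access without any `unsafe`.
static BYTE_TO_CATEGORY: [u8; 256] = {
    let mut table = [0u8; 256]; // 0 = stand-in for `Category::Other`
    let mut b = b'a';
    while b <= b'z' {
        table[b as usize] = 1; // 1 = stand-in for `Category::Ident`
        b += 1;
    }
    table
};

fn category_of(byte: u8) -> u8 {
    // No bounds check emitted: `byte as usize` is provably < 256.
    BYTE_TO_CATEGORY[byte as usize]
}

fn main() {
    assert_eq!(category_of(b'q'), 1);
    assert_eq!(category_of(b'+'), 0);
}
```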
