
Reduce branch misprediction in parser/lexer #192

@overlookmotel

Description


Boshen said in oxc-project/oxc#15513 (review):

Claude studied assembly code and discovered the jump table has a huge branch mis-prediction problem, but I couldn't figure out where the branch is.

My assumption is that the mispredicted branch is the jump table itself. The CPU can't predict which byte handler will get called.

This isn't so surprising, as the pattern of tokens is completely unpredictable.

I can see 3 possible ways to try to improve this situation:

1. Spread out jump table calls

The whole process of getting a token is quite lengthy, starting with ParserImpl::bump_any (or one of the similar methods) and ending up in Lexer::next_token, Lexer::read_next_token, and Lexer::finish_next. All of this is too large to inline into all the call sites, so there's probably only one instance of Lexer::handle_byte (which calls into the jump table) in the binary. So the branch predictor has no context at all - every token in every position goes through the same jump instruction.

If we could slim down all the functions on this path enough that they can be inlined into all the call sites, then there'd be many jump table instructions in the binary, and the branch predictor would treat each one separately.

It may then be able to make much better predictions. For instance, when parsing an object literal { foo: 123 }, and you've just received the token for the identifier foo, it's very predictable indeed that the next token is going to be :.

However, slimming down the "get next token" process is going to be pretty tricky - a lot of logic has built up in there.
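To illustrate the mechanism (this is a toy sketch, not oxc's actual code - the handler table, Kind stand-ins, and function names here are all invented), the point is that if the dispatch function is small enough to inline, each call site gets its own copy of the indirect jump, and modern branch predictors can then learn a separate token pattern per site:

```rust
// Hypothetical sketch: a tiny 256-entry "jump table" of byte handlers.
// `u8` return values stand in for a real `Kind` enum.
type ByteHandler = fn(&mut usize, &[u8]) -> u8;

static BYTE_HANDLERS: [ByteHandler; 256] = {
    fn default_handler(pos: &mut usize, _src: &[u8]) -> u8 {
        *pos += 1;
        0 // stand-in for `Kind::Unknown`
    }
    fn colon_handler(pos: &mut usize, _src: &[u8]) -> u8 {
        *pos += 1;
        1 // stand-in for `Kind::Colon`
    }
    let mut table: [ByteHandler; 256] = [default_handler; 256];
    table[b':' as usize] = colon_handler;
    table
};

// If this is inlined into every caller, the indirect call below is
// duplicated once per call site rather than shared by the whole parser,
// so the predictor tracks each site separately. (`#[inline(always)]` is
// a strong hint, not a guarantee.)
#[inline(always)]
fn handle_byte(pos: &mut usize, src: &[u8]) -> u8 {
    BYTE_HANDLERS[src[*pos] as usize](pos, src)
}

fn main() {
    let src: &[u8] = b"a:b";
    let mut pos = 0;
    let k1 = handle_byte(&mut pos, src); // call site 1: identifier byte
    let k2 = handle_byte(&mut pos, src); // call site 2: predictably `:` here
    assert_eq!((k1, k2), (0, 1));
}
```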

2. State machine

Here's an article I came across a while ago about writing a lexer as a state machine: https://nothings.org/computer/lexing.html

This is a completely different approach from the jump table we have now.

The basic idea is that you have a set of states and a series of tables which translate current_state + next_byte -> next_state.

e.g. When lexing !x ..., it goes:

  • Start: State::None
  • Byte ! -> State::ExclamationPartial (! but may not be complete)
  • Byte x -> State::Exclamation (! complete token)

Whereas when lexing !== ...:

  • Start: State::None
  • Byte ! -> State::ExclamationPartial (! but may not be complete)
  • Byte = -> State::ExclamationEqualPartial (!= but may not be complete)
  • Byte = -> State::ExclamationEqualEqual (!== complete token)

Lexing most tokens becomes a simple loop:

pub fn next_token(iter: &mut Iter<u8>) -> Kind {
    let mut state = State::None;
    loop {
        // A `Category` is a group of bytes e.g. `Category::Ident` is A-Z, a-z, _, $.
        // Convert byte to `Category` to reduce the size of the `NEXT_STATE_MAP` tables.
        let category: Category = if let Some(&byte) = iter.next() {
            BYTE_TO_CATEGORY[byte as usize]
        } else {
            Category::Eof
        };

        state = NEXT_STATE_MAP[state as usize][category as usize];

        if state.is_final() { // `state as u8 <= FINAL_STATE_MAX`
            break;
        }
    }

    if state.is_special() { // `state as u8 >= SPECIAL_STATE_MIN`
        // TODO: Handle special `State`s which need further processing e.g. `Ident`
        todo!()
    } else {
        // If we align discriminants of `Kind` and `State`, then we can skip this mapping
        STATE_TO_KIND[state as usize]
    }
}
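A minimal runnable version of that loop, with hand-built tables covering only the ! family from the example above (all names, discriminants, and table layouts here are illustrative, and it returns the final State rather than mapping to a Kind):

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Category { Exclamation = 0, Equal = 1, Other = 2, Eof = 3 }

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum State {
    // Final states: discriminants <= FINAL_STATE_MAX
    Exclamation = 0,           // `!` complete
    ExclamationEqual = 1,      // `!=` complete
    ExclamationEqualEqual = 2, // `!==` complete
    Error = 3,
    // Non-final states
    None = 4,
    ExclamationPartial = 5,      // `!` seen, may not be complete
    ExclamationEqualPartial = 6, // `!=` seen, may not be complete
}

const FINAL_STATE_MAX: u8 = 3;

impl State {
    fn is_final(self) -> bool {
        (self as u8) <= FINAL_STATE_MAX
    }
}

// `NEXT_STATE_MAP[state][category]` -> next state.
// Rows 0-3 belong to final states, which the loop never looks up.
const NEXT_STATE_MAP: [[State; 4]; 7] = [
    [State::Error; 4],
    [State::Error; 4],
    [State::Error; 4],
    [State::Error; 4],
    // State::None: only `!` starts a token in this toy grammar
    [State::ExclamationPartial, State::Error, State::Error, State::Error],
    // State::ExclamationPartial: `=` extends to `!=`, anything else completes `!`
    [State::Exclamation, State::ExclamationEqualPartial, State::Exclamation, State::Exclamation],
    // State::ExclamationEqualPartial: `=` extends to `!==`, anything else completes `!=`
    [State::ExclamationEqual, State::ExclamationEqualEqual, State::ExclamationEqual, State::ExclamationEqual],
];

fn next_token(iter: &mut std::slice::Iter<u8>) -> State {
    let mut state = State::None;
    loop {
        // Stand-in for the `BYTE_TO_CATEGORY` table
        let category = match iter.next() {
            Some(b'!') => Category::Exclamation,
            Some(b'=') => Category::Equal,
            Some(_) => Category::Other,
            None => Category::Eof,
        };
        state = NEXT_STATE_MAP[state as usize][category as usize];
        if state.is_final() {
            break;
        }
    }
    // A real lexer would map `State` to `Kind` here, and would also need
    // to put back the byte that terminated the token.
    state
}

fn main() {
    assert_eq!(next_token(&mut b"!x".iter()), State::Exclamation);
    assert_eq!(next_token(&mut b"!==".iter()), State::ExclamationEqualEqual);
}
```

Note the loop body contains no token-specific branches at all - just two table loads and the is_final() check.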

This does still incur branch mispredictions - state.is_final() is unpredictable, and state.is_special() is somewhat unpredictable too. But quite possibly that's fewer mispredictions than we have at present, once mispredictions in byte handlers are taken into account.

I tried implementing this earlier this year, but stalled as creating all the state tables was very laborious. As far as I got:

https://github.com/oxc-project/oxc/blob/b8b61fa65272960b44a256295dc34379cdd6c819/crates/oxc_parser/src/lexer/state_machine.rs

It'd probably be easier to codegen it.

I'm not actually sure this would be an improvement, but it's probably worth a go.

3. Hard-code branches at call sites

In places where the next token is very predictable (e.g. : after an object key), hard code "if next byte is : then...", and skip calling into lexer at all.

In the case of : in { foo: 123 }, most of the work the lexer does is wasted. We never use the span of that token, for example.

This would likely be effective, but could be very laborious and error-prone.
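As a sketch of what such a hard-coded fast path might look like (hypothetical names; remaining stands in for the lexer's unconsumed source, and a real version would also have to deal with trivia like whitespace and comments before taking the shortcut):

```rust
// Hypothetical fast path at a call site where `:` is highly likely.
// Returns true if the colon was consumed without touching the lexer.
fn expect_colon_fast(remaining: &mut &[u8]) -> bool {
    if let [b':', rest @ ..] = *remaining {
        // Fast path: no token kind, no span, no jump table dispatch at all.
        *remaining = rest;
        true
    } else {
        // Slow path: caller falls back to the full lexer,
        // e.g. `lexer.next_token()`.
        false
    }
}

fn main() {
    let mut src: &[u8] = b": 123 }";
    assert!(expect_colon_fast(&mut src));
    assert_eq!(src, b" 123 }");
}
```

The win is that on the fast path nothing is computed that the parser doesn't need - in the { foo: 123 } case, not even the token's span.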

Other ideas

Combination of 1 + 2

If the state machine was inlined into call sites, then its branches would become much more predictable.

Remove bounds checks

#20

This would help with either of the above (in the state machine, it'd remove a branch from the loop, though that branch is predictable).
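For the table lookups specifically, one safe-Rust way to get rid of the check (a sketch under the assumption that the tables are indexed by a byte; the table contents here are invented): make the table exactly 256 entries and index it with a u8 widened to usize, which the compiler can prove is always in range.

```rust
// Sketch: a 256-entry category table indexed by `byte as usize`.
// Since a `u8` can never exceed 255, the compiler can elide the
// bounds check on this access without any `unsafe`.
static BYTE_TO_CATEGORY: [u8; 256] = {
    let mut table = [0u8; 256]; // 0 = stand-in for `Category::Other`
    let mut b = b'a';
    while b <= b'z' {
        table[b as usize] = 1; // 1 = stand-in for `Category::Ident`
        b += 1;
    }
    table
};

fn category_of(byte: u8) -> u8 {
    // No bounds check emitted: `byte as usize` is provably < 256.
    BYTE_TO_CATEGORY[byte as usize]
}

fn main() {
    assert_eq!(category_of(b'q'), 1);
    assert_eq!(category_of(b'+'), 0);
}
```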
