Techniques for writing interpreter loops #3

Open
@yorickpeterse

For interpreters, one of the core components is the interpreter/dispatcher loop. In its most basic form this is a loop combined with a (giant) match, but depending on the language other techniques are available. I think it's worth collecting the various approaches available to Rust, along with their benefits and drawbacks. Over time we should probably extend this into a document that outlines them more clearly, with examples.

@pliniker wrote a bit about this in https://pliniker.github.io/post/dispatchers/, and I discussed things a bit in https://www.reddit.com/r/rust/comments/66h3t2/alternatives_to_dispatching_using_loop_match/.

Available Techniques

  • loop + match
    • 👍 Easy to implement
    • 👍 Portable (since it's just regular Rust code)
    • 👍 Fairly easy to understand, assuming the match isn't 5 000 lines long
    • 👎 Performance may vary depending on the CPU's branch predictor, how Rust/LLVM compiles the code, etc.
    • 👎 In my personal experience the performance can also be influenced by the number of arms, though this has never been consistent. Sometimes adding an arm would slow things down, then it would speed up again once I added one or two more.
  • loop plus some form of inline assembly
    • 👍 Probably as fast as you can get things
    • 👎 Not portable as you need to write different ASM snippets for different platforms
    • 👎 Very fragile because of how LLVM handles inline ASM
    • 👎 Harder to maintain as you now need to know both Rust and ASM
  • computed goto
    • 👍 About as fast as the ASM approach
    • 👍 Assuming the language supports it you can write the code once for all platforms
    • 👎 Rust doesn't support this since it doesn't play well with the borrow checker, and probably won't for a very long time (if ever)
    • 👎 Can be a bit hard to wrap your head around
  • Recursive function calls using tail call optimisation (TCO)
    • 👍 Good performance, though I don't remember how well it performs compared to the other techniques
    • 👍 Doesn't require messing with ASM or goto
    • 👎 The optimisation is only applied when building in release mode, and Rust doesn't guarantee it even then
    • 👎 Only available on x86_64 if I remember correctly
    • 👎 It requires that every instruction is a separate function, which can make it very hard to control the outer loop (e.g. a break in a function won't work). This means that to control the loop you'd still need some kind of match
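
To make the first technique concrete, here is a minimal sketch of a loop + match dispatcher. The `Op` enum, the stack machine, and the `run` function are all hypothetical and only exist for illustration; a real VM would have far more instructions and proper error handling.

```rust
/// Hypothetical instruction set for a tiny stack machine.
#[derive(Debug, Clone, Copy)]
enum Op {
    Push(i64),
    Add,
    Mul,
    Halt,
}

/// Runs `code` and returns the value on top of the stack at `Halt`.
/// Returns `None` on stack underflow or when execution runs past the
/// end of the program without reaching `Halt`.
fn run(code: &[Op]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0;

    loop {
        // Fetch the next instruction and advance the program counter.
        let op = *code.get(pc)?;
        pc += 1;

        // Dispatch: one match arm per instruction.
        match op {
            Op::Push(value) => stack.push(value),
            Op::Add => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a + b);
            }
            Op::Mul => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a * b);
            }
            // Control over the loop stays in one place: the match itself.
            Op::Halt => return stack.pop(),
        }
    }
}
```

For example, `run(&[Op::Push(2), Op::Push(3), Op::Add, Op::Push(4), Op::Mul, Op::Halt])` evaluates (2 + 3) * 4 and returns `Some(20)`. How this compiles (jump table, branch chain, etc.) is up to Rust/LLVM, which is where the performance variance mentioned above comes from.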

There are probably more, but these are the ones I can think of at the moment.
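
The function-call based approach could look roughly like the sketch below, where every instruction is its own function and ends by calling the next handler in tail position. The names (`Vm`, `Inst`, `Handler`, `next`, `op_push`, and so on) are made up for this example. Note that Rust does not guarantee that these calls are turned into jumps: without the optimisation every executed instruction uses a stack frame, so a long-running program could overflow the stack.

```rust
/// Hypothetical VM state shared by all instruction handlers.
struct Vm {
    stack: Vec<i64>,
}

/// Every instruction handler has the same signature.
type Handler = fn(&mut Vm, &[Inst], usize) -> Option<i64>;

/// An instruction is a handler plus an operand (ignored by most handlers).
struct Inst {
    handler: Handler,
    operand: i64,
}

/// Looks up the handler at `pc` and calls it. Handlers call this as their
/// last expression, so the call is in tail position.
fn next(vm: &mut Vm, code: &[Inst], pc: usize) -> Option<i64> {
    let inst = code.get(pc)?;
    (inst.handler)(vm, code, pc)
}

fn op_push(vm: &mut Vm, code: &[Inst], pc: usize) -> Option<i64> {
    vm.stack.push(code[pc].operand);
    next(vm, code, pc + 1)
}

fn op_add(vm: &mut Vm, code: &[Inst], pc: usize) -> Option<i64> {
    let b = vm.stack.pop()?;
    let a = vm.stack.pop()?;
    vm.stack.push(a + b);
    next(vm, code, pc + 1)
}

/// Stops execution by simply not calling `next`.
fn op_halt(vm: &mut Vm, _code: &[Inst], _pc: usize) -> Option<i64> {
    vm.stack.pop()
}

/// Entry point: there is no loop here, the handlers chain into each other.
fn run(code: &[Inst]) -> Option<i64> {
    let mut vm = Vm { stack: Vec::new() };
    next(&mut vm, code, 0)
}
```

A program that pushes 2 and 3, adds them, and halts (four `Inst` values using `op_push`, `op_push`, `op_add`, `op_halt`) makes `run` return `Some(5)`. Since there is no outer loop, the only way a handler can stop or redirect execution is through what it returns or which handler it calls next, which is the control problem described in the list above.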
