Skip to content

Compact AST Representation

andychu edited this page Jan 25, 2020 · 54 revisions

Compact AST Representation

  • Oil Blog Post: What is oheap?
    • Oheap V1 was a read-only format for transporting an ASDL tree across the Python/C++ boundary. Though it turned out there wasn't a single clean boundary in the shell's architecture.
  • OHeap2 was a read-write format, for OVM2.
    • At first I used it to represent .pyc files (Python's CodeObject)
    • Meant to replace marshal, pickle, and zipimport.
    • 4-16-N model: 4 byte refs, 16 byte cells, and N byte slabs.
    • See ovm2/oheap2.py
    • OHeap2 might still be a good idea, but I judged OVM2 not to be a good idea
  • Reddit Thread: Representing ASTs as byte strings with with small integers rather than pointers
  • Ken Thompson's postfix encoding for regexes (no pointers) is mentioned here: https://swtch.com/~rsc/regexp/regexp1.html
  • sajson: Last week, I described a JSON parse tree data structure that, worst case, requires N words for N characters of JSON text. Single allocation JSON?
  • paper on on totally recompiling the code to make it a linear pass.

Also:

  • Compiling tree transforms to operate on packed representations at ECOOPLDI 2017
    • Conference Presentation on YouTube
    • This is much more ambitious, but also more limited. Not only are there no pointers, but the tree traversal is equivalent to a linear scan through memory. It compiles a custom programming language and rearranges the operations so this is true.
    • See limitations at the end of the paper.
  • Good references in that papers:
    • The Lattner/Adve paper below
    • Cache-Conscious Data Structure Layout
    • Compact Normal Form in Glasgow Haskell Compiler -- reduces both space and garbage collection pressure.

Compressed Pointers

This is related: more general, but more complex.

Reducing Garbage Collection Pressure

Clone this wiki locally