ParsingCaseStudies
andychu edited this page Apr 23, 2017 · 16 revisions
What algorithms and tools do production-quality languages use for parsing?
- lexing
- parsing, including operator precedence parsing
- AST representation
TODO: Add links
- bash: multiple parsers.
- dash: hand-written lexer, recursive descent parser, generated AST nodes
- mksh: stateful lexer
- zsh:
- PowerShell?
- Ninja: re2c for lexer (for speed, used to be hand-coded), recursive descent for parser.
- Bazel: hand-coded lexer and recursive descent parser
- sqlite: uses the bespoke Lemon parser generator (bottom-up).
- protobuf compiler: hand-written lexer and recursive descent parser. (What about other schema languages?)
- Python
  - hand-coded lexer with indentation stack
  - parser generated with bespoke pgen.c
  - generated AST, with Zephyr ASDL
- Ruby
  - big yacc grammar
  - JRuby: Jay, yacc for Java
- Perl
- PHP
- Lua: hand-coded lexer and recursive descent parser in C. There is no AST because it generates code while parsing!
- Wren: hand-coded lexer, recursive descent with Pratt parsing.
- Dart
- CoffeeScript: hand-coded lexer with regexes, token "fixups", Jison bottom-up parser
- Java
- C#
- Clang -- hand-written parser, enormous hand-written AST with C++ classes
- Go: hand-written C that was automatically converted to Go
- Rust
- Swift
- OCaml
- Haskell
- Awk: sort of a poster child for yacc.
- Scientific languages
  - Julia: lexer and parser are hand-written in femtolisp! Enables Julia macros.
  - R: src/main/gram.y is 3500 lines
  - Mathematica?
- JavaScript
  - v8
  - duktape
  - mujs
  - narcissus (JS)
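The "lexer with indentation stack" noted under Python can be illustrated with a rough sketch. This is not CPython's tokenizer; the function name `indent_tokens` and the token kinds are made up for illustration, and it assumes space-only indentation with no line continuations.

```python
def indent_tokens(source):
    """Yield (kind, value) pairs: INDENT, DEDENT, and LINE for each non-blank line.

    Hypothetical sketch: assumes spaces only (no tabs) and no line continuations.
    """
    stack = [0]  # indentation widths of the currently open blocks
    for raw in source.splitlines():
        text = raw.strip()
        if not text:
            continue  # blank lines do not affect block structure
        width = len(raw) - len(raw.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the current block: open a new one.
            stack.append(width)
            yield ("INDENT", width)
        while width < stack[-1]:
            # Shallower: close blocks until we reach a matching level.
            stack.pop()
            yield ("DEDENT", stack[-1])
        if width != stack[-1]:
            raise IndentationError("dedent does not match any outer level")
        yield ("LINE", text)
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", stack[-1])
```

The point of the design is that the parser downstream never sees whitespace: it only sees INDENT and DEDENT tokens, which it can treat like opening and closing braces.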
Tiny Languages
- TCC
- tinypy -- Pratt parsing in Python.
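For reference, Pratt parsing (mentioned for Wren and tinypy) can be sketched in a few dozen lines. This is a minimal illustration, not code from either project; the binding-power numbers and the names `BINARY`, `PREFIX_MINUS`, `Parser`, and `parse` are assumptions chosen for the example.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/^()])")

# Hypothetical binding powers as (left, right) pairs.
# right > left gives left associativity; right < left gives right associativity.
BINARY = {"+": (10, 11), "-": (10, 11), "*": (20, 21), "/": (20, 21), "^": (31, 30)}
PREFIX_MINUS = 25  # binds tighter than * and /, looser than ^


def tokenize(text):
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_expr(self, min_bp=0):
        # Prefix position: number, parenthesized expression, or unary minus.
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdigit():
            left = int(tok)
        elif tok == "(":
            left = self.parse_expr(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
        elif tok == "-":
            left = ("neg", self.parse_expr(PREFIX_MINUS))
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
        # Infix loop: keep consuming operators that bind at least as tightly as min_bp.
        while True:
            op = self.peek()
            if op not in BINARY:
                break
            left_bp, right_bp = BINARY[op]
            if left_bp < min_bp:
                break
            self.advance()
            left = (op, left, self.parse_expr(right_bp))
        return left


def parse(text):
    parser = Parser(text)
    tree = parser.parse_expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

With these assumed binding powers, `parse("1 + 2 * 3")` groups the multiplication first, and `parse("2 ^ 3 ^ 2")` nests to the right. Adding an operator is a one-line change to the table rather than a new grammar rule, which is a large part of the appeal for small implementations.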
As a counterpoint to the hand-written parsers above:
- R uses yacc
- Ruby uses yacc. JRuby uses it too.
(bash uses yacc, but it's not a success story.)