Language Design and Theory of Computation
Deciding whether to print "syntax error" shouldn't require simulating a Turing machine.
- Parsing Bash is Undecidable
- Bash, Perl, and Make can't be statically parsed because they interleave parsing and execution.
- C++ can't be statically parsed because it has two Turing complete computation mechanisms -- templates at compile time and normal code at runtime -- and parsing sits in between them.
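A minimal sketch of the interleaving in bash, using aliases (one commonly cited construct; this is a hypothetical demo, not taken from the page above):

```shell
#!/bin/bash
# Hypothetical demo: aliases are expanded at parse time, so whether a
# line is even syntactically valid depends on what was *executed* earlier.
shopt -s expand_aliases             # aliases are off by default in scripts
alias forever='for i in 1 2 3; do'  # executing this changes the grammar...

# ...so this string only parses because the alias command above ran first:
eval 'forever echo $i; done'
```

The string passed to `eval` is a syntax error unless the alias definition has already been executed — parsing can't be done ahead of time.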
- Shell wasn't Turing complete when first designed.
- Make wasn't Turing complete when first designed. See Example Code in Shell, Awk, and Make.
- sedlisp is proof that sed is Turing complete. Not sure what constructs are used.
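Haven't read sedlisp's source, but sed's label/branch commands (`:`, `b`, `t`) plus the pattern and hold spaces give loops over unbounded state, which is presumably the core of it. A tiny hypothetical loop, unrelated to sedlisp:

```shell
# Double a string of x's until there are 8: ':a' defines a label, and
# '!ba' branches back to it whenever the pattern space isn't yet x{8}.
# Looping plus unbounded pattern-space growth is what makes
# Turing-completeness plausible.
echo 'x' | sed -e ':a' -e 's/^\(x*\)$/\1\1/' -e '/^x\{8\}$/!ba'
```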
- Accidentally Turing Complete -- a nice, comprehensive list (including some games, which we don't care about here):
  - SQL: "However, with the features Common Table Expressions and Windowing, SQL is Turing Complete."
    - https://wiki.postgresql.org/wiki/Cyclic_Tag_System
  - Apache rewrite rules
  - sendmail
  - MediaWiki templates
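To make the SQL item above concrete: a recursive CTE alone already gives unbounded iteration (the full Turing-completeness construction in the PostgreSQL link encodes a cyclic tag system using CTEs plus windowing). A hypothetical demo, assuming the sqlite3 CLI is installed:

```shell
# Factorial via WITH RECURSIVE -- unbounded iteration in plain SQL.
sqlite3 :memory: "
  WITH RECURSIVE f(n, fact) AS (
    SELECT 1, 1
    UNION ALL
    SELECT n + 1, fact * (n + 1) FROM f WHERE n < 5
  )
  SELECT fact FROM f WHERE n = 5;"   # prints 120
```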
- What about TeX and PostScript? I suspect both were intended to be Turing complete, although the syntax is odd.
- printf is Turing complete.
  - The %n specifier writes to memory?
  - printbf -- a Brainfuck interpreter in printf
  - Section 6.2.1 of http://nebelwelt.net/publications/files/15SEC.pdf: "When we control the arguments to printf(), it is possible to obtain Turing-complete computation. We show this formally in Appendix B by giving calls to printf() which create logic gates."
- x86 MMU Fault Handling
  - trapcc -- "Compute with 0 instructions on Intel!": "This is a proof by construction that the Intel MMU's fault handling mechanism is Turing complete. We have constructed an assembler that translates 'Move, Branch if Zero, Decrement' instructions to C source that sets up various processor control tables. After this code has executed, the CPU computes by attempting to fault without ever executing a single instruction."
TODO: Move to Parsing Is Difficult page?
Humans are better than computers at parsing programming languages.
TODO: collect instances of people arguing for grammars. For tools, IDEs?
- A nice series of posts by Trevor Jim:
  - Python is not context-free?
  - Is Java context-free?
    - Having two grammars is a sign that the language isn't context-free.
  - about 4 or 5 other posts
- [The Context-Sensitivity of C's Grammar](http://eli.thegreenplace.net/2007/11/24/the-context-sensitivity-of-cs-grammar)
- Perl/C++/Make: not really on the table, because they're not statically parseable.
- Ruby: experience writing a full Ruby parser in Ruby: https://news.ycombinator.com/item?id=13041646
- The ZINC Abstract Machine paper for Caml Light: the yacc parser either had ~50 reduce/reduce conflicts and ~500 shift/reduce conflicts, or it had to be written in a convoluted style. The problem: Caml doesn't have closing delimiters.
- CoffeeScript video "Rise of Transpilers": "cheating" by adding fake tokens to the input
- JRuby: cheating by massaging the output
- How Clang handles the type/variable name ambiguity of C++: "My gripe is not with the grammar itself (although I admit it's needlessly complex), it's with the inability of Yacc-generated LALR(1) parsers to parse it without considerable hacks. As I've mentioned numerous times before, industrial-strength compilers for C/C++ exist after all, so they do manage to somehow parse these languages."
Alternative idea: parsers should be architected as pure functions from tokens to AST (it doesn't matter what the computational model is). Then you can instrument them and empirically determine backtracking/efficiency?
- TODO: Review/summarize papers:
  - Yakker -- scannerless; code available in OCaml
  - Nail -- for data formats: length prefixes, checksums, etc.
  - Pure and Declarative Syntax Definition: Paradise Lost and Regained (GLR). From the Nix authors.
  - Langsec: this is a different but related issue. CFGs are the wrong model; models should be based on reality.
    - finite vs. infinite languages; length prefixes
Sweet spot? Context-sensitive / Turing complete, linear-time lexing, and statically parseable.
PEGs combine lexing and parsing, but only for reasons of presentation.
Had this discussion twice: at work (about GLR, and maybe about Gazelle), and on HN: https://news.ycombinator.com/item?id=13042835
Wrote up a wiki page that was popular: Why Lexing and Parsing Should Be Separate
- Turing completeness vs. side effects/capabilities: both of these are design issues and shouldn't be conflated.
- Whether there is abstraction power, like functions. I think Awk was Turing complete at first, but it didn't have user-defined functions. Related lobsters thread.