combinators.tex

\documentclass[12pt]{article}

\title{Modern Parser Combinators in Python}
\date{\today}

\usepackage[sc,osf]{mathpazo}
\usepackage[T1]{fontenc}
\usepackage{microtype}
\usepackage{hyperref}
\usepackage{listings}
% \lstset{language=Python}
\lstset{escapechar=\!}

\usepackage[backend=bibtex8,style=authoryear]{biblatex}
\bibliography{combinators}


\begin{document}

\maketitle

\section{Overview}
\label{sec:overview}

The big idea behind this library is to combine techniques from the
parsing literature, some new, some old and neglected, to create a
library that can compete with hand-written recursive-descent parsers.
The key advantages that hand-written recursive-descent parsers have
over traditional LALR parser generators are simplicity, handling a
broader range of grammars, better error reporting, and easier
introduction of custom code.  A well-designed library can have most of
these, too, and some advantages over a recursive-descent parser, at
the cost of an implementation that isn't as simple.

\begin{itemize}
\item Parser combinators make parsers easier to learn and write by
  embedding a parsing library into a general-purpose programming
  language, making it easy to add custom code to the parser and not
  requiring that anyone learn a DSL like EBNF or formal grammars.
\item GLL parsing can parse any CFG, including ambiguous and
  left-recursive grammars, in $O(n^3)$ time, and unambiguous grammars
  in linear time, all while remaining top-down depth-first like
  recursive descent and thus easier to understand and debug.
\item Attribute grammars can allow the insertion of arbitrary code
  into the parsing process, principally allowing the parsing of
  context-sensitive grammars that a general CFG algorithm can't handle
  alone, and with the data-dependent grammar restrictions, parse them
  in $O(n^3)$ time.  They also provide an easy mechanism for adding
  semantic actions.
\item Non-correcting error handling can locate and provide reasonable
  messages for possible errors in an input string without
  providing any spurious messages.
\item Metaprogramming techniques like macros or staging can make
  parser combinators almost as efficient as recursive descent.
\item Attribute grammars can be inverted to allow writing both a
  parser and an unparser at the same time.
\end{itemize}


\section{Design Decisions}
\label{sec:design_decisions}


\subsection{Grammars}
\label{sec:grammar}

My goal for these parser combinators is that they should be able to
handle common languages and formats \emph{without} having to write
custom code except for extremely weird cases.  LL and LR grammars are
too limited: many natural grammars for common problems are not in
LL($k$) or LR($k$) for any $k$, and LR parsers, which can handle more
grammars for any given value of $k$, are difficult to debug.
Meanwhile, there now several algorithms capable of parsing any CFG
that can also parse all LL and LR grammars in linear time, giving the
same asymptotic performance as LL and LR parsers while handling a much
broader range of grammars.  CFGs also have better closure properties
than LL or LR grammars.  They're closed under language composition,
that is, they're closed under union, concatenation, and Kleene star.
Parsing problems with one language embedded in another are becoming
increasingly common, and the ability to write parsers for two
different languages and then combine them is useful.  Also, the suffix
language of a CFL is another CFL \parencite[p. 401]{grune_jacobs} and
so is its reverse language, which are useful for producing better
error messages (\ref{sec:errors}).

Unfortunately, CFGs are also not powerful enough to handle common
parsing problems.  Real-world programming languages and data formats
are context sensitive: C has the \texttt{typedef-name: identifier}
problem, Python has context-sensitive indentation, real-world HTML has
context-sensitive features like where \texttt{<input>} tags can
appear, and \TeX's DVI files have references to byte offsets in the
file.  Thus, a good parsing library needs to be able to handle context
sensitivity.  Note that one example often presented as context
sensitive, a field prefixed with a byte giving the field length like a
Pascal string, is actually regular: because a byte can only hold a
finite set of values, it's possible to write a regular expression for
the whole construct like,

\begin{lstlisting}
0 | 1. | 2.. | !\ldots! | 255.{255}
\end{lstlisting}

where \texttt{.} represents any character and a number in curly braces
represents repetition as usual.  

This technique generalizes to any field prefixed with a finite-size
integer and beyond.  In any language with finite symbols, any
length-type-value array (also type-length-value;
\url{https://en.wikipedia.org/wiki/Type-length-value}) can produce
only a finite language because the set of possible lengths and
possible types are limited.  When the production rule for a
length-type-value array is inserted into a grammar producing an
infinite language, the length-type-value array effectively acts as a
terminal because it can only create a finite set of productions, thus
it can't introduce context-sensitivity unless it's explicitly
introduced elsewhere.  If all the integers are single symbols for the
lengths, \texttt{a}, \texttt{b}, \texttt{c}, \ldots are types, and
\texttt{A}, \texttt{B}, \texttt{C}, \ldots are the production rules
corresponding to each type, these can be expressed using the
production rule:

\begin{lstlisting}
S !$\to$! 0a | 0b | 0c | !\ldots! | 1aA | 1bB | 1cC | !\ldots! |
2aAA | 2bBB | 2cCC | !\ldots!
\end{lstlisting}

This kind of use of large but finite productions can handle even some
weird cases like lengths of code that are then treated like their own
languages.  For instance, if nonterminal \texttt{A} has length 1
always and \texttt{B} length 2,

\begin{lstlisting}
S !$\to$! 0 | 1A | 2AA | 2B | 3AAA | 3AB | 3BA | !\ldots!
\end{lstlisting}

However, \emph{efficiently} handling these finite production rules is
another can of worms, because they can be exponential in the number of
symbols in the language and thus any DFA for them would be
impractical.

I chose as my context-sensitive grammar the data-dependent grammars of
\textcite{yakker1}, which are an extension of CFGs related to
L-attribute grammars.  DDGs have several attractive properties.
First, they're easier to understand and write than the primary
alternatives, PEGs and Boolean grammars, because they use familiar
concepts like variable binding and string recognition based on
semantic values, for instance treating bytes as integers rather than
abstract symbols.  While it's often easy enough to write a PEG or a
Boolean grammar for simple languages like $a^nb^nc^n$, something like
a field prefixed with a length in decimal as ASCII digits requires
multiple rules like the above for length-type-value arrays.
Meanwhile, a DDG simply calls a Turing-computable function to
transform the ASCII into an integer.  This touches on DDGs' second
major advantage, which is the ability to include arbitrary
Turing-computable code in a disciplined fashion.

\begin{quote}
  This lack of enthousiasm in incorporating attribute grammars into
  formal language and automata theory may be due partly to the
  complexity of the model and partly to the (obvious) fact that
  attribute grammars of a very simple type, having only one
  synthesized attribute and no inherited attributes, can already
  realize all possible translations [18] and can e.g. easily simulate
  type 0 Chomsky grammars (cf. [19]). \parencite{1V_AGs}
\end{quote}

The power of DDGs makes analyzing them harder but means that they can
parse anything and can incorporate existing specialized parser code
when necessary.  Third, they're based on CFG parsing algorithms, which
makes them easier to adapt for one of the existing efficient CFG
algorithms, with the original authors implementing them in particular
for combinators, for Earley, and for GLR \parencite{yakker2}.  Fourth,
their languages can be parsed in $O(n^3)$ time in the worst case, or
only $O(n)$ for deterministic languages and $O(n^2)$ for
locally-ambiguous languages.  This is a consequence of the single-pass
nature of DDGs: similar to L-attribute grammars, information flows up
and from left to right in the parse tree with no backtracking allowed.
Fifth, the parsing algorithm for DDGs is proved correct assuming the
underlying CFG parsing algorithm is correct.  Sixth, DDGs inherit from
CFGs important closure properties, also being closed under union,
concatenation, and Kleene star. A similar construction to the one for
CFGs shows that the suffix language of a DDG can be described by
another DDG, with one wrinkle: the parser always has to take all
options, treating them as alternatives, when encountering a constraint
based on a variable that has yet to be defined.

TODO: work out the reverse language for a DDG and figure out
something to say about it here.  Also work out a formal notion of
undefined for attributes: if a constraint tries to access an
undefined attribute, it falls back to parsing all alteratives.  If a
attribute function receives an undefined attribute as an input, it
always returns undefined.

The two major alternatives I considered were PEGs and Boolean
grammars.  They both have the disadvantage that to parse arbitrary
Turing-computable properties requires further extending them.  The
major advantage of PEGs is that the corresponding parsing algorithm,
the packrat parser, always runs in $O(n)$ time.  However, modern CFG
algorithms can parse most CFGs in linear time, and the ones that they
can't are highly ambiguous and only found in a subset of real parsing
problems like parsing DNA.  My experiments indicate that it's possible
to parse many CFGs in $O(n)$ time without any memoization, and
experiments in \textcite{packrat_is_it_worth_it} back this up, showing
that no memoization is often close to optimal.  With the high
preexisting memory cost of the parse tree, the memory costs of the
packrat parser make running out of memory even more likely.  Aside
from the memory issues, PEGs have three other disadvantages.  Making a
PEG parser handle left recursion makes the implementation much more
complicated, particularly when dealing with indirect left recursion,
and costs the linear-time guarantee.  PEGs can't be composed in the
same way that CFGs can, since PEGs ``are closed under union, but only
under a non-standard definition of what it means for a string to be in
a language (the PEG need only succeed on a prefix of the
string)'' \parencite[p. 4]{yakker1}.  This and other problems are due
to the non-commutativity of alternation in PEGs.  Similarly, the way
PEGs avoid ambiguity makes it hard to explicitly deal with it where it
naturally arises, for instance in unparsing the AST for a language
that ignores white space or when trying to do error handling using the
suffix language.

Boolean grammars are a promising approach because they share
properties like closure under union, concatenation, Kleene star,
suffix, and reversal.  However, there hasn't been enough work yet done
on developing practical parsers for them.  Many of the algorithms
known to work for CFGs have not been extended to Boolean grammars.  In
particular, neither of the algorithms most friendly for parser
combinators, GLL nor Earley, have been, if they even can be extended.
There's also nothing on other aspects of building a practical parser
like error handling.


\subsection{Algorithms}
\label{sec:algorithms}

TODO: Reach a new decision here.

On closer analysis, there are two major questions about algorithm
performance.  First, while the worst-case performance for all general
CFG algorithms is $O(n^3)$, the set of grammars which they can handle
in linear time differs.  One notable advantage of Earley's algorithm
is that with Leo's modifications it can parse all LR-regular languages
in linear time \parencites{leo, marpa}, and the set of LR-regular
languages contains the deterministic languages \parencite{lr-regular}.
Leo accomplishes this by memoizing certain cases of right recursion
that would otherwise take $O(n^2)$.  It's not clear to me what
grammars GLL can handle in linear time.  Scott and Johnstone claim
GLL, ``runs in linear time on LL grammars and which also allows
grammar rule factorisation, with consequential speed
up'' \parencite{gll1}, which could mean that GLL runs in linear time
on all LL($k$) for all $k$, but that set is a proper subset of the
deterministic grammars.  However, Spiewak says, ``Thus, the class of
grammars which can be handled by GLL in O(n) time is precisely
equivalent to LL(1)'' \parencite{spiewak}, which is a much smaller set
than all LL($k$) grammars.  Both of these sets are much smaller than
the LR-regular grammars.  ANTLR3 and ANTLR4 run in linear time on a
set the authors call LL(*) \parencites{antlr3, antlr4}, which I think
is equivalent to LL-regular, more or less by directly using DFAs to
predict which alternative to pick.  It's possible the same approach
might work for GLL to allow it to run in linear time on LL-regular
grammars, though LL-regular is a proper subset of the LR-regular
grammars\parencite{ll-regular}.

The second question involves constant factors.  There are conflicting
reports in the literature about how different parsing algorithms
compare to each other.  The most extensive comparison of modern
parsing algorithms is in \textcite{antlr4}, where the authors
benchmark their ALL(*) algorithm against GLR, GLL, packrat/PEG, and
hand-written recursive-descent implementations as well as some others.
On their test case, Java code, all of the limited parsers were much
faster than the general parsers, by orders of magnitude in some cases,
with Rascal's GLL parser performing particularly poorly.  They don't
directly compare their algorithm against an Earley parser but state
that \textcite{tomita1985efficient} ``shows GLR to be 5x-10x faster
than Earley.''  It's unclear where this penalty comes from, and it
should be noted the comparison may no longer be valid because Tomita
compared the algorithms before Leo published his modifications that
make Earley run in linear time on LR-regular grammars.  This still
seems to be the general impression about Earley versus GLR, but I
haven't seen any evidence backing it up outside Tomita's work.  They
suggest that based on disabling the caching of lookahead DFAs, ``This
performance is in line with the high cost of GLL and GLR parsers that
also do not reduce parser speculation by memoizing parsing
decisions,'' which seems to suggest part of the problem the general
were experiencing was O($n^2$) performance because of local
ambiguities, not constant factors, while in their testing, ALL(*) ran
in linear time on all of the grammars they tried.  [TODO: Is this a
consequence of the general parsers only handling a too-limited set of
grammars in linear time?  I don't know what grammars GLR can parse in
linear time or if any of these Java grammars fall into that set.]
However, part of the problem could easily be constant factors, with
even a linear GSS being slower than a standard stack, for instance.
They note, ``Of the GLR tools, Elkhound has the best performance
primarily because it relies on a linear LR(1) stack instead of a GSS
whenever possible.  Further, we allowed Elkhound to disambiguate
during the parse like ALL(*),'' As ANTLR generates recursive-descent
parsers, this suggests that a parser combinator approach that uses
metaprogramming could be competitive.


TODO: rewrite this.

The backend parsing algorithm will be GLL modified to include the
DDG/L-AG features.  While Valiant's algorithm, which is based on the
CYK algorithm and Boolean matrix multiplication, has the best known
worst-case run time, CYK has worse average-case run time than other
CFG parsing algorithms, and linear-time average-case performance is
more important than sub-cubic-time worst-case performance.  The main
reason I favor GLL is that the alternative algorithms with linear-time
average-case performance, GLR and Earley, are both harder to
understand, while GLL is almost like a recursive-descent parser.
GLL's similarities to recursive descent make it much easier to
implement as parser combinators and easier to combine with DDGs/L-AGs
because of the top-down left-right information flow in both GLL and
DDGs/L-AGs.

Earley is the clear runner-up here, since there exist implementations
of it using parser combinators and efficient theoretical developments
of the algorithm that include all the key features I want like SPPF
creation.  The original DDG paper \textcite{yakker1} implements DDGs
on top of Earley's algorithm.

As far as I know, there are no parser combinators for bottom-up
parsers like GLR, and bottom-up depth-first algorithms are harder to
understand than both top-down depth-first ones like GLL and
breadth-first algorithms like Earley.  The original versions of both
GLR and Earley have theoretical problems with producing parse trees,
and for GLR, fixing those problems has created a plethora of
alternatives.  There being no canonical version of GLR is a good
secondary reason to avoid it.


\subsection{Error Recovery}
\label{sec:errors}

For error handling I'm using Richter's non-correcting error recovery.
When parsing binary data, one major problem with finding errors is
that the point where the parser fails is often not the point where the
input or parser went wrong.  Non-correcting error recovery brackets a
range where the error may have occurred, making it easier to find
errors.  It's also more suited to the kinds of errors in binary data
in general, which are usually not like typos or other small isolated
errors but large blocks of broken data, and to Python's interactive
debugging.  It uses the suffix grammar and the reverse grammar of the
original input grammar.  While deterministic grammars are not closed
under suffix (I don't know about reverse), CFGs are, and because I'm
using GLL, I can use the same algorithm for both normal parsing and
error-handling.


\subsection{Architecture}
\label{sec:architecture}

The reason to write my own library in Python rather than using
Libmarpa CFFI bindings is to support Python variants.  CFFI only works
to allow calls from the interpreter level to C functions, not RPython
to C.  Meanwhile, Jython and IronPython don't, as far as I know, have
anything to allow calling into C at all.  Writing it in Python with
metaprogramming to convert it to RPython will make it
platform-independent. Also worth noting is that after seeing the
performance benchmarks for Scala parser combinators with macros and
staging, I'm skeptical that it's necessary to write a parsing library
in a low-level language to get good performance, and that the kind of
code that's fast doesn't change much from interpreter to interpreter:
it's all simple, imperative-style code with obvious optimizations.


\section{Implementation Order}
\label{sec:implementation_order}

\begin{enumerate}
\item GLL combinators.
\item Non-correcting error recovery, for debugging.
\item Tests.
\item Performance benchmarks.
\item Macros, staging, or other metaprogramming for performance.
\item Data-dependent grammar functionality, in some as yet TBD order.
\item Unparsing.
\end{enumerate}


\section{Major Implementation Choices}
\label{sec:implementation_choices}

\subsection{GSS Representation}
\label{sec:gss_representation}

\textcite{afroozeh_izmaylova} improved the performance of GLL with a
different representation of the GSS that places information on the
edges as well as the nodes to avoid creation of duplicate nodes and
moves the sets $\mathcal{U}$ and $\mathcal{P}$ from global hash tables
to local hash tables on the nodes.  This results in some clear
performance gains, and since there's no real cost---their changes only
move information around---and Python doesn't have a native linked-list
data type that would work for the GSS layout described in
\textcite{gll2}, there's no reason not to use their changes.  In their
representation, GSS nodes are labeled with a nonterminal (a class
instance) and an index into the input and have links to their outgoing
edges and two local hash tables associated with $\mathcal{U}$ and
$\mathcal{P}$, the former containing descriptors as a grammar slot
plus an index and the latter containing an index.  GSS edges are
labeled with a grammar slot, i.e. a nonterminal/class instance and an
integer representing an index into a nonterminal.  Each edge has a
link to a GSS node and an SPPF node.


\subsection{Return an iterator over parse trees or a shared packed
  parse forest.}
\label{sec:iterator_sppf}

Spiewak chose the former option for his GLL combinators in Scala, but
it has a couple of problems.  First, it involves freezing the parser
state, which means that starting a parse requires creating a new
parser state.  This has some performance implications, particularly
with respect to memory, and makes cleaning up afterwards more
difficult, probably demanding some kind of context manager to close
running coroutines.  Second, it means that changing the implementation
of the parse forest almost certainly involves changing the
implementation of the parser itself because the parser state is
entangled with its output.  Third, iterating over subtrees as well as
alternatives trees will require nested iterators, i.e. an iterator
that returns another iterator, which gets confusing and requires some
doubly-suspended state.  Grune and Jacobs bring up another problem in
what they call the producer-consumer model: anyone wanting to use the
parse trees will have to analyze multiple trees to figure out how they
differ to figure out what to do with them.  (Note that it's easy to
build an infinite iterator so that method should be able to handle
infinitely-ambiguous grammars.  Avoiding analyzing tree-by-tree will
require users to understand the SPPF and a good API for analyzing it
without generating trees.)  They point out that coroutines can reduce
some of these problems by putting the parser and the consumer of the
parse tree on an equal footing.  Moura and Ierusalimschy (2009,
"Revisiting coroutines") show that asymmetric coroutines are as
powerful as symmetric ones, so it ought to be possible to implement
this in Python, but I suspect the implementation would involve
creating a trampoline that passes data between the producer and
consumer coroutines, and I'm not sure if the added complexity is
worthwhile, especially given it would involve a performance hit.

Thus, I'm running with the SPPF.  Initially, I'm going to implement
Tomita's worst-case unbounded-polynomial version rather than Scott and
Johnstone's binarized worst-case cubic version because binarization
will make the resulting parse forest harder to understand and work
with, and I doubt anything this library will ever process will
approach the worst case because that should only happen with
pathologically-ambiguous grammars.  However, I should make sure to
encapsulate the SPPF so if I need to binarize the implementation later
I don't need to change anything else.


\subsection{Tree and SPPF API}
\label{sec:tree_sppf_api}

With a traditional recursive-descent parser, the definitions of the
nonterminals in the grammar and their corresponding functions provide
natural nodes for the parse tree.  In a parser built from combinators,
however, because the combinators don't intrinsically have nodes, it's
not clear how to build the parse tree.  As far as I can tell,
traditional parser combinators always use the sequence combinator to
build the tree out of nested lists/tuples, either by having it build a
flat list/tuple out of two or more alternatives like Construct's
Sequence (Struct, which makes an ordered map, is a variation on this)
or nested binary lists/tuplesx, like Spiewak's combinators.  Other
non-monadic combinator implementations follow the same general
approach: Hughes in \emph{Generalizing Monads to Arrows} observes,

\begin{quote}
  For example, it is fairly clear that a library for parsing should
  include a combinator to invoke two parsers in sequence, but there
  are many possible ways in which such a combinator might handle the
  two parsers' results.  In some early parsing libraries the two
  results were paired together, in others the sequencing combinator
  took an extra parameter, a function to combine the results.
\end{quote}

The clearest explanation I've found of the relationship between this
traditional sequence combinator and monadic combinators is in Vegard
\O ye's article (2012, \url{https://github.com/epsil/gll}): they
define an operation bind (this is the monadic bind) that takes a
parser and a function, applies the parser to the input, applies the
function to a success to get another parser, and then applies the
resulting parser to the success to get the final output.  They then
define the sequence combinator in terms of a double bind, passing the
list constructor as the function to bind; for a sequence combinator
with more than two elements, they use fold/reduce with list-append to
get a flat list rather than nested lists from one combinator.  The
Hutton and Meijer paper (\emph{Monadic Parser Combinators}) also shows
how to build a sequence combinator using bind and a concatenation
function for lists, in this case specifically to avoid nested tuples.
While the standard approach to defining monads, settled on by Haskell
and seemingly imitated by everyone else, is to define a monad in terms
of return (sometimes called result) and bind, one can also define
monads in terms of three functions, return, join, and map (sometimes
fmap), and then define bind in terms of join and map.

Writing monadic (or arrow-style) parsers in Python would be stupid, so
I need a sequence combinator and will follow this universal (as far as
I know) practice and define the nodes in the tree using it.  I can let
it take a constructor with which to build the parse tree with a
default constructor or expect that standard usage involves
transforming the parse tree with semantic actions.  While monadic bind
combines the choice of parse tree with the sequence combinator, my
understanding is that separating join and map separates the two, so
having a semantic action combinator following a sequence combinator
can simulate any choice of bind for monadic combinators.

Irrespective of the internal representation, I also need to figure out
the interface the SPPF presents to the users.  The SPPF itself is hard
to understand so a direct implementation would make for a bad API.  My
best idea for handling this is to build a tree-centric API where users
interact with the SPPF as if it was a collection of trees.  One
obvious problem after creating the SPPF is how to allow user code to
examine individual trees.  Luckily, this problem has an obvious
solution: instead of returning a copy of an individual tree, I can
return an object that contains references to the correct nodes in the
SPPF.  However, implementing anything more than simple views on the
SPPF is \emph{hard}.  I still haven't come up with a better idea for
the tree API itself than Construct's: because trees are a recursive
data structure, allowing recursive references using Python's []
addressing provides a natural API.

That said, for users to deal with ambiguity in any efficient manner
will probably require users to both understand the SPPF and have
direct access to it; I have no idea how to implement such an API or to
integrate it with the hopefully-simpler tree-centric API.

How semantic actions interact with the SPPF presents some thorny
issues.  Semantic actions are essential, as aside from needing them to
emulate the power of monadic combinators, they're required for
directing parsing with previously-parsed input, essential in many
common parser tasks like ignoring whitespace, one of the better reasons
for using a top-down parser in the first place, and possibly important
in performing disambiguations while the parser is running.  Because
I can't figure out how to transform the SPPF as if it was a collection
of trees, though, without brute force and maybe not even then, I still
haven't settled on how to set semantic actions up.  I have some
incompletely-realized partial ideas:

\begin{itemize}
\item Higher-order functions on trees: filter, map, reduce.
\item A higher-order function that converts a function acting on trees
  to a function acting on SPPFs.  I don't know how to write it, though.
\item The visitor pattern and other kinds of traversals with functions
  acting on the tree/SPPF.
\end{itemize}


\subsection{Tree and SPPF representations}
\label{sec:tree_sppf_representations}

Traditional parser combinators don't provide any natural node labeling
because of the absence of labeled nonterminals.  A tree representation
based on high-level sequences and ordered mappings, in Python
lists/tuples and OrderedDicts, doesn't need these node labels, and
it's possible to implement combinators that return trees or an
iterator over trees without them.  However, for a shared packed
forest, labels are required for implementing subtree sharing because
without them, combinators in different branches of the parse traversal
have no way to know when they've generated the same subtree.
Moreover, GLL needs labels to merge different branches of the parse in
the GSS, so while it's possible to omit node labels altogether in a
traditional recursive-descent parser, any GLL parser needs labels.
There's an natural labeling scheme for parse trees based on
partitioning the input that Scott and Johnstone use but never
explicitly explain.  Each terminal or uninterrupted sequence of
terminals has a starting and ending offset so the corresponding leaf
node can be labeled with (terminals, starting\_offset,
ending\_offset).  The leaf nodes read in offset order must cover the
entire input, and in fact reading them in order in the absence of
semantic actions will return the original input.  The rest of the tree
nodes represent partitions further subdivided into subpartitions
labeled with nonterminals, with the root node corresponding to
(start\_symbol, 0, length(input) - 1) (zero-indexed).  Because parser
combinators don't have nonterminal labels, Spiewak uses parser
identity instead.  Like Spiewak, I intend to use parser combinator
class instances as node labels in GLL.  Note that generator objects
can't be used for this, since a new generator object is created every
time a generator function is called, and neither can the method or
code objects, because they're shared by all instances of the same
parser combinator class.

Both trees and forests are fundamentally directed graphs, normally
acyclic but in the case of infinitely-ambiguous grammars cyclic.  In
Python, one possible graph representation is a dict of lists, where
the lists contain the labels of other nodes.  By using the labels
discussed above as the labels for the nodes, it's possible to
represent a parse tree as a digraph where the values of leaf nodes are
Python objects representing terminals and the values of non-leaf nodes
are lists of either pairs of integers, the direct labels for other
nodes, or single integers representing the subpartitions for that
node.  Eliding the terminal and nonterminal symbols for clarity, for
instance:

\begin{lstlisting}
(1, 8) : [(1, 2), (2, 5), (5, 8)]
(1, 8) : [2, 5]
\end{lstlisting}

The superfluous integers correspond to the "repeated dimensions" that
Johnstone and Scott describe in \emph{Modelling GLL Parser
  Implementations} (2010).  The latter representation obviously takes
less memory but node operations will be slower because correct
offset-pairs will have to be generated from the partitions when
they're needed.

The SPPF can have almost an identical representation to the parse
trees themselves because it's also fundamentally a digraph.  There are
only two differences: nodes can have more than one parent,
corresponding to subtree sharing, and there has to be a way to
distinguish packed nodes from normal nodes because they have different
meanings.  In the binarized SPPF, Johnstone and Scott define normal
nodes using two offsets, called "left extent" and "right extent", and
packed nodes using a single integer, called "pivot."  In the
partition representation any node with only two children can be
represented with a single integer, the division between the subtrees.
A packed node is the result of combining at least two nodes, so a
minimal binarized packed node looks like:

Original nodes:
\begin{lstlisting}
(0, 4): [(0, 1), (1, 4)]
(0, 4): [(0, 3), (3, 4)]
\end{lstlisting}

Packed node:
\begin{lstlisting}
(0, 4): [((0, 3), (3, 4)), ((0, 1), (1, 4))]
\end{lstlisting}

Binarized nodes:
\begin{lstlisting}
(0, 4): Packed(1), Packed(3)
Packed(1): [(0, 1), (1, 4)]
Packed(3): [(0, 3), (0, 4)]
\end{lstlisting}

However, I don't understand the relationship between Scott and
Johnstone's partition model and parses that don't consume all the
input.  The natural way to implement nondeterminism leads to returning
all incomplete parses as well as the complete parses.  Both Spiewak
and Hutton and Meijer mention this natural ambiguity.

\begin{quote}
  It is also worth noting that we do not allow \texttt{Success}(es)
  which have failed to consume the entire input stream.  We will
  actually receive a \texttt{Success} every time a \texttt{Parser}
  reduces successfully. While this is technically correct (reflecting
  the ambiguity between greedy and non-greedy matching), it is almost
  never the desired semantics.  (Spiewak)
\end{quote}

Scott and Johnstone's constructions of parsers from recognizers seem
to rely on greedy matching, because their partitioning scheme only
works with matches of the full input.  They also restrict packing to
nodes that share the same partition, ``Nodes can be packed only if
their yields correspond to the same portion of the input string''
(Scott, \emph{SPPF-Style Parsing from Earley Recognisers}).  This
doesn't seem to work in cases with partial/non-greedy matching because
every nontrivial node will have partial matches that don't cover the
same partition.

A further observation that may or may not be related is that their
node labeling scheme seems to include unneeded information.  A given
parser started at the same position should always produce the same
output, so all that's needed for a unique node label is (parser label,
starting index into the input).  In fact, in one of their early papers
(\emph{BRNGLR: a cubic Tomita-style GLR parsing algorithm}), they use
this labeling scheme, rather than the later three-element labels.  I
don't understand why, and if or how this might be connected with
greedy matching.  A final note on the greedy-matching issue is
that the obvious implementation of nondeterminism applied naively
leads to the wrong recognizer, because a partial match will report
success even if the whole string is not matched.

In the aforementioned paper, they use a table-based representation
for all the data structures, including the SPPF.  Following the data
structure refinement process outlined there, the declarations for the
SPPF are:

\begin{verbatim}
SPPFSymbolNode (SPPFLabel s, Int leftExtent, Int rightExtent)

SPPFPackNode (SPPFLabel s, Int pivot)

SPPF (SPPFSymbolNode symbolNode, SPPFPackNode packNode, SPPFSymbolNode left, SPPFSymbolNode right)

SPPF sppf
\end{verbatim}

Substituting:

\begin{verbatim}
sppf ((*symbolNode) SPPFLabel, Int, Int)
      (*packNode) SPPFLabel, Int)
      (*left) SPPFLabel, Int, Int)
      (*right) SPPFLabel, Int, Int)
     )
\end{verbatim}

The left extent for the left node is the same as parent's, the left
node's right extent is the same as the left extent of the right node,
and the right extent for the right node is the same as the parent's.
The SPPFLabel for the pack node and the parent node are also the same.

\begin{verbatim}
sppf ((*symbolNode) SPPFLabel, Int, Int)
      (*packNode) Int)
      (*left) SPPFLabel, Int)
      (*right) SPPFLabel)
     )
\end{verbatim}

This still seems like too much: it's a seven-dimensional table with
four dimensions indexed by integers, which as they note in the paper
is too large.

The clearly-fastest implementation with the best encapsulation is to
use Cython to write a specialized SPPF object, with the exact methods
needed for building it, with an appropriate internal representation,
possibly based on the ideas from the \emph{Modeling GLL
  Implementations} refinement discussed above.  All other approaches
will require methods implemented in Python (which will be slow) to do
the necessary transformations.  To achieve anywhere near acceptable
performance, I have to use native Python data structures or C/C++
extensions.  Any encapsulation has a performance overhead because I
won't be able to use comprehensions to speed up object creation, but
beyond that full encapsulation from the parsers will cost more speed
because they will have to call methods in Python.  On the whole, I'm
convinced enough of the advantages of encapsulation, particularly the
possible need to rewrite the whole thing in Cython, that the first
version will involve partial encapsulation with methods written in
Python for the user API and hard-coding the parsers.  On that point,
there's very little reason, as far as I can tell, to expose the
internal node labels to the user.

There's one possible other representation of the SPPF about which I
know little called parse-forest grammars, introduced on pp. 91-93 of
Grune and Jacobs.  They have production rules labeled almost
identically to Johnstone and Scott's SPPF, a nonterminal, a starting
position, and a length (which is isomorphic to using the ending
position).  I personally don't see how any of the supposed advantages
Grune and Jacobs list for them are good for writing practical parsers
and the API they'd provide would be very unintuitive for people used
to conventional parsers, but they're another possible internal
representation.


\subsection{Immutability vs. mutability}
\label{sec:immutable_mutable}

There are two different approaches to the data flow through the
parser, the functional-immutable style where functions, or in this
case, coroutines, create objects and return them and the mutable-state
style where a mutable object is passed into a function or coroutine,
which then mutates it and returns nothing.  The choice of which style
to use can be made on a per-object basis, but because in most
languages including Python functions can only return one object, and
in the case of Python coroutines can only yield and accept through
send() one object, functions that need to return more than one object
have to aggregate all the objects they need to return into one object.
Handling the aggregation and disaggregation for parsers is an example
of the monadic pattern, as even recognizer combinators need to return
a Boolean representing whether a given input is in the language
generated by a grammar and also a stream object representing the
truncation of its input.  Real parsers always need to return the same
Boolean, a stream, and a parse tree.  Other patterns are possible:
Construct uses a Python stream, which is a mutable object, and thus
doesn't need to return a stream.  An empty parse tree or some kind of
null result (None in Python) can represent failure, but this is a bad
idea since it provides no information about where the failure occurred
or what happened.

In traditional object-oriented recursive-descent parsers, the whole
parser would be one class and mutable objects could be handled as
shared state.  As it is, since my combinators are classes, I have to
pass mutable objects instead.  This isn't the worst thing in the world
since passing objects will avoid self look-ups, helping performance,
and is explicit rather than implicit.  Several of the implementations
I've looked at do similar things.  Spiewak's GLL combinators use a
functional-immutable style for the tree, returning a list in the
simplified implementation and a stream/iterator in the real
implementation, but an OO class for the trampoline.  Scott and
Johnstone's example GLL implementation uses an OO class for the parser
with a mutable shared SPPF.  Jim and Mandelbaum's transducers for
data-dependent grammars also use a mutable SPPF.

For GLL, the combinators could potentially return success/failure, the
GSS, the SPPF, the stream, and the pointer.  Mutable objects only need
to be passed in when starting a new coroutine instance and don't need
to be yielded or passed via send() because every coroutine will
already have access to them.  The advantage of mutability is speed,
and its main problem is that it always has more potential for creating
hard-to-isolate bugs.  Unfortunately, Python is simply not designed
for the functional-immutable style and obviously has no optimizations
in the compiler/interpreter for handling immutable object creation for
non-built-ins.  Thus, the functional-immutable style with an
encapsulated Python object will be many times over too slow.
(\url{http://www.valuedlessons.com/2008/03/why-are-my-monads-so-slow.html}
gives some numbers suggesting how much too slow.)

For the SPPF, the only way to implement sharing in a functional
fashion is to pass a memoized cache and discard it after building the
SPPF.  Otherwise, combinators in different branches of the parse won't
be able to share subtrees.  However, a memoized cache itself is most
of the way to an SPPF, so there's no real reason to use the former in
place of the latter.  The SPPF is so large (I'm estimating a factor of
>100 times over the input, and that's optimistic) that recreating it
in every parser is wildly impractical, so it has to be mutable.
However, there's a further problem with building the SPPF as a mutable
object passed among combinators that Jim and Mandelbaum describe in
\emph{Delayed Semantic Actions in Yakker}: nodes that get added to the
SPPF during branches of the parse that eventually dead-end will
continue to exist in the final SPPF.  The solution they use in OCaml
is to use a weak hash table.  The best option in Python is similarly
to use a weak dictionary.  This creates several knock-on effects.
First, strong references to the nodes in the SPPF have to be stored in
the SPPF \emph{somewhere} or else the whole structure will get
garbage-collected, and the only reasonable place to put them is in the
nodes themselves, thus a weak dictionary of objects that hold
references to other nodes.  Second, most Python built-in types and
subclasses thereof are not weak-referencable in CPython.  Subclasses
of lists and dicts can be weak-referenced, while lists and dicts
themselves can't.  (None of this behavior is defined for all
implementations, though PyPy seems to follow CPython here.)  Changing
from subclasses of tuple to list has memory and speed penalties, but
the alternative to using Python's weakref module is doing manual
object destruction myself, and that's just awful.  Third, to keep
nodes alive while building the SPPF I need strong references outside
it, which means the combinators need to return strong references.
Fourth, terminals have to be enclosed in some kind of
weak-referencable proxy object.

That said, while the SPPF has to mutable, the nodes making it up can
be immutable.  There are advantages and disadvantages to both
approaches.

\begin{itemize}
\item Mutable nodes make it easier to transform trees and reduce the
  memory overhead of transformation operations since they avoid
  copying.  How common is it to need to transform a parse tree,
  though?  For writing a compiler/interpreter or a converter for a
  binary data format, for instance, the parse tree is used to build
  another kind of object, not mutated in-place.
\item Immutable nodes prevent a broad class of potentially
  hard-to-find bugs related to mutating a tree in one node in an SPPF
  and causing changes that propagate to other trees incorrectly.  This
  is a particular problem with semantic actions where users shouldn't
  need to understand the particulars of the parsers or the SPPF.
\item Immutable objects could provide memory advantages, but Python's
  implementation makes this difficult.  There's no built-in immutable
  mapping type, and tuples can't be weak-referenced.  To get the
  memory savings, I'd either have to give up speed by using
  composition and methods implemented in Python to enclose tuples in a
  weak-referencerable object or use Cython to implement my own types
  in C.
\item Immutable containers are hashable, which is essential for any
  kind of elegant way of handling which packed nodes to traverse in a
  tree view of an SPPF and useful for implementing packed nodes as
  frozensets, gaining the automatic ability to avoid duplicates when
  packing.
\end{itemize}

Spiewak passes the trampoline that contains the GSS as a mutable
object for ``convenience and efficiency,'' and I'm leaning towards
following his lead.  For parsing, I'm passing the stream as an
immutable object.  That leaves only success/failure, the pointer, and
a node reference as immutable objects I need to create in each combinator.

\begin{itemize}
\item GSS: mutable, passed
\item SPPF: mutable, passed
\item Stream: immutable, passed
\item Success/Failure: immutable, returned
\item Stream pointer: immutable, returned
\item Node: immutable, stored in SPPF and returned
\end{itemize}

There is one other possible way to circumvent the bad performance of
the functional-immutable style, which is to use compilation/code
generation to eliminate the intermediate data structures.  This may
turn out to be the best option, and in the end is almost certain to be
the fastest, especially if I go all the way to a two-stage compiler
that compiles Python to Cython and Cython to C.  At the moment, I
don't understand it well enough to enumerate its advantages and
disadvantages.


\subsection{Outer-Loop Trampoline}
\label{sec:trampoline}

In Python, the trampoline has to run as the outermost loop and call
the coroutines because Python coroutines are asymmetric, they can only
return control to their callers with yield.  If I instead try to pass
the trampoline through the combinators, there's no way for them to
pass control to the trampoline.  Conveniently, GLL is organized with a
dispatch function such that every parser returns control to it after
finishing execution, which is my trampoline.


\subsection{Yield from}
\label{sec:yield_from}

The main problem with "yield from" is that it's only available on
Python 3.  Also, experimentation suggests that as of 3.4, using "yield
from" still hits the maximum recursion depth which means that "yield
from" still adds a stack frame for each call, and Python has a small
stack limit.  On the other hand, without "yield from", Python's
coroutines are not stackful (see Moura 2009, "Revisiting Coroutines"),
which means they may not be as powerful as one-shot continuations and
might not be powerful enough for GLL.  That said, I know that "yield
from" is internally implemented with a trampoline, and the existence
of Eby's "fiendishly clever" trampoline implementation for Python 2
suggests it ought to be possible to use a trampoline to convert Python
2's stackless coroutines into stackful coroutines by manually
implementing a stack.  I also might end up needing a trampoline
anyways to handle the graph-structured stack, in which case the
advantage offered by "yield from" may be diminished.

My tentative analysis is that "yield from" simply isn't useful for
GLL.  GLL's dispatch stack contains information about the GSS and SPPF
as well as which parser needs to resume control, so replacing the
stack with "yield from" isn't going to get me very far since I'll
still need another stack to hold the GSS and SPPF information, even if
it's possible to handle the dispatching without that added
information.  I suspect that it's either impossible to use "yield
from" with GLL or that the added complexity from trying to integrate
the two would completely negate any performance or simplicity gains.
This, combined with the lack of "yield from" in 2.7, means I'm not
going to try to use it.


\subsection{Indices versus Slicing}
\label{sec:indices_slicing}

With Python 2.7+'s memoryviews, it's possible to slice the stream
without copying, embedding slices in the parse forests passed between
combinators.  The main disadvantage of this approach is that it means
the combinators can't handle text, particularly Unicode, since there's
no equivalent for Python strings.  There are also three other factors
to consider: all of the Python library functions (string methods,
struct, re, and so on) take an optional argument that's a starting
index anyways, there's no equivalent stringview object (I'd have to
write one), and indices provide natural labels for the GSS and SPPF
(and memoryviews are only hashable in 3.3+, so they can't take over
that role in 2.7).  Primarily because of the last two considerations,
I'm going with indices.

To my surprise, when I profiled indices versus string slicing
directly, string slicing was slower than indices on both CPython and
PyPy 3, though as expected the indexed version consumed much, much
less memory on CPython.  The extra time seemed to be spent in the
hash-heavy functions, so this might have something to do with
CPython's optimizations for hash tables with string-only keys.  I
still don't understand why the cost of additional allocations didn't
overwhelm that, but this is why we profile.  However, the memory
consumption issue and aforementioned considerations mean I'm sticking
with indices.


\subsection{Module for handling bits}
\label{sec:bits_module}

The main module implemented in C for bit arrays/bit vectors is
\href{https://github.com/ilanschnell/bitarray}{bitarray}
(\href{https://pypi.python.org/pypi/bitarray/}{PyPi}).  I don't know
how hard it will be to make this work with PyPy, but I'm going to give
it a shot.  The main two pure-Python implementations are
\href{https://engineering.purdue.edu/kak/dist/BitVector-3.3.2.html}{BitVector}
(\href{https://pypi.python.org/pypi/BitVector/3.3.2}{PyPi}) and
\href{https://code.google.com/p/python-bitstring/}{bitstring}
(\href{https://pypi.python.org/pypi/bitstring/3.1.3}{PyPi}).  For the
prototype, I'm going with bitstring but will return to bitarray once I
look at performance optimizations.

\subsubsection{bitarray}
\label{sec:bitarray}

This ought to be the fastest because it's implemented in C, but by the
same token, it will probably pose the biggest compatibility problems,
particularly with PyPy.  The API is clean and mimics the standard
sequence types, but has the major disadvantage of having no offset
option in any of the constructors, which would require copying without
using memoryview, and some imperfections in the handling of
conversions (it has a tostring() method, but I don't know if calling
str() works, for instance).  There's no immutable bitarray type, which
would mean that I'd have to make my own.  Bitarray is packaged for
Ubuntu.

\subsubsection{bitstring}
\label{sec:bitstring}

Implemented in pure Python, this ought to be compatible with
everything.  There's also a Cython version, though from looking at it
doesn't seem to have many optimizations over the pure-Python version.
The API includes mutable and immutable types and supports reading from
at a binary object at an arbitrary offset, though obviously it doesn't
support the buffer protocol, and works the way it should with respect
to functions like str().  On the whole, it has clearly the best API.

\subsubsection{BitVector}
\label{sec:bitvector}

Also implemented in pure Python, it ought to have similar
compatibility to bitstring.  The API has the same problem as
bitarray's, no reading binary data at an offset, with the additional
disadvantage of not supporting the buffer protocol.  Moreover, the API
as a whole doesn't resemble that of the standard sequence types.  The
only advantage of this module is that it has built-in methods for
certain kinds of advanced operations on bit arrays, but that's not
that important.


\subsection{Operator Overloading Dispatch for the Combinators}
\label{sec:operator_overloading_dispatch_combinators}

Operator overloading requires type dispatch: even the simplest
possible operator overload has to distinguish between types it accepts
and those it doesn't.  Setting up operator overloading for the
combinators requires more complicated type dispatch.  I wanted to try
to do this with \texttt{functools.singledispatch()} but the
implementation details of Python make this a bad solution.  When
applying the decorator to a function defined in a class body, the
function will always receive as its first argument the instance
calling it.  The author of singledispatch suggests
(\href{http://lukasz.langa.pl/8/single-dispatch-generic-functions/}{What
  single-dispatch generic functions mean for you}) using the decorator
on a function after it's been bound as a method to circumvent this:

\begin{lstlisting}
class C:
    def __init__(self):
        self.method = functools.singledispatch(self.method)
        self.method.register(Type, self.implementation)
\end{lstlisting}

A function wrapping a method only receives the arguments that
\emph{aren't} the instance, so it's possible to dispatch usefully on
its first argument.

\begin{lstlisting}
>>> def f(func):
...   def g(*args):
...     print(args)
...     func()
...   return g
... 
>>> class C:
...   def __init__(self):
...     self.method = f(self.method)
...   def method(self):
...     pass
... 
>>> c = C()
>>> c.method(1, 2, 3)
(1, 2, 3)
\end{lstlisting}

This direct approach doesn't work for operator overloading, however,
because operators are special methods and special methods are looked
up on the class, not the instance: ``For custom classes, implicit
invocations of special methods are only guaranteed to work correctly
if defined on an object's type, not in the object's instance
dictionary''
(\url{https://docs.python.org/3/reference/datamodel.html#special-method-lookup}).

The way to circumvent this is to have the special method delegate to a
normal method and then wrap that method with
\texttt{functools.singledispatch()}:

\begin{lstlisting}
  class AbstractBase:
    def __init__(self) -> None:
        self._op = functools.singledispatch(self._op)
        self._rop = functools.singledispatch(self._rop)

    def __op__(self, other: 'AbstractBase') -> 'AbstractBase':
        return self._op(other)
    def _op(self, other: 'AbstractBase') -> 'AbstractBase':
        return NotImplemented

    def __rop__(self, other: 'AbstractBase') -> 'AbstractBase':
        return self._rop(other)
    def _rop(self, other: 'AbstractBase') -> 'AbstractBase':
        return NotImplemented
\end{lstlisting}

To avoid code duplication for noncommutative operations, however,
there has to be another call to a function/static method that supports
having its arguments reversed.  The following are two classes that
together implement a kind of set of sets monoid with the dispatching
necessary for the operations.  The key observation is that almost all
the work is being done in the staticmethods
\texttt{\_op\_collection\_collection()} and \texttt{\_op\_atom\_atom()}.
The \texttt{\_op\_atom()} and \texttt{\_rop\_atom()} methods on
\texttt{Collection} need to set up calls to \texttt{Collection}
(written as \texttt{type(self)(other)} to avoid hard-coding) and
\texttt{\_op\_collection\_collection()} correctly, and one can argue if
there's content here.  The vast majority of this code is boilerplate,
though.

\begin{lstlisting}
class Collection(AbstractBase):
    def __init__(self, collection: 'Any', *args: 'Atoms') -> None:
        super().__init__()
        self._op.register(Collection, self._op_collection)
        self._op.register(Atom, self._op_atom)
        self._op.register(Collection, self._op_collection)
        self._rop.register(Atom, self._rop_atom)
        self._rop.register(Collection, self._rop_collection)
        self._collection = collection(*args)

    def _op_collection(self, other: 'Collection') -> 'Collection':
        return self._op_collection_collection(self, other)
    def _rop_collection(self, other: 'Collection') -> 'Collection':
        return self._op_collection_collection(other, self)
    def _op_atom(self, other: 'Atom') -> 'Collection':
        return self._op_collection_collection(self, type(self)(other))
    def _rop_atom(self, other: 'Atom') -> 'Collection':
        return self._op_collection_collection(type(self)(other), self)

    @staticmethod
    def _op_collection_collection(left: 'Collection', right:
    'Collection') -> 'Collection':
        return self._collection.op(right._collection)
\end{lstlisting}

\begin{lstlisting}
class Atom(AbstractBase):
    def __init__(self) -> None:
        super().__init__()
        self._op.register(Atom, self._op_atom)
        self._rop.register(Atom, self._rop_atom)

    def _op_atom(self, other: 'Atom') -> 'Collection':
        return self._op_atom_atom(self, other)
    def _rop_atom(self, other: 'Atom') -> 'Collection':
        return self._op_atom_atom(other, self)

    @staticmethod
    def _op_atom_atom(left: 'Atom', right: 'Atom') -> '(Atom, Collection)':
        if oppable(left, right):
            return left.op(right) # -> Atom
        else:
            return Collection(left, right)
\end{lstlisting}

The reason the boilerplate is necessary is that
\texttt{functools.singledispatch()} must dispatch to another method,
since it only received a single argument, with its first argument
being bound implicitly to the instance.  However, the functions that
do the work are generic and need to be functions, not methods, so the
method that is dispatched to needs to explicitly call them with
\texttt{self, other} or \texttt{other, self}.  This is a total of four
function calls, say \texttt{\_\_op\_\_()}, \texttt{\_op()},
\texttt{\_op\_atom()}, and \texttt{\_op\_atom\_atom()}.

Metaprogramming with dynamic class generation can remove the need to
write a lot of the boilerplate and some of the code duplication.
Scary metaprogramming with AST rewriting can, for instance, be used to
turn a method into its reversed form by reversing the variables, which
can eliminate more code duplication and some of the performance
penalty that comes from turning one function call into four function
calls.  However, most of this is a self-created problem relating to
the limitations of \texttt{functools.singledispatch()} for methods,
special methods in particular and using metaprogramming to solve it is
stupid.  Moreover, even with metaprogramming, there's a nontrivial
performance hit.  Worse, it will make the implementation complex and
hard to understand and maintain, and I don't know if it's even
possible to keep the metaprogramming completely closed off from the
rest of the API.  Another, much less significant problem with
\texttt{functools.singledispatch()} is that it doesn't integrate with
function annotations like it should (probably for backwards
compatibility), which means refactoring to come if I use it.

There's nothing wrong with the theory of emulating double dispatch
with two levels of single dispatch, it just doesn't work with
\texttt{functools.singledispatch()'s} implementation.  There's a long
list of multiple-dispatch implementations for Python (including
\url{https://pypi.python.org/pypi/multimethods},
\url{https://pypi.python.org/pypi/generic},
\url{https://pypi.python.org/pypi/multipledispatch/},
\url{https://github.com/morepath/reg}, an @overload decorator in
\href{http://mypy-lang.org/}{Mypy} that was removed during the
discussions about type-hinting for Python 3.5, and others that are
older) that could handle the necessary dispatch, but they're all
varying degrees of unsafe (most of them use frame inspection) and
overcomplicated.  The latter is a problem because multiple dispatch is
overkill for this problem, since the type of \texttt{self} for any
given special method is known at compile time.  Moreover, all the
implementations differ in both syntax and important details, so
there's not anything close to a standard.  Guido van Rossum claims to
be looking at multiple dispatch again for the future, but who knows if
that's going to come to pass.  The lack of standardization and
possible upcoming changes to singledispatch and the typing system in
general and the possibility of real multiple dispatch in the standard
library make any solution not very future-proof.

The central problem with dispatch for special methods is that at class
creation, a special method doesn't know what instances will be
created, but it has to be capable of dispatching to instance methods
to use object-oriented method dispatch.  Thus, the special method can
only dispatch to an instance method when it's called, when it has
access to the instance via its first argument.  By having the special
method itself do the dispatching, rather than a wrapper function, and
dispatching on the type of the \emph{second} argument, I can avoid
these problems and use a form of chained single-dispatch to do the
necessary double dispatch.  The best solution would probably be to use
the same dispatching for subtypes that
\texttt{functools.singledispatch()} uses.  However, the code for
\texttt{functools.singledispatch()} isn't intended to be used outside
its module, with two functions created inside the
\texttt{functools.singledispatch()} function itself.  Thus, I instead
implemented my own dispatching decorator in the simplest, dumbest
possible way with no dispatch to supertypes at all.


\section{Notes}
\label{sec:notes}

\subsection{Type Checking}
\label{sec:type_checking}

Ideally combinators should fail as soon as someone attempts to combine
combinators that process data of different types (Unicode/text and
binary data, at the moment) or parse data of an inappropriate type for
a given combinator, rather than failing deep into parsing.
Unfortunately, two things make this difficult: mutually recursive
functions create cycles in the digraph through which types propagate,
and lazy evaluation of combinators makes it hard to analyze types at
class instantiation time.  A correct solution to this problem is
almost certainly going to involve using one of the existing
type-checking/type-inference algorithms, and it might make sense to
outsource it to something like mypy.  Arguably, it isn't in the spirit
of Python's dynamic type system since it involves a form of static
analysis, though the addition of gradual typing to 3.5 undermines this
argument.

\subsection{Yakker SPPF}
\label{sec:yakker_sppf}

The major differences between my current implementation and the Yakker
implementation are:

\begin{itemize}
\item The non-packed nodes in their tree are binarized.
\item Nodes embed both packing and the normal branching in the same
  data structure.  Each node contains a list of pairs of children.
\item Every node contains, in addition to its children, an arbitrary
  semantic value and a label.
\item Both have a weak hash table containing links to all the partial
  trees in the forest.  However, their implementation seems to use the
  hash table like a set, with a function that checks directly if a
  partial tree is in the table and returning it if so.
\end{itemize}

\subsection{Parse-forest grammars and adjacency-list digraphs}
\label{sec:parse-forest_grammars_adjacency-list_digraphs}

My original representation of trees/forests as digraphs using
adjacency lists containing node labels (rather than node references)
is very close (I haven't done the detailed analysis to determine how
close) to a parse-forest grammar.  I don't know if there are uses for
this interpretation, but it's something to keep in mind especially
when considering code generation and error handling.

\subsection{Arrow combinators}
\label{sec:arrow_combinators}

Hughes notes that any arrow that supports \emph{apply} is, in effect,
a monad.  He makes the further observation that \emph{apply} and the
choice operator can't be defined for the kind of parser that Swiestra
and Duponcheel discuss and states, ``Luckily, this does not matter: it
is rare that we \emph{want} to write a parser which decides on the
grammar to accept on the basis of previously parsed values.''  Of
course, since this is exactly what I'm trying to do, using arrows here
would be counterproductive.  What this means exactly for integrating
any of Swierstra and Duponcheel's ideas into my combinators I'm not
sure: is there some kind of model that can incorporate both the
necessary dependence on past parse results and the separation of
static and dynamic elements in arrows?  I don't know.

\subsection{Tunnel / Monadic Bind?}
\label{sec:tunnel_bind}

Is there a connection?  I actually think I can implement Tunnel with
Act/>>.

\printbibliography

\end{document}