-
Notifications
You must be signed in to change notification settings - Fork 2
Coarse to Fine Parsing
Coarse-to-Fine Parsing (CTF) is a pruning technique which can lead to significant increases in parsing speed. The core class for the implementation of CTF in Alto are de.up.ling.irtg.automata.coarse_to_fine.CoarseToFineParser and de.up.ling.irtg.automata.coarse_to_fine.FineToCoarseMapping. The former implements the the parsing algorithm, which parses an input with a small, "coarsified" grammar and then throws away all rules that are below a certain quality threshold and expands the remaining rules to their finer versions. This can be repeated for an arbitrary number of steps - see the paper linked above for more details. The FineToCoarseMapping expresses how nonterminals in a grammar are mapped to increasingly more coarse versions.
A CoarseToFineParser instance always uses a fixed IRTG, FineToCoarseMapping, input interpretation and threshold. The IRTG is the grammar on which parsing is ultimately based, it also provides the homomorphisms and algebras used. The FineToCoarseMapping defines how fine grained nonterminals correspond to more coarse grained nonterminals. CTF parsing always uses an input from a single, fixed interpretation of the IRTG. Finally the threshold controls how aggressively pruning is done. The threshold is chosen from the range (0,1) and a threshold of 1 corresponds to pruning all rules and 0 corresponds to pruning no rules. A rule is pruned and not replaced by finer versions if (its outside weight x it's inside weight x the rule weight) is less than or equal to (threshold x the weight of all trees) on a given CTF level.
The simplest way to use a CoarseToFineParser is to call it's parse method on a string. The CoarseToFineParser will try to parse the string into an object for the input algebra and then apply CTF parsing. The returned result will be a tree automaton representing the pruned, final parse chart. It is possible to parse objects directly using the parseObject and parseObjectSF methods. The former uses condensed intersection algorithms while the latter uses the sibling finder technique. condensed intersection is more suitable for grammars in which many rules have the same, simple homomorphic image, such as WSJ grammars. The sibling finder will be useful when homomorphic images are more complex.
A FineToCoarseMapping specifies how fine nonterminals/states are resolved to more coarse ones. It must specify the number of levels (including the finest level, expressed by the original grammar) and for each fine grained nonterminal a more coarse grained version. The latter is specified by an operation which accepts a nonterminal and maps it to another nonterminal - if a nonterminal is not to be coarsened, then it can be mapped to itself.
There is a convenience method in CoarseToFineParser called makeCoarseToFineParser, which will construct a full CoarseToFineParser using a string representation of a coarse to fine mapping. This representation takes the form:
__(
coarse_nonterminal1 (
fine_nonterminal1_1 (
even_finer_nonterminal1_1_1(
.....
)
even_finer_nonterminal1_1_2(
.....
)
)
fine_nonterminal1_2 (
....
)
coarse_nonterminal2 (
...
)
)
Here "__" is a dummy symbol which is ignored and only serves to ensure that the mapping can be expressed as a tree with the finest nonterminals at the leaves. Nonterminals that do not occur in the tree are mapped to themselves at all levels.