
Parsing

cteichmann edited this page Jul 10, 2015 · 6 revisions


There are three techniques for parsing a single observation.

GUI

Load an IRTG via the file dialog. Then use the "Parse" option under "Tools" in the grammar window. This brings up a window in which you can type the input that you want parsed, one input per algebra. If you do not want to specify the input for a given algebra, simply leave its field blank. How the different types of algebra values are written down is specified on the [codec page](Codec).

parseWindow.png

Once the input has been parsed, a [tree automaton window](TreeAutomatonWindow) opens, showing a tree automaton that contains all the parse trees.

Code

Assume that you have your inputs given as a list of strings and an IRTG called "irtg". First, convert each input string into the actual object that the underlying algebra can understand. For an input string "inputString", you can achieve this by calling:

#!java

Object actualInputObject = irtg.parseString(interpretationName, inputString);

"interpretationName" is the name of the interpretation whose algebra is used to read "inputString", according to the formats explained on the [codec page](Codec). The interpretation name must be known to the IRTG you are using. You can then put the objects into a map "representations" from each "interpretationName" to its "actualInputObject". Finally, you can parse this input as follows:

#!java

TreeAutomaton parseChart = irtg.parse(representations);
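The conversion step can be sketched as a small helper that builds the "representations" map from raw input strings. This is only an illustration under stated assumptions: `StringParser` is a hypothetical stand-in for the `parseString` call shown above (it is not a type from the library), the interpretation names "english" and "german" are made up, and skipping blank inputs mirrors the GUI behaviour of leaving a field empty rather than anything this page states about the code path.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class ParseInputSketch {

    /** Hypothetical stand-in for the irtg.parseString(...) call described above. */
    interface StringParser {
        Object parseString(String interpretationName, String inputString) throws Exception;
    }

    /**
     * Builds the map from interpretation names to algebra objects.
     * Blank or missing inputs are skipped (an assumption, see the lead-in).
     */
    static Map<String, Object> buildRepresentations(StringParser irtg,
                                                    Map<String, String> rawInputs) throws Exception {
        Map<String, Object> representations = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : rawInputs.entrySet()) {
            String inputString = entry.getValue();
            if (inputString == null || inputString.trim().isEmpty()) {
                continue; // no input given for this interpretation
            }
            String interpretationName = entry.getKey();
            representations.put(interpretationName,
                                irtg.parseString(interpretationName, inputString));
        }
        return representations;
    }

    public static void main(String[] args) throws Exception {
        // Stub that splits on whitespace; with a real IRTG you would pass
        // irtg::parseString here instead.
        StringParser stub = (name, input) -> Arrays.asList(input.trim().split("\\s+"));

        Map<String, String> rawInputs = new LinkedHashMap<>();
        rawInputs.put("english", "john watches the woman"); // hypothetical interpretation name
        rawInputs.put("german", "");                        // left blank, so it is skipped

        Map<String, Object> representations = buildRepresentations(stub, rawInputs);
        System.out.println(representations);
    }
}
```

With a real grammar, the resulting map would then be handed to `irtg.parse(representations)` as shown above.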

Shell

Bulk Parsing

Often you will want to parse a whole list of inputs. For this, there are options for parsing an entire collection of data at once.

GUI

Load an IRTG via the file dialog. Then choose the "Bulk Parse" option under "Tools". You will be asked to select a file that is written according to the corpus codec. Once you have chosen a corpus, you are asked to select a file in which the parsing results will be stored. When bulk parsing is finished, the target file contains a corpus in which a parse is associated with each corpus entry. This parse is the highest-weight parse, or one of them if several parses share the highest weight.
