Codecs
You can use the classes of the package de.up.ling.irtg.codec to read and write various objects from and to a variety of file formats. An input codec will read a string representation of some object from a file (or some other input stream) and return the object, whereas an output codec will encode an object as a string representation and write it to a file (or some other output stream).
You can convert an entire corpus from one codec format to another using the CodecConverter script.
You can add your own input and output codecs to Alto by extending the abstract base classes InputCodec and OutputCodec, respectively, and putting your classes on the classpath.
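The general shape of such an extension can be sketched as follows. This is a minimal, self-contained illustration and not Alto's actual API: `SketchInputCodec` and `SketchOutputCodec` are hypothetical stand-ins for `InputCodec` and `OutputCodec`, and the method names `read`, `write`, and `asString` are assumptions made for the example. Check the Javadoc of `de.up.ling.irtg.codec` for the real signatures and for how Alto discovers codecs on the classpath.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CodecSketch {
    // Hypothetical stand-in for an input codec base class (NOT Alto's real InputCodec).
    static abstract class SketchInputCodec<E> {
        // Subclasses decode an object of type E from a stream.
        abstract E read(InputStream in) throws IOException;

        // Convenience method: decode from an in-memory string.
        E read(String s) throws IOException {
            return read(new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8)));
        }
    }

    // Hypothetical stand-in for an output codec base class (NOT Alto's real OutputCodec).
    static abstract class SketchOutputCodec<E> {
        // Subclasses encode an object of type E onto a stream.
        abstract void write(E object, OutputStream out) throws IOException;

        // Convenience method: encode into a string.
        String asString(E object) throws IOException {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            write(object, buffer);
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Example input codec: reads whitespace-separated tokens into a list of words.
    static class WordListInputCodec extends SketchInputCodec<List<String>> {
        @Override
        List<String> read(InputStream in) throws IOException {
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            List<String> words = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                for (String token : line.trim().split("\\s+")) {
                    if (!token.isEmpty()) {
                        words.add(token);
                    }
                }
            }
            return words;
        }
    }

    // Example output codec: writes the words back, separated by single spaces.
    static class WordListOutputCodec extends SketchOutputCodec<List<String>> {
        @Override
        void write(List<String> words, OutputStream out) throws IOException {
            out.write(String.join(" ", words).getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        // Round trip: decode a string into an object, then encode it again.
        List<String> words = new WordListInputCodec().read("john  watches\nthe woman");
        System.out.println(words);
        System.out.println(new WordListOutputCodec().asString(words));
    }
}
```

The point of the split is that each codec only has to implement one direction, stream to object or object to stream; the string-based convenience methods are layered on top of that.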
For reference, here are the input codecs that are currently defined in Alto; the output codecs are listed further below. Each row lists the class defining the codec, the kind of object that this codec will encode or decode, and a short description.
Codec class | Object type | Description |
---|---|---|
IrtgInputCodec | InterpretedTreeAutomaton | Standard input codec for IRTGs |
PcfgIrtgInputCodec | InterpretedTreeAutomaton | Reads a PCFG as an IRTG |
NltkPcfgInputCodec | InterpretedTreeAutomaton | Reads a PCFG in NLTK format as an IRTG |
BolinasHrgInputCodec | InterpretedTreeAutomaton | Reads HRG grammars for the Bolinas parser as IRTGs |
TemplateIrtgInputCodec | TemplateInterpretedTreeAutomaton | Template IRTG |
TreeAutomatonInputCodec | TreeAutomaton | Standard input codec for tree automata |
TiburonTreeAutomatonInputCodec | TreeAutomaton | Reads a tree automaton in Tiburon format |
BottomUpTreeAutomatonInputCodec | TreeAutomaton | Reads bottom-up tree automata (Hanneforth style) |
IsiAmrInputCodec | SGraph | Reads graphs in the ISI AMR-Bank format |
PtbTreeInputCodec | Tree | Reads trees in Penn Treebank format |
There are fewer output codecs than input codecs, because many classes (such as InterpretedTreeAutomaton and TreeAutomaton) simply have toString methods that Alto calls to generate string representations for such objects. Furthermore, most input codecs convert grammars of various formalisms into IRTGs, and this is much easier than converting IRTGs back into the other formalisms. The output codecs below are mainly useful when several string representations are available for the same class of objects.
Note that you can right-click on any value in the Alto GUI (i.e., the contents of the "value" panel in the derivation view) to display a context menu which lets you copy a string representation of that value to the clipboard. The context menu will contain all output codecs that are suitable for the type of that value.
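The idea of offering only the output codecs that fit a value's type can be illustrated with a small sketch. It is not how the Alto GUI is implemented: `NamedCodec`, `menuFor`, and the three example codecs are hypothetical names invented for this illustration, and plain Java types stand in for SGraph and Tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class CodecMenuSketch {
    // Hypothetical description of an output codec: a display name,
    // the object type it can encode, and the encoding function itself.
    static final class NamedCodec<E> {
        final String name;
        final Class<E> type;
        final Function<E, String> encoder;

        NamedCodec(String name, Class<E> type, Function<E, String> encoder) {
            this.name = name;
            this.type = type;
            this.encoder = encoder;
        }

        // A codec is suitable if the value is an instance of its object type.
        boolean accepts(Object value) {
            return type.isInstance(value);
        }

        // Encode a value that has already been checked with accepts().
        String encode(Object value) {
            return encoder.apply(type.cast(value));
        }
    }

    // Collect the names of all codecs that are suitable for the given value.
    static List<String> menuFor(Object value, List<NamedCodec<?>> codecs) {
        List<String> names = new ArrayList<>();
        for (NamedCodec<?> codec : codecs) {
            if (codec.accepts(value)) {
                names.add(codec.name);
            }
        }
        return names;
    }

    // A few illustrative codecs over plain Java types.
    static List<NamedCodec<?>> exampleCodecs() {
        List<NamedCodec<?>> codecs = new ArrayList<>();
        codecs.add(new NamedCodec<Integer>("decimal", Integer.class, i -> Integer.toString(i)));
        codecs.add(new NamedCodec<Integer>("hex", Integer.class, i -> Integer.toHexString(i)));
        codecs.add(new NamedCodec<String>("quoted", String.class, s -> "\"" + s + "\""));
        return codecs;
    }

    public static void main(String[] args) {
        List<NamedCodec<?>> codecs = exampleCodecs();
        // An Integer value is offered both integer codecs; a String only the string codec.
        System.out.println(menuFor(255, codecs));
        System.out.println(menuFor("hello", codecs));
    }
}
```

In this sketch, several codecs registered for the same type all appear in the menu, which mirrors the situation described above where one class of objects has more than one useful string representation.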
Codec class | Object type | Description |
---|---|---|
BolinasGraphOutputCodec | SGraph | Writes graphs in a format that Bolinas can read |
SgraphAmrOutputCodec | SGraph | Writes graphs in the ISI AMR-Bank format |
TikzSgraphOutputCodec | SGraph | Encodes graphs as LaTeX graph-drawing code |
TikzQtreeOutputCodec | Tree | Encodes trees as LaTeX tree-drawing code |