This project collects a number of UIMA components to process dramatic texts. We follow the general design ideas implemented in dkpro.
The module de.unistuttgart.ims.drama.examples contains several classes with main methods that illustrate how to use the components.
Currently, the following components are provided:
UIMA annotation types for dramatic texts.
Components for processing dramatic texts. For specific portions of the texts (e.g., figure speech), we use standard dkpro components.
DramaSpeechSegmenter.getWrappedSegmenterDescription(Class<? extends AnalysisComponent> compClass) can be used to run a dkpro segmenter, which creates token and sentence annotations. Subsequent dkpro components rely only on these annotations and can be used directly, because the token and sentence annotations are projected into the drama text.
Stage directions are not analysed at the moment.
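The projection step mentioned above can be illustrated with a small, dependency-free sketch. It assumes that "projection" means shifting offsets computed on an extracted segment (one figure speech) by that segment's begin offset in the full drama text. The classes Span, SegmentProjection and ProjectionDemo are hypothetical, the whitespace tokenizer only stands in for a real dkpro segmenter, and none of this is the project's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** A character span [begin, end) over some text. */
record Span(int begin, int end) {}

/** Hypothetical helper; not part of the project's API. */
final class SegmentProjection {
    private SegmentProjection() {}

    /** Shifts spans computed on a segment into the offset space of the full text. */
    static List<Span> project(List<Span> segmentSpans, int segmentBegin) {
        List<Span> projected = new ArrayList<>();
        for (Span s : segmentSpans) {
            projected.add(new Span(s.begin() + segmentBegin, s.end() + segmentBegin));
        }
        return projected;
    }

    /** Naive whitespace tokenizer standing in for a dkpro segmenter. */
    static List<Span> tokenize(String text) {
        List<Span> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace, then consume one run of non-whitespace characters.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) tokens.add(new Span(start, i));
        }
        return tokens;
    }
}

class ProjectionDemo {
    public static void main(String[] args) {
        String drama = "FAUST. Habe nun, ach!";
        int speechBegin = 7; // assumed: the figure speech starts after "FAUST. "
        String speech = drama.substring(speechBegin);
        List<Span> tokens = SegmentProjection.project(SegmentProjection.tokenize(speech), speechBegin);
        // The projected offsets index directly into the full drama text.
        for (Span t : tokens) {
            System.out.println(drama.substring(t.begin(), t.end()));
        }
    }
}
```

Because every projected span refers to the full document, later components can treat the drama text as a single document without knowing how it was segmented.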
Examples of how to use the code
Code for extracting networks of figures in dramatic texts
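The README does not specify which kind of figure network is extracted. A common choice in drama analysis is a co-presence network, where two figures are linked if they both speak in the same scene and the edge weight counts such scenes. The following is a minimal sketch under that assumption. The input is a plain list of speaker names per scene rather than UIMA annotations, and CoPresenceNetwork and NetworkDemo are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative co-presence network builder; not the project's extraction code. */
final class CoPresenceNetwork {
    private CoPresenceNetwork() {}

    /** Returns edges keyed as "A -- B" (names in alphabetical order) with scene counts. */
    static Map<String, Integer> build(List<List<String>> speakersPerScene) {
        Map<String, Integer> edges = new TreeMap<>();
        for (List<String> scene : speakersPerScene) {
            // Deduplicate and sort so each unordered pair is counted once per scene.
            List<String> figures = List.copyOf(new TreeSet<>(scene));
            for (int i = 0; i < figures.size(); i++) {
                for (int j = i + 1; j < figures.size(); j++) {
                    String key = figures.get(i) + " -- " + figures.get(j);
                    edges.merge(key, 1, Integer::sum);
                }
            }
        }
        return edges;
    }
}

class NetworkDemo {
    public static void main(String[] args) {
        List<List<String>> scenes = List.of(
            List.of("Faust", "Mephistopheles", "Faust"),
            List.of("Faust", "Gretchen"),
            List.of("Faust", "Mephistopheles", "Gretchen"));
        CoPresenceNetwork.build(scenes).forEach((edge, w) -> System.out.println(edge + ": " + w));
    }
}
```

Other definitions (for example, linking figures only when one directly follows the other in the dialogue) would change only the pairing loop.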
Some generic classes and functions that support input and output
Reads dramatic texts from HTML files downloaded from gutenberg.spiegel.de. Currently, this component expects the files to be preprocessed; the preprocessing scripts can be found in src/main/perl.
Parses TextGrid TEI texts as well as possible. The reader will be extended whenever issues come up.
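To give an idea of what parsing TEI drama involves, here is a sketch using only the JDK's DOM parser. It assumes the TEI P5 drama elements sp (speech), speaker, l (verse line), p (prose) and stage (stage direction) in the TEI namespace; real TextGrid files vary more than this. TeiSpeeches and TeiDemo are hypothetical names, and unlike the project's reader this sketch returns plain Java objects rather than UIMA annotations.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical TEI speech extractor; not the project's reader. */
final class TeiSpeeches {
    static final String TEI_NS = "http://www.tei-c.org/ns/1.0";

    private TeiSpeeches() {}

    /** One figure speech: speaker label and spoken text. */
    record Speech(String speaker, String text) {}

    static List<Speech> read(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for the *NS lookups below
        Document doc = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<Speech> speeches = new ArrayList<>();
        NodeList sps = doc.getElementsByTagNameNS(TEI_NS, "sp");
        for (int i = 0; i < sps.getLength(); i++) {
            Element sp = (Element) sps.item(i);
            NodeList speakerNodes = sp.getElementsByTagNameNS(TEI_NS, "speaker");
            String speaker = speakerNodes.getLength() > 0
                ? speakerNodes.item(0).getTextContent().trim() : "";
            // Join direct <l> and <p> children; direct <stage> children are skipped.
            // Stage directions nested inside <l> or <p> are NOT removed here.
            StringBuilder text = new StringBuilder();
            NodeList children = sp.getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && TEI_NS.equals(child.getNamespaceURI())
                        && ("l".equals(child.getLocalName()) || "p".equals(child.getLocalName()))) {
                    if (text.length() > 0) text.append(' ');
                    text.append(child.getTextContent().trim());
                }
            }
            speeches.add(new Speech(speaker, text.toString()));
        }
        return speeches;
    }
}

class TeiDemo {
    public static void main(String[] args) throws Exception {
        String xml = "<TEI xmlns=\"http://www.tei-c.org/ns/1.0\"><text><body>"
            + "<sp><speaker>FAUST.</speaker><l>Habe nun, ach!</l><l>Philosophie,</l></sp>"
            + "</body></text></TEI>";
        for (TeiSpeeches.Speech s : TeiSpeeches.read(xml)) {
            System.out.println(s.speaker() + " " + s.text());
        }
    }
}
```

A production reader would also record character offsets for each speech, so that speech segments can be mapped back onto the document text.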
Utility functions