
ctakes dependency parser

Sean Finan edited this page Sep 21, 2024 · 4 revisions

Dependency parsers provide syntactic information about sentences. Unlike deep (constituency) parsers, they do not explicitly identify phrases (e.g., NP or VP). Instead, they identify the dependencies between words. For example, "hormone replacement therapy" would have the phrase structure:
(NP (NML (NN hormone) (NN replacement)) (NN therapy))
but its dependency structure shows that "hormone" depends on "replacement", which in turn depends on "therapy". Each row below gives the word's ID, the word, its lemma, its part-of-speech tag, the ID of the word it depends on (its head), and the dependency label.
23 hormone      hormone      NN 24 NMOD
24 replacement  replacement  NN 25 NMOD
25 therapy      therapy      NN 22 PMOD
Dependency parses can also be labeled. In the example above, the last column states that "hormone" is in a noun-modifier (NMOD) relationship with its head, "replacement".
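The following is a minimal sketch that reads the example rows and lists each word's head. It assumes whitespace-separated fields in this order: ID, word, lemma, part-of-speech tag, head ID, label. `Row` and `parse_rows` are hypothetical names and are not part of cTAKES. Token 22 lies outside the three-word fragment, so the sketch prints its ID instead of a word.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One token of a dependency parse (fields as in the example rows)."""
    id: int
    form: str
    lemma: str
    pos: str
    head: int
    label: str


def parse_rows(text: str) -> dict[int, Row]:
    """Parse whitespace-separated rows into a map from token ID to Row."""
    rows = {}
    for line in text.strip().splitlines():
        id_, form, lemma, pos, head, label = line.split()
        rows[int(id_)] = Row(int(id_), form, lemma, pos, int(head), label)
    return rows


example = """
23 hormone hormone NN 24 NMOD
24 replacement replacement NN 25 NMOD
25 therapy therapy NN 22 PMOD
"""

rows = parse_rows(example)
for row in rows.values():
    # Heads outside this fragment (token 22) are shown by ID only.
    head = rows[row.head].form if row.head in rows else f"token {row.head}"
    print(f"{row.form} --{row.label}--> {head}")
# Prints:
# hormone --NMOD--> replacement
# replacement --NMOD--> therapy
# therapy --PMOD--> token 22
```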
This project provides an Apache UIMA wrapper and some utilities for ClearParser, a transition-based dependency parser that its authors reported to achieve state-of-the-art accuracy and speed.

ClearParser is described in:
Jinho D. Choi and Nicolas Nicolov. "K-best, Locally Pruned, Transition-based Dependency Parsing Using Robust Risk Minimization." In Recent Advances in Natural Language Processing V, pp. 205-216. John Benjamins, Amsterdam & Philadelphia, 2009.

The semantic role labeler identifies the predicate-argument structure of a sentence (who did what to whom, when, and where).

Collection Readers
Annotation Engines
Output Writers
Utilities


Collection Readers

Dependency File Reader

Reads in dependency tree training/test data in a tab-delimited format.

Source class: DependencyFileCollectionReader
Source package: org.apache.ctakes.dependency.parser.cr
Parent class: org.apache.uima.collection.CollectionReader_ImplBase
Products: Base Token, Sentence, Dependency Node

No available configuration parameters.
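The following is a minimal sketch of reading tab-delimited dependency data. It assumes one token per line, tab-separated columns, and a blank line between sentences. This is a CoNLL-style convention, and this page does not spell out the exact layout that DependencyFileCollectionReader expects. The function `read_sentences` is hypothetical.

```python
import io
from typing import Iterator, TextIO


def read_sentences(stream: TextIO) -> Iterator[list[list[str]]]:
    """Yield sentences as lists of tab-split token rows.

    Assumes one token per line and a blank line between sentences
    (CoNLL-style; not confirmed for DependencyFileCollectionReader).
    """
    sentence: list[list[str]] = []
    for line in stream:
        line = line.rstrip("\n")
        if line.strip():
            sentence.append(line.split("\t"))
        elif sentence:
            yield sentence
            sentence = []
    if sentence:  # the last sentence may not be followed by a blank line
        yield sentence


data = "1\tHe\the\tPRP\t2\tSBJ\n2\tslept\tsleep\tVBD\t0\tROOT\n\n1\tStop\tstop\tVB\t0\tROOT\n"
for i, sent in enumerate(read_sentences(io.StringIO(data)), start=1):
    print(i, [row[1] for row in sent])
```

Each row keeps all of its columns as strings, so the same function works regardless of how many columns a particular file carries.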

XMI Reader (2)

Reads document texts and annotations from XMI files specified in a provided list.

Source class: XMIReader
Source package: org.apache.ctakes.dependency.parser.ae.util
Parent class: org.apache.uima.fit.component.JCasCollectionReader_ImplBase
Products: Document Id

Parameter | Description | Class | Required | Default
files | The XMI files to be loaded | List | Yes |

Annotation Engines

ClearNLP Semantic Role Labeler

Adds semantic role relations.

Source class: ClearNLPSemanticRoleLabelerAE
Source package: org.apache.ctakes.dependency.parser.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Sentence, Base Token, Dependency Node
Products: Semantic Relation

No available configuration parameters.
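The following sketch makes "who did what to whom, when" concrete by showing a predicate-argument structure for a single predicate. It assumes PropBank-style labels (ARG0, ARG1, ARG2, ARGM-TMP). The class `PredicateArguments` and its `describe` method are hypothetical illustrations, not the Semantic Relation type that this engine produces.

```python
from dataclasses import dataclass, field


@dataclass
class PredicateArguments:
    """A predicate and its labeled arguments (PropBank-style labels assumed)."""
    predicate: str
    arguments: dict[str, str] = field(default_factory=dict)

    def describe(self) -> str:
        # Loosely map a few labels onto the "who/what/to whom/when" questions.
        # ARG2's meaning varies by predicate; for "give" it is the recipient.
        questions = {"ARG0": "who", "ARG1": "what", "ARG2": "to whom", "ARGM-TMP": "when"}
        parts = [f"{questions.get(label, label)}={text}" for label, text in self.arguments.items()]
        return f"{self.predicate}: " + ", ".join(parts)


gave = PredicateArguments(
    predicate="gave",
    arguments={"ARG0": "The nurse", "ARG1": "aspirin", "ARG2": "the patient", "ARGM-TMP": "yesterday"},
)
print(gave.describe())
# gave: who=The nurse, what=aspirin, to whom=the patient, when=yesterday
```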

Thread safe ClearNLP Semantic Role Labeler

Adds semantic role relations.

Source class: ThreadSafeClearNlpSemRoleLabeler
Source package: org.apache.ctakes.dependency.parser.concurrent
Parent class: org.apache.ctakes.dependency.parser.ae.ClearNLPSemanticRoleLabelerAE
Dependencies: Sentence, Base Token, Dependency Node
Products: Semantic Relation

No available configuration parameters.


Output Writers

Dependency Node Writer

Writes information about Dependency Nodes to file.

Source class: DependencyNodeWriter
Source package: org.apache.ctakes.dependency.parser
Parent class: org.apache.uima.collection.CasConsumer_ImplBase
Dependencies: Sentence, Dependency Node

No available configuration parameters.
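The following is a minimal sketch of the kind of output such a writer could produce. It assumes one tab-separated line per node (ID, word, head ID, label) and a blank line between sentences. The `Node` class and the `write_nodes` function are hypothetical stand-ins, and this page does not document the actual columns that DependencyNodeWriter emits.

```python
import io
from dataclasses import dataclass
from typing import Iterable, TextIO


@dataclass
class Node:
    """Hypothetical stand-in for a cTAKES Dependency Node (not the real UIMA type)."""
    id: int
    form: str
    head: int  # 0 is used here for the sentence root
    label: str


def write_nodes(sentences: Iterable[list[Node]], out: TextIO) -> None:
    """Write one tab-separated line per node, with a blank line after each sentence."""
    for sentence in sentences:
        for node in sentence:
            out.write(f"{node.id}\t{node.form}\t{node.head}\t{node.label}\n")
        out.write("\n")


buf = io.StringIO()
write_nodes([[Node(1, "He", 2, "SBJ"), Node(2, "slept", 0, "ROOT")]], buf)
print(buf.getvalue(), end="")
```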


Utilities

ClearNLP Dependency Parser

Parses sentence structure and stores the result as Dependency Nodes.

Source class: ClearNLPDependencyParserAE
Source package: org.apache.ctakes.dependency.parser.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Sentence, Base Token
Products: Dependency Node

Parameter | Description | Class | Required | Default
LemmatizerDataFile | Data file required by the MorphEnAnalyzer. If not specified, the analysis engine uses a default model from the resources directory. | String | Yes | org/apache/ctakes/dependency/parser/models/lemmatizer/dictionary-1.3.1.jar
ParserModelFileName | File name of the dependency parser model, as required by the factory method provided by ClearNLPUtil. If not specified, the analysis engine uses a default model from the resources directory. | String | Yes | org/apache/ctakes/dependency/parser/models/dependency/mayo-en-dep-1.3.0.jar
UseLemmatizer | If true, use the default ClearNLP lemmatizer; otherwise, use lemmas from the BaseToken normalizedToken field. | boolean | Yes | true
MaxTokens | Maximum sentence length, in tokens, to parse. Longer sentences get a basic dependency structure in which every node's head is the sentence node. | int | No |
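The MaxTokens fallback can be sketched as follows. The function `heads_for_sentence` and the toy parser are hypothetical. The sketch assumes 1-based token IDs, uses 0 as the ID of the sentence (root) node, and treats `None` as "no limit".

```python
from typing import Callable, Optional, Sequence

ROOT = 0  # ID used for the sentence node in this sketch (an assumption)


def heads_for_sentence(
    tokens: Sequence[str],
    parse: Callable[[Sequence[str]], list[int]],
    max_tokens: Optional[int] = None,
) -> list[int]:
    """Return one head ID per token (token IDs are 1-based).

    Sentences longer than max_tokens are not parsed; instead every token
    is attached directly to the sentence node.
    """
    if max_tokens is not None and len(tokens) > max_tokens:
        return [ROOT] * len(tokens)
    return parse(tokens)


def toy_parse(tokens: Sequence[str]) -> list[int]:
    """Hypothetical stand-in for the real parser: chain each token to the next."""
    if not tokens:
        return []
    return [i + 2 for i in range(len(tokens) - 1)] + [ROOT]


words = ["hormone", "replacement", "therapy"]
print(heads_for_sentence(words, toy_parse, max_tokens=5))  # [2, 3, 0]
print(heads_for_sentence(words, toy_parse, max_tokens=2))  # [0, 0, 0]
```

Skipping the parser for overly long sentences bounds the cost per sentence while still giving every token a head, so downstream components that expect a Dependency Node for each token can still run.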

Thread safe ClearNLP Dependency Parser

Parses sentence structure and stores the result as Dependency Nodes.

Source class: ThreadSafeClearNlpDepParser
Source package: org.apache.ctakes.dependency.parser.concurrent
Parent class: org.apache.ctakes.dependency.parser.ae.ClearNLPDependencyParserAE
Dependencies: Sentence, Base Token
Products: Dependency Node

Parameter | Description | Class | Required | Default
LemmatizerDataFile | Data file required by the MorphEnAnalyzer. If not specified, the analysis engine uses a default model from the resources directory. | String | Yes | org/apache/ctakes/dependency/parser/models/lemmatizer/dictionary-1.3.1.jar
ParserModelFileName | File name of the dependency parser model, as required by the factory method provided by ClearNLPUtil. If not specified, the analysis engine uses a default model from the resources directory. | String | Yes | org/apache/ctakes/dependency/parser/models/dependency/mayo-en-dep-1.3.0.jar
UseLemmatizer | If true, use the default ClearNLP lemmatizer; otherwise, use lemmas from the BaseToken normalizedToken field. | boolean | Yes | true
MaxTokens | Maximum sentence length, in tokens, to parse. Longer sentences get a basic dependency structure in which every node's head is the sentence node. | int | No |