A Taxonomy of Processors
Keith Alcock edited this page Apr 5, 2021 · 7 revisions
The hierarchy of Processor classes is described here, including classes from other related projects. The page is organized as a class hierarchy, but it also notes which processors contain other processors and forward some calls to them as delegates.
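The two relationships mentioned above, subclassing and delegation, can be illustrated with a small self-contained sketch. Every `Toy*` name and the `Doc` type below are hypothetical stand-ins invented for this example; they are not the real `org.clulab.processors` classes, and the real `annotate` does far more work.

```scala
// Toy model only: these types are hypothetical stand-ins, not the real
// org.clulab.processors classes.
final case class Doc(text: String, layers: Vector[String])

trait ToyProcessor {
  // A single entry point that runs the whole (toy) pipeline.
  def annotate(text: String): Doc
}

// Base implementation: records that tokens and dependencies were produced.
class ToyParser extends ToyProcessor {
  def annotate(text: String): Doc = Doc(text, Vector("tokens", "dependencies"))
}

// A separate processor with a capability the parser lacks.
class ToyRoleLabeler extends ToyProcessor {
  def annotate(text: String): Doc = Doc(text, Vector("tokens", "roles"))
  def labelRoles(doc: Doc): Doc = doc.copy(layers = doc.layers :+ "roles")
}

// Inheritance ("is a"): reuses ToyParser's pipeline through super.annotate.
// Delegation ("has a"): forwards the extra step to a contained ToyRoleLabeler.
class ToyParserWithRoles(delegate: ToyRoleLabeler = new ToyRoleLabeler)
    extends ToyParser {
  override def annotate(text: String): Doc =
    delegate.labelRoles(super.annotate(text))
}

object TaxonomySketch {
  def main(args: Array[String]): Unit = {
    val processor: ToyProcessor = new ToyParserWithRoles()
    // prints: tokens, dependencies, roles
    println(processor.annotate("Rain causes floods.").layers.mkString(", "))
  }
}
```

In this sketch the caller sees only `ToyProcessor`, so it does not need to know whether a given layer came from a superclass or from a delegate.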
- `Processor` (trait, object `org.clulab.processors`) - trait for all processor implementations. The key method here is `annotate`, which contains the entire annotation functionality of a given processor class.
  - `ShallowNLPProcessor` (class, object `org.clulab.processors.shallownlp`) - performs only shallow analysis, which includes tokenization, POS tagging, and NER. Note that this class uses our own tokenizer, and the POS tagger and NER from Stanford's CoreNLP.
    - `CoreNLPProcessor` (class, object `org.clulab.processors.corenlp`) - a wrapper for the entire Stanford CoreNLP pipeline, which contains their constituent parser and coreference resolution (on top of what `ShallowNLPProcessor` does). Use this class if you need the classic CoreNLP behavior. If you'd like to use their dependency parser, use `FastNLPProcessor` instead.
      - `BioNLPProcessor` (class `org.clulab.processors.bionlp`) - customizes `CoreNLPProcessor` for biomedical texts. This includes a new tokenizer that is better suited for biomedical texts, as well as a biomedical NER. This class resides in the `reach` project, not `processors`.
    - `FastNLPProcessor` (class, object `org.clulab.processors.fastnlp`) - almost the same as `CoreNLPProcessor`, but uses Stanford's dependency parser instead of their constituent parser. Because of this, the `annotate` method in this class tends to be faster than the one in `CoreNLPProcessor`. Use this class if you need dependency trees rather than constituent trees.
      - `FastNLPProcessorWithSemanticRoles` (class `org.clulab.processors.fastnlp`) - adds semantic roles from `CluProcessor` on top of all the functionality in `FastNLPProcessor`.
        - `EidosEnglishProcessor` (class `org.clulab.wm.eidos`) - adds the traits of `EidosProcessor` to the superclass, adapting it to work for the `eidos` project, where the class resides.
        - `EidosCluProcessor` (class `org.clulab.wm.eidos`) - also adds the traits of `EidosProcessor` to the superclass. That superclass is currently the same as the superclass of `EidosEnglishProcessor`, which makes this class redundant. However, as the name suggests, the superclass was previously `CluCoreProcessor`, and the class remains in case the superclass needs to be changed back without affecting `EidosEnglishProcessor`.
      - `FastBioNLPProcessor` (class `org.clulab.processors.bionlp`) - customizes `FastNLPProcessor` for biomedical texts. This includes a new tokenizer that is better suited for biomedical texts, as well as a biomedical NER. This class resides in the `reach` project, not `processors`.
  - `CluProcessor` (class, object `org.clulab.processors.clu`) - uses tools developed in-house, all released under the Apache license. This includes a tokenizer, POS tagger, NER, dependency parser, and semantic role labeling (SRL).
    - `CluCoreProcessor` (class, object `org.clulab.processors.clucore`) - adds Stanford's `NumericEntityRecognizer`, which recognizes numeric entities such as dates, times, and money, to the functionality of `CluProcessor`.
    - `SpanishCluProcessor` (class `org.clulab.processors.clu`) - `CluProcessor` for Spanish.
      - `EidosSpanishProcessor` (class `org.clulab.wm.eidos`) - adds the `EidosProcessor` trait to the superclass.
    - `PortugueseCluProcessor` (class `org.clulab.processors.clu`) - `CluProcessor` for Portuguese.
      - `EidosCluProcessor` (class `org.clulab.wm.eidos`) - adds the `EidosProcessor` trait to the superclass.
- `EidosProcessor` (trait `org.clulab.wm.eidoscommon`) - describes processors used in the `eidos` project. These are processors which include the traits `SentencesExtractor`, `LanguageSpecific`, `Tokenizing`, and `EidosTokenizing`. `SentencesExtractor` includes two minimal methods for extracting documents and sentences that can be called externally without the caller needing to know about the entire `EidosSystem` class. `LanguageSpecific` means that the processor applies to a particular language and has a tag set to match. `Tokenizing` means that it can supply access to its tokenizer, and `EidosTokenizing` means that it has an `EidosTokenizer`. This special tokenizer provides important paragraph-splitting functionality, among other things.
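
The way a combined trait such as `EidosProcessor` can be layered onto a language-specific superclass might look roughly like the sketch below. It is a simplified model: all `Toy*` names, their member names, and the splitting rules are assumptions made for illustration, not the actual `eidos` signatures or behavior.

```scala
// Toy model only: every Toy* name is a hypothetical stand-in, and the member
// names are assumptions, not the real eidos signatures.
class ToyTokenizer {
  def tokenize(sentence: String): List[String] =
    sentence.split("\\s+").filter(_.nonEmpty).toList
}

// Adds paragraph splitting on blank lines to the plain tokenizer.
class ToyEidosTokenizer extends ToyTokenizer {
  def paragraphs(text: String): List[String] =
    text.split("\\n\\s*\\n").map(_.trim).filter(_.nonEmpty).toList
}

trait ToySentencesExtractor {
  def extractSentences(text: String): List[String]
}

trait ToyLanguageSpecific {
  def language: String
  def tagSet: Set[String]
}

trait ToyTokenizing {
  def tokenizer: ToyTokenizer
}

trait ToyEidosTokenizing {
  def eidosTokenizer: ToyEidosTokenizer
}

// The combined trait: anything that mixes it in must supply all four roles.
trait ToyEidosProcessor extends ToySentencesExtractor
    with ToyLanguageSpecific with ToyTokenizing with ToyEidosTokenizing

// Stand-in for a language-specific superclass from the hierarchy.
class ToyBaseProcessor

class ToyEnglishProcessor extends ToyBaseProcessor with ToyEidosProcessor {
  val language: String = "english"
  val tagSet: Set[String] = Set("NN", "VB", "JJ")
  val eidosTokenizer: ToyEidosTokenizer = new ToyEidosTokenizer
  def tokenizer: ToyTokenizer = eidosTokenizer

  // Split each paragraph into sentences after ., !, or ?.
  def extractSentences(text: String): List[String] =
    eidosTokenizer.paragraphs(text).flatMap(_.split("(?<=[.!?])\\s+").toList)
}

object EidosTraitsSketch {
  def main(args: Array[String]): Unit = {
    // Callers depend only on the narrow trait, not on the whole system.
    val extractor: ToySentencesExtractor = new ToyEnglishProcessor
    extractor.extractSentences("Rain fell. Rivers rose.\n\nCrops failed.")
      .foreach(println)
  }
}
```

Typing the caller's reference as the narrow extractor trait mirrors the point made above about calling sentence extraction without knowing the rest of the system.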