ctakes core
Contains code and resources required by all or most other cTAKES modules.
Collection Readers
Annotation Engines
Output Writers
Utilities
Piper Files
Reads document texts from text files in a directory tree.
Source class: FileTreeReader
Source package: org.apache.ctakes.core.cr
Parent class: org.apache.ctakes.core.cr.AbstractFileTreeReader
Products: Document Id, Document Id Prefix
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
InputDirectory | Directory for all input files. | String | Yes | |
CRtoSpace | Change Windows-format CR + LF character sequences to LF + space. | boolean | No | |
Encoding | The character encoding used by the input files. | String | No | |
Extensions | The extensions of the files that the collection reader will read. | String[] | No | * |
KeepCR | Keep Windows-format carriage return characters at line endings. This only keeps existing characters; it does not add them. | boolean | No | |
PatientLevel | The level in the directory hierarchy at which patient identifiers exist. The default value is 1: directly under the root input directory. | int | No | |
StripQuotes | Replace document-enclosing quote characters with space characters. | boolean | No | |
WriteBanner | Write a large banner at each major step of the pipeline. | String | No | no |
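
The line-ending and quote parameters above can be illustrated with a small stand-alone sketch. It is plain Java with no cTAKES dependency; the class and method names are hypothetical, reading CRtoSpace as "CR + LF becomes LF + space" is an assumption taken from the parameter name, and the actual FileTreeReader logic may differ. The point it shows is that swapping CR + LF for LF + space keeps the text length (and so character offsets) unchanged, while dropping the CR shortens the text.

```java
final class LineEndingSketch {
    // Hypothetical helper: replace each CR+LF pair with LF+space, so length is preserved.
    static String crToSpace(String text) {
        return text.replace("\r\n", "\n ");
    }

    // Hypothetical helper: drop the CR from each CR+LF pair, so the text gets shorter.
    static String dropCr(String text) {
        return text.replace("\r\n", "\n");
    }

    // Hypothetical helper: if the whole document is wrapped in double quotes,
    // replace the two quote characters with spaces (length is preserved).
    static String stripQuotes(String text) {
        if (text.length() >= 2 && text.startsWith("\"") && text.endsWith("\"")) {
            return " " + text.substring(1, text.length() - 1) + " ";
        }
        return text;
    }

    public static void main(String[] args) {
        String note = "BP 120/80\r\nHR 72";
        System.out.println(crToSpace(note).length() == note.length()); // true
        System.out.println(dropCr(note).length() == note.length());    // false
    }
}
```

Preserving length matters when annotation offsets are later compared against the original file.
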
Reads document texts from text files in a directory, repeating for a number of iterations.
Source class: FilesInDirectoryCollectionCyclicalReads
Source package: org.apache.ctakes.core.cr
Parent class: org.apache.ctakes.core.cr.FilesInDirectoryCollectionReader
Products: Document Id
No available configuration parameters.
Reads document texts from text files in a directory.
Source class: FilesInDirectoryCollectionReader
Source package: org.apache.ctakes.core.cr
Parent class: org.apache.uima.collection.CollectionReader_ImplBase
Products: Document Id
No available configuration parameters.
Reads document texts from the fields of a database table.
Source class: JdbcNotesReader
Source package: org.apache.ctakes.core.cr.jdbc
Parent class: org.apache.uima.fit.component.JCasCollectionReader_ImplBase
Products: Document Id
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
DbDriver | JDBC driver ClassName. | String | Yes | |
DbPass | Password for database authentication. | String | Yes | |
DbUrl | JDBC URL that specifies database network location and name. | String | Yes | |
DbUser | Username for database authentication. | String | Yes | |
DocColumn | Name of column that contains the document text. | String | Yes | |
SqlStatement | SQL statement to retrieve the document. | String | Yes | |
BirthColumn | Name of column that contains the patient birth date. | String | No | |
DateColumn | Name of column that contains the document original date. | String | No | |
DbDecryptor | JDBC decryptor ClassName. | String | No | |
DeathColumn | Name of column that contains the patient death date. | String | No | |
DecryptPass | Password for text decryption. | String | No | |
EncounterIdColumn | Name of column that contains the encounter id. | String | No | |
FirstNameColumn | Name of column that contains the patient first name. | String | No | |
FirstSoundexColumn | Name of column that contains the patient first name soundex. | String | No | |
GenderColumn | Name of column that contains the patient gender. | String | No | |
IdColumns | Specifies column names that will be used to form a document ID. | String[] | No | |
IdDelimiter | Specifies delimiter used when document ID is built. | String | No | |
InstanceIdColumn | Name of column that contains the document instance id. | String | No | |
InstituteColumn | Name of column that contains the source institution. | String | No | |
KeepAlive | Flag that determines whether to keep JDBC connection open no matter what. | String | No | |
LastNameColumn | Name of column that contains the patient last name. | String | No | |
LastSoundexColumn | Name of column that contains the patient last name soundex. | String | No | |
MiddleNameColumn | Name of column that contains the patient middle name. | String | No | |
NoteSubtypeColumn | Name of column that contains the note subtype. | String | No | |
NoteTypeColumn | Name of column that contains the note type. | String | No | |
PatientColumn | Name of column that contains the patient identifier. | String | No | |
PatientIdColumn | Name of column that contains the patient id. | String | No | |
RevisionColumn | Name of column that contains the document revision number. | String | No | |
RevisionDateColumn | Name of column that contains the document revision date. | String | No | |
SpecialtyColumn | Name of column that contains the author specialty. | String | No | |
StandardColumn | Name of column that contains the document standard. | String | No |
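
IdColumns and IdDelimiter together describe how a document ID is assembled from row values. The following is a minimal sketch of that idea in plain Java, not the JdbcNotesReader implementation: the class name, the method name, and the column names `patient_id` and `note_id` are hypothetical, and the row is modelled as a simple map instead of a JDBC ResultSet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class DocumentIdSketch {
    // Joins the values of the chosen columns, in the order given, with the delimiter.
    static String buildDocumentId(Map<String, String> row, List<String> idColumns, String delimiter) {
        List<String> parts = new ArrayList<>();
        for (String column : idColumns) {
            parts.add(row.get(column));
        }
        return String.join(delimiter, parts);
    }

    public static void main(String[] args) {
        // Hypothetical row: column name -> value.
        Map<String, String> row = Map.of("patient_id", "P17", "note_id", "N3");
        System.out.println(buildDocumentId(row, List.of("patient_id", "note_id"), "_")); // P17_N3
    }
}
```
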
Reads document texts from database text fields.
Source class: JdbcCollectionReader
Source package: org.apache.ctakes.core.cr
Parent class: org.apache.uima.fit.component.JCasCollectionReader_ImplBase
Products: Document Id
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
DbConnResrcName | Name of external resource for database connection. | String | Yes | |
DocTextColName | Name of column from resultset that contains the document text. | String | Yes | |
SqlStatement | SQL statement to retrieve the document. | String | Yes | |
DocIdColNames | Specifies column names that will be used to form a document ID. | String[] | No | |
DocIdDelimiter | Specifies delimiter used when document ID is built. | String | No | |
ValueFileResrcName | Name of external resource for prepared statement value file. | String | No |
Reads document texts from a single text file, treating each line as a document.
Source class: LinesFromFileCollectionReader
Source package: org.apache.ctakes.core.cr
Parent class: org.apache.uima.collection.CollectionReader_ImplBase
Products: Document Id
No available configuration parameters.
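
As a rough illustration of "each line as a document", the sketch below splits file content on line breaks and returns one string per document. It is plain Java with hypothetical names; skipping empty lines is an assumption, and the real reader may treat them differently.

```java
import java.util.ArrayList;
import java.util.List;

final class LinesAsDocumentsSketch {
    // Splits on any line break (\R covers LF, CR and CR+LF) and keeps non-empty lines.
    static List<String> splitIntoDocuments(String fileContent) {
        List<String> documents = new ArrayList<>();
        for (String line : fileContent.split("\\R")) {
            if (!line.isEmpty()) {
                documents.add(line);
            }
        }
        return documents;
    }

    public static void main(String[] args) {
        String content = "Patient denies fever.\r\nNo cough today.\n";
        splitIntoDocuments(content).forEach(System.out::println);
    }
}
```
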
Reads document texts from Lucene text fields.
Source class: LuceneCollectionReader
Source package: org.apache.ctakes.core.cr
Parent class: org.apache.uima.fit.component.CasCollectionReader_ImplBase
Products: Document Id
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
IndexDirectory | Location of the Lucene index | String | Yes | |
FieldName | Field to look in for document text | String | No | |
MaxWords | Maximum number of words to process (approximate -- actually based on characters) | int | No |
Reads document texts from text files specified in a provided list.
Source class: TextReader
Source package: org.apache.ctakes.core.cr
Parent class: org.apache.uima.fit.component.JCasCollectionReader_ImplBase
Products: Document Id
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
files | The text files to be loaded | List | Yes |
Reads document texts and annotations from XMI files specified in a provided list.
Source class: XMIReader
Source package: org.apache.ctakes.core.cr
Parent class: org.apache.uima.fit.component.JCasCollectionReader_ImplBase
Products: Document Id
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
files | The XMI files to be loaded | List | Yes |
Reads document texts and annotations from XMI files in a directory tree.
Source class: XmiTreeReader
Source package: org.apache.ctakes.core.cr
Parent class: org.apache.ctakes.core.cr.AbstractFileTreeReader
Products: Document Id
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
InputDirectory | Directory for all input files. | String | Yes | |
CRtoSpace | Change Windows-format CR + LF character sequences to LF + space. | boolean | No | |
Encoding | The character encoding used by the input files. | String | No | |
Extensions | The extensions of the files that the collection reader will read. | String[] | No | * |
KeepCR | Keep Windows-format carriage return characters at line endings. This only keeps existing characters; it does not add them. | boolean | No | |
PatientLevel | The level in the directory hierarchy at which patient identifiers exist. The default value is 1: directly under the root input directory. | int | No | |
StripQuotes | Replace document-enclosing quote characters with space characters. | boolean | No | |
WriteBanner | Write a large banner at each major step of the pipeline. | String | No | no |
Reads document texts and annotations from XMI files in a directory.
Source class: XmiCollectionReaderCtakes
Source package: org.apache.ctakes.core.cr
Parent class: org.apache.uima.collection.CollectionReader_ImplBase
Products: Document Id
No available configuration parameters.
Annotates Document Sections by detecting Section Headers using Regular Expressions provided in a File.
Source class: CDASegmentAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Document Id
Products: Section
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
sections_file | Path to File that contains the section header mappings | String | No | src/user/resources/org/apache/ctakes/core/sections/ccda_sections.txt |
Re-annotates Sentences based upon short lines, preventing a Sentence from spanning over an intentional line break.
Source class: EolSentenceFixer
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Sentence
No available configuration parameters.
Associates Lab Mentions with values.
Source class: LabValueFinder
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Section, Base Token, Identified Annotation
Products: Generic Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
labTUIs | TUIs indicating lab measurements | String[] | Yes | |
allSections | Use all Annotatable sections. This ignores the value of sections | String | No | true |
excludeCUIs | CUIs not indicating specific lab measurements | String[] | No | |
maxLineCount | Maximum newlines between lab and value | int | No | |
sections | Annotatable sections | String[] | No | |
useDrugs | Use Medications in addition to Labs. | String | No | false |
valueWords | Words indicating values | String[] | No |
Annotates formatted List Sections by detecting them using Regular Expressions provided in an input File.
Source class: ListAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Section
Products: List
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
LIST_TYPES_PATH | path to a file containing a list of regular expressions and corresponding list types. | String | Yes | org/apache/ctakes/core/list/DefaultListRegex.bsv |
Checks List Entries for negation, which may be exhibited differently from unstructured negation.
Source class: ListEntryNegator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: List, Identified Annotation
No available configuration parameters.
Re-annotates Paragraphs based upon existing Lists, preventing a Paragraph from spanning more than one List.
Source class: ListParagraphFixer
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: List, Sentence
No available configuration parameters.
Re-annotates Sentences based upon existing List Entries, preventing a Sentence from spanning more than one List Entry.
Source class: ListSentenceFixer
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: List, Sentence
No available configuration parameters.
Annotates Document Penn TreeBank Tokens.
Source class: TokenizerAnnotatorPTB
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Section, Sentence
Products: Base Token
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
SegmentsToSkip | Set of segments that can be skipped | String[] | No |
Annotates Paragraphs by detecting them using Regular Expressions provided in an input File or by empty text lines.
Source class: ParagraphAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Section
Products: Paragraph
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
PARAGRAPH_TYPES_PATH | path to a file containing a list of regular expressions and corresponding paragraph types. | String | No |
Re-annotates Sentences based upon existing Paragraphs, preventing a Sentence from spanning more than one Paragraph.
Source class: ParagraphSentenceFixer
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Paragraph, Sentence
No available configuration parameters.
Sentence detector that uses BIO (Begin, Inside, Outside) tagging to determine sentence boundaries. Useful for documents in which newlines may not indicate sentence boundaries.
Source class: SentenceDetectorAnnotatorBIO
Source package: org.apache.ctakes.core.ae
Parent class: org.cleartk.ml.CleartkAnnotator
Dependencies: Section
Products: Sentence
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
FeatureConfiguration | | FEAT_CONFIG | No | |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
TokenFilename | | String | No | |
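
To make the Begin/Inside/Outside idea concrete, here is a minimal decoding sketch in plain Java. It assumes one label per character (`B` starts a sentence, `I` continues it, `O` is outside any sentence) and turns a label string into sentence offsets. The class name and the label encoding are assumptions for illustration; the classifier and features used by the actual annotator are not shown.

```java
import java.util.ArrayList;
import java.util.List;

final class BioDecodeSketch {
    // Returns {begin, end} offsets (end exclusive) for each sentence in the label string.
    static List<int[]> decode(String labels) {
        List<int[]> spans = new ArrayList<>();
        int begin = -1; // -1 means no sentence is currently open
        for (int i = 0; i < labels.length(); i++) {
            char label = labels.charAt(i);
            // A 'B' or an 'O' closes any sentence that is still open.
            if ((label == 'B' || label == 'O') && begin >= 0) {
                spans.add(new int[] {begin, i});
                begin = -1;
            }
            if (label == 'B') {
                begin = i;
            }
            // An 'I' with no open sentence is ignored in this sketch.
        }
        if (begin >= 0) {
            spans.add(new int[] {begin, labels.length()});
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Hi. Ok\nno.";
        String labels = "BIIOBIIIII"; // the newline is labelled 'I', so it does not end the sentence
        for (int[] span : decode(labels)) {
            System.out.println(span[0] + "-" + span[1] + ": " + text.substring(span[0], span[1]).replace("\n", "\\n"));
        }
    }
}
```

In the example the second sentence spans the newline, which is the situation the description above calls out.
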
Annotates Document Sections by detecting Section Headers using Regular Expressions provided in a Bar-Separated-Value (BSV) File.
Source class: BsvRegexSectionizer
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.ctakes.core.ae.RegexSectionizer
Products: Section
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
SectionsBsv | path to a BSV file containing a list of regular expressions and corresponding section types. | String | Yes | org/apache/ctakes/core/sections/DefaultSectionRegex.bsv |
TagDividers | True if lines of divider characters ____ , ---- , === should divide sections | boolean | No | true |
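
The general technique of driving section detection from a bar-separated file of regular expressions can be sketched as follows. This is plain Java and not the BsvRegexSectionizer code: the two-column layout `sectionType|headerRegex`, the `#` comment convention, and all names are assumptions, and the real DefaultSectionRegex.bsv may use a different column layout.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BsvSectionSketch {
    // Parses lines of the assumed form "sectionType|headerRegex".
    // Splitting with a limit of 2 lets the regex column itself contain '|'.
    static Map<String, Pattern> parseBsv(List<String> lines) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue; // skip blanks and (assumed) comment lines
            }
            String[] columns = line.split("\\|", 2);
            if (columns.length < 2) {
                continue; // skip malformed lines
            }
            patterns.put(columns[0].trim(), Pattern.compile(columns[1].trim(), Pattern.MULTILINE));
        }
        return patterns;
    }

    // Returns "sectionType@offset" for every header match, grouped by pattern order.
    static List<String> findHeaders(String text, Map<String, Pattern> patterns) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
            Matcher matcher = entry.getValue().matcher(text);
            while (matcher.find()) {
                found.add(entry.getKey() + "@" + matcher.start());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, Pattern> patterns = parseBsv(List.of(
                "# hypothetical section definitions",
                "History|^HISTORY OF PRESENT ILLNESS:",
                "Medications|^MEDICATIONS:"));
        String note = "HISTORY OF PRESENT ILLNESS:\nCough.\nMEDICATIONS:\nAspirin.";
        System.out.println(findHeaders(note, patterns));
    }
}
```
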
Annotates Document Sections by detecting Section Headers in template.
Source class: SectionSegmentAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Products: Section
No available configuration parameters.
Annotates Sentences based upon an OpenNLP model.
Source class: SentenceDetector
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Section
Products: Sentence
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
SentenceModelFile | Path to sentence detector model file | String | Yes | org/apache/ctakes/core/models/sentdetect/sd-med-model.zip |
SegmentsToSkip | Set of segments that can be skipped | String[] | No |
Annotates Document as a single Section.
Source class: SimpleSegmentAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Products: Section
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
SegmentID | Name to give to all segments | String | No | SIMPLE_SEGMENT |
Annotates Document Sections by detecting start and end Section Tags.
Source class: SimpleSegmentWithTagsAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Products: Section
No available configuration parameters.
Annotates Sentences based upon an OpenNLP model.
Source class: ThreadSafeSentenceDetector
Source package: org.apache.ctakes.core.concurrent
Parent class: org.apache.ctakes.core.ae.SentenceDetector
Dependencies: Section
Products: Sentence
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
SentenceModelFile | Path to sentence detector model file | String | Yes | org/apache/ctakes/core/models/sentdetect/sd-med-model.zip |
SegmentsToSkip | Set of segments that can be skipped | String[] | No |
Thread-safe sentence detector that uses BIO (Begin, Inside, Outside) tagging to determine sentence boundaries. Useful for documents in which newlines may not indicate sentence boundaries.
Source class: ThreadSafeSentenceDetectorBio
Source package: org.apache.ctakes.core.concurrent
Parent class: org.apache.ctakes.core.ae.SentenceDetectorAnnotatorBIO
Dependencies: Section
Products: Sentence
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
FeatureConfiguration | | FEAT_CONFIG | No | |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
TokenFilename | | String | No | |
Annotates Document Tokens.
Source class: TokenizerAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Dependencies: Section
Products: Base Token
No available configuration parameters.
Writes a two-column BSV file containing CUIs and their total counts in a document.
Source class: CuiCountFileWriter
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.uima.fit.component.CasConsumer_ImplBase
Dependencies: Document Id, Identified Annotation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | String | No |
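
A two-column bar-separated listing of CUIs and counts could be produced along these lines. This is a stand-alone sketch with hypothetical names and placeholder CUI strings; sorting the rows by CUI is an assumption, and the writer's real ordering and file naming are not covered here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class CuiCountSketch {
    // Counts each CUI and renders "CUI|count" lines, sorted by CUI.
    static String toBsv(List<String> cuis) {
        Map<String, Long> counts = cuis.stream()
                .collect(Collectors.groupingBy(cui -> cui, TreeMap::new, Collectors.counting()));
        return counts.entrySet().stream()
                .map(entry -> entry.getKey() + "|" + entry.getValue())
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Placeholder CUI strings standing in for the concepts found in one document.
        System.out.println(toBsv(List.of("C0000002", "C0000001", "C0000002")));
    }
}
```
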
Writes a list of CUIs, covered text and preferred text to files.
Source class: CuiListFileWriter
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.ctakes.core.cc.AbstractJCasFileWriter
Dependencies: Document Id, Sentence, Base Token
Usables: Document Id Prefix, Identified Annotation, Event, Timex, Temporal Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
SubDirectory | SubDirectory for files. | String | No |
Writes Text files with original text from the document.
Source class: FilesInDirectoryCasConsumer
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.uima.collection.CasConsumer_ImplBase
Dependencies: Document Id
No available configuration parameters.
Writes Text files with original text from the document in a specified directory.
Source class: NormalizedFilesInDirectoryCasConsumer
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.uima.collection.CasConsumer_ImplBase
Dependencies: Document Id, Base Token
No available configuration parameters.
Writes HTML files with a Table representation of extracted information.
Source class: HtmlTableCasConsumer
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.uima.collection.CasConsumer_ImplBase
Dependencies: Base Token
No available configuration parameters.
Writes html files with document text and simple markups (Semantic Group, CUI, Negation).
Source class: HtmlTextWriter
Source package: org.apache.ctakes.core.cc.html
Parent class: org.apache.ctakes.core.cc.AbstractJCasFileWriter
Dependencies: Document Id, Sentence, Base Token
Usables: Document Id Prefix, Identified Annotation, Event, Timex, Temporal Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
SubDirectory | SubDirectory for files. | String | No |
Writes html files with document text and simple markups (Semantic Group, CUI, Negation).
Source class: HtmlTextWriter
Source package: org.apache.ctakes.core.cc.pretty.html
Parent class: org.apache.ctakes.core.cc.AbstractJCasFileWriter
Dependencies: Document Id, Sentence, Base Token
Usables: Document Id Prefix, Identified Annotation, Event, Timex, Temporal Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
SubDirectory | SubDirectory for files. | String | No |
Writes UMLS Concepts to a standard I2B2 Observation_Fact table.
Source class: I2b2JdbcWriter
Source package: org.apache.ctakes.core.cc.jdbc.i2b2
Parent class: org.apache.ctakes.core.cc.jdbc.AbstractJCasJdbcWriter
Dependencies: Identified Annotation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
DbDriver | JDBC driver ClassName. | String | Yes | |
DbPass | Password for database authentication. | String | Yes | |
DbUrl | JDBC URL that specifies database network location and name. | String | Yes | |
DbUser | Username for database authentication. | String | Yes | |
FactOutputTable | Name of the Observation_Fact table for writing output. | String | Yes | |
BatchSize | Number of statements to use in a batch. 0 or 1 denotes that batches should not be used. | String | No | |
KeepAlive | Flag that determines whether to keep JDBC connection open no matter what. | String | No | |
RepeatCuis | Repeat Concepts with the same Cui but possibly different Semantic Type or Preferred Text. | boolean | No |
Stores extracted information and document metadata in a database.
Source class: JdbcWriterTemplate
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.ctakes.core.cc.AbstractJdbcWriter
Dependencies: Document Id, Identified Annotation
No available configuration parameters.
Writes a table of Medication information to file, sorted by character index.
Source class: MedicationTableFileWriter
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.ctakes.core.cc.AbstractTableFileWriter
Dependencies: Document Id, Identified Annotation
Usables: Document Id Prefix
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
SubDirectory | SubDirectory for files. | String | No | |
TableType | Type of Table to write to File. Possible values are: BSV, CSV, HTML, TAB | String | No |
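
The four TableType values differ mainly in how a row's cells are delimited. The sketch below shows one way such a switch might look; it is illustrative only, the class and method names are hypothetical, and it performs no escaping of delimiters or HTML characters inside cells, which a real writer would need to handle.

```java
import java.util.List;

final class TableRowSketch {
    enum TableType { BSV, CSV, HTML, TAB }

    // Renders one row of cells in the requested style (no escaping in this sketch).
    static String formatRow(List<String> cells, TableType type) {
        switch (type) {
            case BSV:
                return String.join("|", cells);
            case CSV:
                return String.join(",", cells);
            case TAB:
                return String.join("\t", cells);
            case HTML:
                return "<tr><td>" + String.join("</td><td>", cells) + "</td></tr>";
            default:
                throw new IllegalArgumentException("Unknown table type: " + type);
        }
    }

    public static void main(String[] args) {
        List<String> row = List.of("aspirin", "12", "19");
        for (TableType type : TableType.values()) {
            System.out.println(type + ": " + formatRow(row, type));
        }
    }
}
```
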
Writes text files with document text and simple markups (POS, Semantic Group, CUI, Negation).
Source class: PrettyTextWriterFit
Source package: org.apache.ctakes.core.cc.pretty.plaintext
Parent class: org.apache.ctakes.core.cc.AbstractJCasFileWriter
Dependencies: Document Id, Sentence, Base Token
Usables: Document Id Prefix, Identified Annotation, Event, Timex, Temporal Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
SubDirectory | SubDirectory for files. | String | No |
Writes text files with document text and simple markups (POS, Semantic Group, CUI, Negation).
Source class: PrettyTextWriterUima
Source package: org.apache.ctakes.core.cc.pretty.plaintext
Parent class: org.apache.uima.collection.CasConsumer_ImplBase
Dependencies: Document Id, Sentence, Base Token
Usables: Identified Annotation, Event, Timex, Temporal Relation
No available configuration parameters.
Writes text files with lists of annotations and properties (POS, Semantic Group, CUI, Negation).
Source class: PropertyTextWriterFit
Source package: org.apache.ctakes.core.cc.property.plaintext
Parent class: org.apache.uima.fit.component.CasConsumer_ImplBase
Dependencies: Document Id, Sentence, Identified Annotation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | String | No |
Writes text files with lists of annotations and properties (POS, Semantic Group, CUI, Negation).
Source class: PropertyTextWriterUima
Source package: org.apache.ctakes.core.cc.property.plaintext
Parent class: org.apache.uima.collection.CasConsumer_ImplBase
Dependencies: Document Id, Sentence, Identified Annotation
No available configuration parameters.
Writes a table of Annotation information to file, grouped by Semantic Type.
Source class: SemanticTableFileWriter
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.ctakes.core.cc.AbstractTableFileWriter
Dependencies: Document Id, Identified Annotation
Usables: Document Id Prefix
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
SubDirectory | SubDirectory for files. | String | No | |
TableType | Type of Table to write to File. Possible values are: BSV, CSV, HTML, TAB | String | No |
Writes Text files with original text from the document, sentence by sentence.
Source class: SentenceTokensPrinter
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.uima.collection.CasConsumer_ImplBase
Dependencies: Document Id, Sentence, Base Token
No available configuration parameters.
Writes BSV files with original text for extracted annotations and their span offsets.
Source class: TextSpanWriter
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.uima.fit.component.CasConsumer_ImplBase
Dependencies: Identified Annotation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | String | No |
Writes a two-column BSV file containing Begin and End offsets of tokens in a document.
Source class: TokenOffsetsCasConsumer
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.uima.collection.CasConsumer_ImplBase
Dependencies: Document Id, Base Token
No available configuration parameters.
Writes a table of base tokens and their spans in a directory tree.
Source class: TokenTableFileWriter
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.ctakes.core.cc.AbstractTableFileWriter
Usables: Document Id Prefix, Base Token
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
SubDirectory | SubDirectory for files. | String | No | |
TableType | Type of Table to write to File. Possible values are: BSV, CSV, HTML, TAB | String | No |
Writes a two-column BSV file containing Words and their total counts in a document.
Source class: TokenFreqCasConsumer
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.uima.collection.CasConsumer_ImplBase
Dependencies: Base Token
No available configuration parameters.
Writes XMI files with full representation of input text and all extracted information.
Source class: XmiWriterCasConsumerCtakes
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.uima.fit.component.CasConsumer_ImplBase
Dependencies: Document Id
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Output directory to write xmi files | File | Yes |
Writes XMI files with full representation of input text and all extracted information.
Source class: FileTreeXmiWriter
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.ctakes.core.cc.AbstractJCasFileWriter
Dependencies: Document Id
Usables: Document Id Prefix
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
SubDirectory | SubDirectory for files. | String | No |
Writes XMI files with full representation of input text and all extracted information.
Source class: CasConsumer
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.uima.collection.CasConsumer_ImplBase
Dependencies: Document Id
No available configuration parameters.
Removes annotations of a given type from the JCas.
Source class: FilterAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Dependencies: Base Token
No available configuration parameters.
Runs an external process.
Source class: CommandRunner
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.ctakes.core.ae.AbstractCommandRunner
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
Command | A full command line to be executed. Make sure to quote. | String | No | |
CommandDir | The Command Executable's directory. | String | No | |
Log | A name for the streaming logger. Default is the Command. | String | No | |
LogFile | File to which cTAKES output should be sent. | String | No | |
Pause | Pause for some seconds. Default is 0 | int | No | |
PerDoc | yes to run the command once per document. Default is no. | String | No | no |
SetJavaHome | Set JAVA_HOME to the Java running cTAKES. Default is yes. | String | No | yes |
Wait | Wait for the process to finish. Default is no. | String | No | no |
WorkingDir | The working directory. | String | No |
Starts a new instance of cTAKES with the given piper parameters.
Source class: CtakesRunner
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.ctakes.core.ae.PausableFileLoggerAE
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
Pipeline | Piper parameters. Make sure to quote. | String | Yes | |
LogFile | File to which cTAKES output should be sent. | String | No | |
Pause | Pause for some seconds. Default is 0 | int | No | |
Wait | Wait for the process to finish. Default is no. | String | No | no |
Use the FinishedLogger in the log subpackage (org.apache.ctakes.core.util.log) instead of this class.
Source class: FinishedLogger
Source package: org.apache.ctakes.core.util
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
No available configuration parameters.
Logs the Document ID to Log4j and Standard Output.
Source class: DocumentIdPrinterAnalysisEngine
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Document Id
No available configuration parameters.
Forcibly Exits cTAKES. Use only at the end of a pipeline.
Source class: ExitForcer
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.ctakes.core.ae.inert.PausableAE
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
ForceExit | Forcibly exits the system when the value is yes. Yes by default. | String | No | yes |
Pause | Pause for some seconds. Default is 0 | int | No | |
Wait | Wait for the process to finish. Default is no. | String | No | no |
Writes a banner message COMPLETE to the log when all processing is finished.
Source class: FinishedLogger
Source package: org.apache.ctakes.core.util.log
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
No available configuration parameters.
Copies document text and all annotations into a new JCas.
Source class: CopyAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
dataBindMap | Mapping between source methods and destination methods in a bar ("\|") separated format | String[] | Yes | |
destObjClass | Name of destination class | String | Yes | |
srcObjClass | Name of source class | String | Yes |
Reads annotations from SHARP schema Knowtator XML files in a directory.
Source class: SHARPKnowtatorXMLReader
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Products: Identified Annotation, Event, Timex, Location Relation, Degree Relation, Temporal Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
SetDefaults | Whether to set default attribute values if no annotation is present. | boolean | Yes | |
TextDirectory | Directory containing the text files (if DocumentIDs are just filenames). By default, DocumentIDs are assumed to be full file paths. | File | No | |
Joins sentences that SentenceDetectorBIO has split at person titles such as Mr., Mrs. and Dr.
Source class: MrsDrSentenceJoiner
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Sentence
No available configuration parameters.
Does absolutely nothing.
Source class: NullAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.analysis_component.JCasAnnotator_ImplBase
No available configuration parameters.
Removes or modifies annotations that overlap.
Source class: OverlapAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Dependencies: Base Token
No available configuration parameters.
Caches each Document JCas in a Patient JCas as a View.
Source class: PatientNoteCollector
Source package: org.apache.ctakes.core.patient
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
No available configuration parameters.
Analysis Engine that executes the PiperFileRunner. Kludge for desc files (CPE).
Source class: PiperFileRunEngine
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
PiperParams | Command line parameters normally used to run a piper file. | String | Yes | |
Uses pip to install a specified Python package.
Source class: PythonPipper
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.ctakes.core.ae.PythonRunner
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
PipPackage | Path of the python package to pip. | String | Yes | |
Command | A full command line to be executed. Make sure to quote. | String | No | |
CommandDir | The Command Executable's directory. | String | No | |
Log | A name for the streaming logger. Default is the Command. | String | No | |
LogFile | File to which cTAKES output should be sent. | String | No | |
Pause | Pause for some seconds. Default is 0 | int | No | |
PerDoc | yes to run the command once per document. Default is no. | String | No | no |
VirtualEnv | Path to Python virtual environment. | String | No | |
Wait | Wait for the process to finish. Default is no. | String | No | no |
WorkingDir | The working directory. | String | No | |
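A minimal sketch of how the two required parameters might be supplied in a piper file. Both paths are hypothetical placeholders, and `Wait=yes` is an assumption made so that later steps do not start before the install has finished.

```
// Hypothetical paths; replace with your own output directory and package location.
add PythonPipper OutputDirectory=output PipPackage=path/to/my_python_package Wait=yes
```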
Starts a Python process with the given parameters.
Source class: PythonRunner
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.ctakes.core.ae.AbstractCommandRunner
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
Command | A full command line to be executed. Make sure to quote. | String | No | |
CommandDir | The Command Executable's directory. | String | No | |
Log | A name for the streaming logger. Default is the Command. | String | No | |
LogFile | File to which cTAKES output should be sent. | String | No | |
Pause | Pause for some seconds. Default is 0 | int | No | |
PerDoc | yes to run the command once per document. Default is no. | String | No | no |
VirtualEnv | Path to Python virtual environment. | String | No | |
Wait | Wait for the process to finish. Default is no. | String | No | no |
WorkingDir | The working directory. | String | No | |
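The parameters above could be combined in a piper file roughly as follows. This is a sketch under assumptions: the module name `my_module`, its argument, and both paths are hypothetical, and this page does not state whether the `Command` value should include the interpreter name, so check the class documentation before relying on this form.

```
// Hypothetical command, paths and log name. The Command value is quoted because it contains spaces.
add PythonRunner OutputDirectory=output Command="-m my_module --input notes" VirtualEnv=path/to/venv Log=my_module PerDoc=no Wait=no
```

With `Wait=no` (the default) the pipeline is expected to continue without waiting for the Python process to finish. Set `Wait=yes` when later steps depend on its results.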
Simple annotator to place before and after other annotators that do not log their own start and finish.
Source class: StartFinishLogger
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
LOGGER_NAME | Provides the full name of the Annotator Engine for which start / end logging should be done. | String | Yes | StartEndProgressLogger |
IS_START | Indicates whether this should log a start. | Boolean | No | |
LOGGER_TASK | Provides the descriptive purpose of the Annotator Engine for which start / end logging should be done. | String | No | Processing ... |
Commands and parameters for a small tokenization pipeline.
$\textcolor{gray}{\textsf{// Commands and parameters for a small tokenization pipeline. }}$
$\textcolor{green}{\textbf{add}}$ SimpleSegmentAnnotator
$\textcolor{green}{\textbf{add}}$ SentenceDetector
$\textcolor{green}{\textbf{add}}$ TokenizerAnnotatorPTB
Commands and parameters for a small tokenization pipeline with sections, paragraphs and lists.
$\textcolor{gray}{\textsf{// Commands and parameters for a small tokenization pipeline with sections, paragraphs and lists. }}$
$\textcolor{gray}{\textsf{// Annotate sections by known regex }}$
$\textcolor{green}{\textbf{add}}$ BsvRegexSectionizer
$\textcolor{gray}{\textsf{// The sentence detector needs our custom model path, otherwise default values are used. }}$
$\textcolor{gray}{\textsf{//add SentenceDetectorAnnotatorBIO classifierJarPath=/org/apache/ctakes/core/models/sentdetect/model.jar }}$
$\textcolor{gray}{\textsf{// The SentenceDetectorAnnotatorBIO is a "lumper" that works well for notes in which end of line does not indicate a sentence. }}$
$\textcolor{gray}{\textsf{// If that is not your case, then you may get better results using the more standard SentenceDetector }}$
$\textcolor{green}{\textbf{add}}$ SentenceDetector
$\textcolor{gray}{\textsf{// By default, paragraphs are parsed using empty lines as separators and Part \#: }}$
$\textcolor{green}{\textbf{add}}$ ParagraphAnnotator
$\textcolor{gray}{\textsf{// Fix sentences so that no sentence spans across two or more paragraphs. }}$
$\textcolor{green}{\textbf{add}}$ ParagraphSentenceFixer
$\textcolor{gray}{\textsf{// Use regular expressions created for the Pitt notes to discover formatted lists and tables. }}$
$\textcolor{green}{\textbf{add}}$ ListAnnotator
$\textcolor{gray}{\textsf{// Fix sentences so that no sentence spans across two or more list entries. }}$
$\textcolor{green}{\textbf{add}}$ ListSentenceFixer
$\textcolor{gray}{\textsf{// Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences. }}$
$\textcolor{green}{\textbf{add}}$ TokenizerAnnotatorPTB
Commands and parameters for a small thread-safe tokenization pipeline.
$\textcolor{gray}{\textsf{// Commands and parameters for a small thread-safe tokenization pipeline. }}$
$\textcolor{green}{\textbf{add}}$ SimpleSegmentAnnotator
$\textcolor{green}{\textbf{add}}$ $\textcolor{blue}{\textsf{concurrent.ThreadSafeSentenceDetector}}$
$\textcolor{green}{\textbf{add}}$ TokenizerAnnotatorPTB
Commands and parameters for a small thread-safe tokenization pipeline with sections, paragraphs and lists.
$\textcolor{gray}{\textsf{// Commands and parameters for a small thread-safe tokenization pipeline with sections, paragraphs and lists. }}$
$\textcolor{gray}{\textsf{// Annotate sections by known regex }}$
$\textcolor{green}{\textbf{add}}$ BsvRegexSectionizer
$\textcolor{gray}{\textsf{// The sentence detector needs our custom model path, otherwise default values are used. }}$
$\textcolor{gray}{\textsf{//add concurrent.ThreadSafeSentenceDetectorBio classifierJarPath=/org/apache/ctakes/core/models/sentdetect/model.jar }}$
$\textcolor{gray}{\textsf{// The SentenceDetectorAnnotatorBIO is a "lumper" that works well for notes in which end of line does not indicate a sentence. }}$
$\textcolor{gray}{\textsf{// If that is not your case, then you may get better results using the more standard SentenceDetector }}$
$\textcolor{green}{\textbf{add}}$ $\textcolor{blue}{\textsf{concurrent.ThreadSafeSentenceDetector}}$
$\textcolor{gray}{\textsf{// By default, paragraphs are parsed using empty lines as separators and Part \#: }}$
$\textcolor{green}{\textbf{add}}$ ParagraphAnnotator
$\textcolor{gray}{\textsf{// Fix sentences so that no sentence spans across two or more paragraphs. }}$
$\textcolor{green}{\textbf{add}}$ ParagraphSentenceFixer
$\textcolor{gray}{\textsf{// Use regular expressions created for the Pitt notes to discover formatted lists and tables. }}$
$\textcolor{green}{\textbf{add}}$ ListAnnotator
$\textcolor{gray}{\textsf{// Fix sentences so that no sentence spans across two or more list entries. }}$
$\textcolor{green}{\textbf{add}}$ ListSentenceFixer
$\textcolor{gray}{\textsf{// Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences. }}$
$\textcolor{green}{\textbf{add}}$ TokenizerAnnotatorPTB