This file documents the integration of the software component “SNP Extraction Tool for Human Variations” (SETH) into the OpenMinTeD platform.
The focus rests on software development and integration.
The following top-level milestones do not strictly depend on each other:
- UIMA XMI data format serialization for SETH output
- REST endpoint
- dockerize
- acquire general knowledge about UIMA XMI:
- Analysis Engines (AEs) produce Analysis Results (ARs): intro
- Annotators (e.g. SETH) produce Annotations
- an AR is represented as CAS (Common Analysis Structure): intro, references
- a CAS contains the analyzed document, a type system and annotations
- identify relevant UIMA XMI concepts/components e.g. CAS types:
- annotation (describes a region of a document) -> MutationMention
- (entity -> Mutation)
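The idea that an annotation describes a region of a document can be illustrated with a small plain-Java sketch. It deliberately does not use the UIMA API: the class `MutationMentionSketch`, its fields and the sample sentence are assumptions made for illustration only. In UIMA itself, the generated JCas class would extend the built-in annotation type, which already carries `begin` and `end` offsets.

```java
// Plain-Java illustration only; the real type would be a JCas class generated by JCasGen.
final class MutationMentionSketch {
    final int begin; // offset of the first covered character (inclusive)
    final int end;   // offset after the last covered character (exclusive)

    MutationMentionSketch(int begin, int end) {
        if (begin < 0 || end < begin) {
            throw new IllegalArgumentException("invalid region: " + begin + "-" + end);
        }
        this.begin = begin;
        this.end = end;
    }

    // Returns the part of the document text that this annotation covers.
    String coveredText(String documentText) {
        return documentText.substring(begin, end);
    }
}

final class AnnotationRegionDemo {
    public static void main(String[] args) {
        // Hypothetical input sentence; the mention is located by a simple string search.
        String documentText = "The variant p.Val600Glu was observed.";
        String mention = "p.Val600Glu";
        int begin = documentText.indexOf(mention);
        MutationMentionSketch m = new MutationMentionSketch(begin, begin + mention.length());
        System.out.println(m.begin + "-" + m.end + ": " + m.coveredText(documentText));
    }
}
```

The annotation stores only offsets; the covered text is always derived from the document held by the CAS, which is why the document and the annotations travel together.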
- implement relevant CAS types (MVP)
- use JCas: reference
- UIMA annotator tutorial
- version 3 user guide
- create description xml
- convert description xml to java class, use JCasGen:
- requires UIMA SDK installed
- execute:
PATH/TO/UIMA-SDK/bin/jcasgen.sh PATH/TO/INPUT_DESCRIPTION.xml PATH/TO/OUTPUT/DIRECTORY
- example:
/opt/apache-uima/bin/jcasgen.sh /home/arne/devel/Java/SETH/src/main/desc/SethTypeSystem.xml /home/arne/devel/Java/SETH/src/main/java
- NOTE: now handled by Maven (jcasgen-maven-plugin)
- create an Analysis Engine Descriptor file
- test
- using UIMA Document Analyzer
- see how to use UIMA shell scripts
- build SETH jar package with maven
- add jar (incl. dependencies) to UIMA classpath, e.g. with:
export UIMA_CLASSPATH="/home/arne/devel/Java/SETH/target/seth-1.3.1-Snapshot-jar-with-dependencies.jar"
- start UIMA analyzer:
PATH/TO/UIMA-SDK/bin/documentAnalyzer.sh
- move to full UIMA application
- think about logging
- think about multi threading (see UIMA Multi-threaded Applications)
- implement rest service (MVP)
- use spring-mvc: guide
- implement complete MutationAnnotation CAS type
- identify relevant features
- identify feature types
- define mappings to CAS primitive types and/or integrate required SETH types into SethTypeSystem.xml
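One way to approach the mapping step is a lookup from Java field types to CAS primitive range types. The sketch below is plain Java and only an illustration: the class name `CasTypeMappingSketch`, the enum `HypotheticalMutationType` and the choice to store enum values as strings are assumptions, not taken from SETH. The `uima.cas.*` names are the built-in CAS primitive types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CasTypeMappingSketch {

    // Hypothetical stand-in for an enum-valued field of a mutation mention.
    enum HypotheticalMutationType { SUBSTITUTION, DELETION }

    // Java field types and the built-in CAS primitive type each one maps to.
    private static final Map<Class<?>, String> PRIMITIVES = new LinkedHashMap<>();
    static {
        PRIMITIVES.put(String.class, "uima.cas.String");
        PRIMITIVES.put(boolean.class, "uima.cas.Boolean");
        PRIMITIVES.put(Boolean.class, "uima.cas.Boolean");
        PRIMITIVES.put(int.class, "uima.cas.Integer");
        PRIMITIVES.put(Integer.class, "uima.cas.Integer");
        PRIMITIVES.put(long.class, "uima.cas.Long");
        PRIMITIVES.put(Long.class, "uima.cas.Long");
        PRIMITIVES.put(float.class, "uima.cas.Float");
        PRIMITIVES.put(Float.class, "uima.cas.Float");
        PRIMITIVES.put(double.class, "uima.cas.Double");
        PRIMITIVES.put(Double.class, "uima.cas.Double");
    }

    // Returns the CAS range type name to use for a feature of the given Java type.
    static String casRangeType(Class<?> javaType) {
        String casType = PRIMITIVES.get(javaType);
        if (casType != null) {
            return casType;
        }
        if (javaType.isEnum()) {
            return "uima.cas.String"; // assumption: enum constants are stored by name
        }
        throw new IllegalArgumentException(
            "no CAS primitive mapping for " + javaType.getName());
    }

    public static void main(String[] args) {
        System.out.println(casRangeType(int.class));
        System.out.println(casRangeType(HypotheticalMutationType.class));
    }
}
```

Types that end in the exception branch are the candidates that would need their own type definition in SethTypeSystem.xml instead of a primitive mapping.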
- write unit test: produce UIMA JSON from input text (via Spring)
- create a release (maven how-to)
- push to github with github-release-plugin
- NOTE: a private key does not work! Credentials have to be stored in the Maven settings.xml (passing them as system parameters when calling the goal during deploy does not work!)
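- NOTE: to delete created (GitHub) tags, execute in the git root dir (a forced push of all tags alone does not remove a tag on the remote, so it is deleted there explicitly; the remote is assumed to be named origin):
git tag -d 0.0.1 && git push origin :refs/tags/0.0.1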
- push image to Docker Hub
- choose a "good" docker.image.prefix: it has to be the username for Docker Hub, currently it is "dfki"
- create docker account
- tag image as latest
- use maven plugin:
mvn dockerfile:push -Ddockerfile.username=... -Ddockerfile.password=...
- or use mvn release with maven settings.xml holding credentials (but -Ddockerfile.username=... -Ddockerfile.password=... is still possible)
- license compliance
- create list of included/used packages
- write how-to-integrate NER service (?)