This repository has been archived by the owner on Feb 22, 2024. It is now read-only.

fusepoolP3/p3-dictionary-matcher-transformer

Dictionary Matcher Transformer

The Dictionary Matcher Transformer is an information extraction tool that extracts words and phrases from text based on SKOS taxonomies. It supports multiple languages and performs fast keyword matching even with very large taxonomies. The transformer is based on a modified version of the Aho-Corasick string matching algorithm.
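
This README does not describe how the transformer's modified Aho-Corasick variant differs from the original. As background only, the following Python sketch shows the unmodified textbook algorithm: the labels are inserted into a trie, failure links are computed breadth-first, and the text is then scanned once. The names `build_automaton` and `find_matches` are hypothetical and are not taken from the transformer's code.

```python
from collections import deque


def build_automaton(labels):
    """Build a textbook Aho-Corasick automaton (not the transformer's variant)."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # fail[state] -> state to fall back to on a mismatch
    output = [[]]  # output[state] -> labels that end at this state

    # Phase 1: insert every label into a trie.
    for label in labels:
        state = 0
        for ch in label:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(label)

    # Phase 2: breadth-first pass that computes the failure links.
    # Depth-1 states keep the root (0) as their failure state.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the labels that end at the fallback state.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_matches(text, automaton):
    """Return (begin, end, label) for every label occurrence; end is exclusive."""
    goto, fail, output = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for label in output[state]:
            matches.append((i + 1 - len(label), i + 1, label))
    return matches


if __name__ == "__main__":
    automaton = build_automaton(["Frauds and Swindlings", "Ethics"])
    sentence = ("Frauds and Swindlings cause significant concerns "
                "with regards to Ethics.")
    print(find_matches(sentence, automaton))
    # prints [(0, 21, 'Frauds and Swindlings'), (65, 71, 'Ethics')]
```

The sketch matches raw substrings only. It makes no attempt at stemming, case folding, or word boundaries, all of which a taxonomy matcher would likely need to handle.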

Try it out

To try it out, you can download the binaries of the latest release from the release page.

Then start the application with

  java -jar dictionary-matcher-transformer-v1.0.0-*-jar-with-dependencies.jar

Once it has started, use cURL to test it:

  curl -X POST -d "Frauds and Swindlings cause significant concerns with regards to Ethics." "http://localhost:8301/?taxonomy=http://data.nytimes.com/descriptors.rdf"

Compiling and Running

Compile the source code and run the application with Maven:

  mvn clean install exec:java

Or start the application from the binary:

  java -jar dictionary-matcher-transformer-v1.0.0-*-jar-with-dependencies.jar

Either command starts the Dictionary Matcher Transformer on port 8301 by default. The general form of the command line and the available options are:

  java -Xmx{size} -jar {jar-name} [options]

  -P|--port int     The port on which the proxy shall listen
  -C|--enableCors   Enable a liberal CORS policy
  -H|--help         Show help on command line arguments

Usage

The supported input and output formats of the transformer can be retrieved with the following GET request:

  curl -X GET "http://localhost:8301/"
  <http://localhost:8301/>
  <http://vocab.fusepool.info/transformer#supportedInputFormat>
          "text/turtle"^^<http://www.w3.org/2001/XMLSchema#string> , "text/plain"^^<http://www.w3.org/2001/XMLSchema#string> ;
  <http://vocab.fusepool.info/transformer#supportedOutputFormat>
          "text/turtle"^^<http://www.w3.org/2001/XMLSchema#string> .
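
A client may want to check this response before posting data. The sketch below pulls the format literals out of a response laid out like the one above, using plain string handling. It is not a Turtle parser and assumes exactly this layout; an RDF library would be the robust choice. The function name `supported_formats` is hypothetical.

```python
import re

# Namespace of the two predicates shown in the response above.
VOCAB = "http://vocab.fusepool.info/transformer#"


def supported_formats(turtle, direction):
    """Return the media types listed for direction 'Input' or 'Output'."""
    predicate = "<%ssupported%sFormat>" % (VOCAB, direction)
    start = turtle.find(predicate)
    if start < 0:
        return []
    rest = turtle[start + len(predicate):]
    # Stop at the next predicate from the same vocabulary, if there is one.
    end = rest.find("<" + VOCAB)
    if end >= 0:
        rest = rest[:end]
    # Every object is a typed literal of the form "value"^^<datatype>.
    return re.findall(r'"([^"]*)"\^\^', rest)
```

For the response shown above, `supported_formats(response, "Input")` yields `text/turtle` and `text/plain`, and `supported_formats(response, "Output")` yields `text/turtle`.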

The transformer expects the input data in the body of a POST request, and the URI of the taxonomy (plus any additional options) in the query string:

  curl -X POST --data-binary <data> "http://localhost:8301/?taxonomy=<taxonomy_URI>&stemming=<stemming_language>&casesensitive=true"

taxonomy - URI of the taxonomy (it must be a valid resource location)

stemming - optional - if present, enables stemming (supported languages: danish, dutch, english, finnish, french, german, hungarian, italian, norwegian, portuguese, romanian, russian, spanish, swedish, turkish)

casesensitive - optional - if present, enables case sensitivity
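
The same request can be assembled from a script. The following Python sketch builds the query string from the three parameters above and posts plain text to the transformer. The helper names `build_request_url` and `annotate` are hypothetical. Percent-encoding the taxonomy URI is an assumption: the cURL examples in this README pass it unencoded, and encoding mainly matters when the taxonomy URI itself contains characters such as `&` or `?`.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request_url(base_url, taxonomy, stemming=None, case_sensitive=False):
    """Build the transformer URL; optional parameters are omitted when unset."""
    params = {"taxonomy": taxonomy}
    if stemming is not None:
        params["stemming"] = stemming        # e.g. "english"
    if case_sensitive:
        params["casesensitive"] = "true"     # the presence of the key enables it
    return base_url + "?" + urlencode(params)


def annotate(text, url):
    """POST plain text to the transformer and return the Turtle response."""
    request = Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain", "Accept": "text/turtle"},
        method="POST",
    )
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a local instance running, `annotate("Ethics matter.", build_request_url("http://localhost:8301/", "http://data.nytimes.com/descriptors.rdf", stemming="english"))` would send the request. The `Content-Type` and `Accept` values are taken from the supported formats listed earlier.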

The following cURL command invokes a locally running instance of the Dictionary Matcher Transformer:

  curl -X POST --data-binary "Frauds and Swindlings cause significant concerns with regards to Ethics." "http://localhost:8301/?taxonomy=http://data.nytimes.com/descriptors.rdf&stemming=english"

  <http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897#annotation-body1>
        a       <http://vocab.fusepool.info/fam#LinkedEntity> ;
        <http://vocab.fusepool.info/fam#entity-label>
                "Frauds and Swindling"^^<http://www.w3.org/2001/XMLSchema#string> ;
        <http://vocab.fusepool.info/fam#entity-mention>
                "Frauds and Swindlings"^^<http://www.w3.org/2001/XMLSchema#string> ;
        <http://vocab.fusepool.info/fam#entity-reference>
                <http://data.nytimes.com/N38522309997148425060> ;
        <http://vocab.fusepool.info/fam#extracted-from>
                <http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897> ;
        <http://vocab.fusepool.info/fam#selector>
                <http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897#char=0,21> .
  
  <http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897#annotation1>
        a       <http://www.w3.org/ns/oa#Annotation> ;
        <http://www.w3.org/ns/oa#annotatedAt>
                "2014-10-30T14:35:26+0100"^^<http://www.w3.org/2001/XMLSchema#string> ;
        <http://www.w3.org/ns/oa#annotatedBy>
                <p3-dictionary-matcher-transformer> ;
        <http://www.w3.org/ns/oa#hasBody>
                <http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897#annotation-body1> ;
        <http://www.w3.org/ns/oa#hasTarget>
                <http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897#sp-resource1> .			  
  		
  <http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897#sp-resource1>
        a       <http://www.w3.org/ns/oa#SpecificResource> ;
        <http://www.w3.org/ns/oa#hasSelector>
                <http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897#char=0,21> ;
        <http://www.w3.org/ns/oa#hasSource>
                <http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897#annotation-body1> .
  		
  <http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897#char=0,21>
        a       <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#String> , <http://vocab.fusepool.info/fam#NifSelector> ;
        <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#beginIndex>
                "0"^^<http://www.w3.org/2001/XMLSchema#int> ;
        <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#endIndex>
                "21"^^<http://www.w3.org/2001/XMLSchema#int> .	
  	
  <http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897#annotation-body2>
        a       <http://vocab.fusepool.info/fam#LinkedEntity> ;
        <http://vocab.fusepool.info/fam#entity-label>
                "Ethics"^^<http://www.w3.org/2001/XMLSchema#string> ;
        <http://vocab.fusepool.info/fam#entity-mention>
                "Ethics"^^<http://www.w3.org/2001/XMLSchema#string> ;
        <http://vocab.fusepool.info/fam#entity-reference>
                <http://data.nytimes.com/48662871776634757120> ;
        <http://vocab.fusepool.info/fam#extracted-from>
                <http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897> ;
        <http://vocab.fusepool.info/fam#selector>
                <http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897#char=65,71> .
  			  
  <http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897#annotation2>
        a       <http://www.w3.org/ns/oa#Annotation> ;
        <http://www.w3.org/ns/oa#annotatedAt>
                "2014-10-30T14:35:26+0100"^^<http://www.w3.org/2001/XMLSchema#string> ;
        <http://www.w3.org/ns/oa#annotatedBy>
                <p3-dictionary-matcher-transformer> ;
        <http://www.w3.org/ns/oa#hasBody>
                <http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897#annotation-body2> ;
        <http://www.w3.org/ns/oa#hasTarget>
                <http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897#sp-resource2> .
  
  <http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897#sp-resource2>
        a       <http://www.w3.org/ns/oa#SpecificResource> ;
        <http://www.w3.org/ns/oa#hasSelector>
                <http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897#char=65,71> ;
        <http://www.w3.org/ns/oa#hasSource>
                <http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897#annotation-body2> .
  
  <http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897#char=65,71>
        a       <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#String> , <http://vocab.fusepool.info/fam#NifSelector> ;
        <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#beginIndex>
                "65"^^<http://www.w3.org/2001/XMLSchema#int> ;
        <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#endIndex>
                "71"^^<http://www.w3.org/2001/XMLSchema#int> .
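
Each selector URI in the output above ends in a `#char=begin,end` fragment, and the same numbers appear as the nif-core `beginIndex` and `endIndex` values. In this sample the offsets behave as zero-based character positions with an exclusive end, so the mention can be recovered by slicing the input text. The sketch below does that; the helper names `selector_span` and `mention_text` are hypothetical. Treating the offsets as Python string indices is an assumption that holds for this sample. It may not hold for text containing characters outside the Basic Multilingual Plane if the transformer counts offsets differently.

```python
import re

# Matches the "#char=begin,end" fragment at the end of a selector URI.
SELECTOR_RE = re.compile(r"#char=(\d+),(\d+)$")


def selector_span(selector_uri):
    """Return (begin, end) parsed from a selector URI such as '...#char=0,21'."""
    match = SELECTOR_RE.search(selector_uri)
    if match is None:
        raise ValueError("no #char=begin,end fragment in %r" % selector_uri)
    return int(match.group(1)), int(match.group(2))


def mention_text(text, selector_uri):
    """Slice the original input text with the offsets from the selector."""
    begin, end = selector_span(selector_uri)
    return text[begin:end]


if __name__ == "__main__":
    sentence = ("Frauds and Swindlings cause significant concerns "
                "with regards to Ethics.")
    base = "http://localhost:8301/bad1b7a2-431a-4861-acd5-f01515a6d897"
    print(mention_text(sentence, base + "#char=0,21"))   # prints Frauds and Swindlings
    print(mention_text(sentence, base + "#char=65,71"))  # prints Ethics
```

Both slices agree with the `entity-mention` literals in the sample output.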

References

This application implements the requirements in FP-39, FP-105 and FP-197.