kinesis-ner

This is an example of building a Named Entity Recognizer (NER) pipeline that:

fetches data from an Amazon Kinesis stream
processes text fields with Stanford CoreNLP and extracts entities
stores the results in a DynamoDB table
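
The three stages above can be pictured with a small in-memory sketch. It is not the project's code: the class and method names are hypothetical, a plain list stands in for the Kinesis stream, a map stands in for the DynamoDB table, and a toy capitalized-word heuristic stands in for Stanford CoreNLP so the example runs with the standard library alone.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the fetch -> extract -> store flow. None of these
// names come from kinesis-ner, and no AWS or CoreNLP API is called.
final class PipelineSketch {

    // One input item: an id plus the text to analyze.
    static final class NewsItem {
        final String uuid;
        final String text;

        NewsItem(String uuid, String text) {
            this.uuid = uuid;
            this.text = text;
        }
    }

    // Toy stand-in for the CoreNLP step: treats capitalized tokens as entities.
    // A real recognizer is far more accurate; this only fills the slot.
    static List<String> toyExtract(String text) {
        List<String> entities = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            String word = token.replaceAll("[^A-Za-z]", "");
            if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
                entities.add(word);
            }
        }
        return entities;
    }

    // Runs every fetched item through the extractor and stores the result
    // keyed by uuid. The returned map stands in for the DynamoDB table.
    static Map<String, List<String>> run(List<NewsItem> fetched,
                                         Function<String, List<String>> extractor) {
        Map<String, List<String>> table = new LinkedHashMap<>();
        for (NewsItem item : fetched) {
            table.put(item.uuid, extractor.apply(item.text));
        }
        return table;
    }

    public static void main(String[] args) {
        List<NewsItem> fetched = new ArrayList<>();
        fetched.add(new NewsItem("id-1", "Amazon opened an office in Seattle."));
        System.out.println(run(fetched, PipelineSketch::toyExtract));
    }
}
```

Passing the extractor as a function keeps the flow independent of the recognizer, so the toy heuristic could be swapped for a real NER call without touching the loop.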

Getting started:

To build, run:

mvn package

Maven will generate a zip file, kinesis-ner-{version}-SNAPSHOT-package.zip, in the target/ folder. The zip file contains all dependencies except the Stanford CoreNLP models jar, which is too large to bundle.

(Note: To include the CoreNLP models in the package, remove the provided scope from the CoreNLP models dependency in pom.xml before you run mvn package. You can then skip the model download step.)

Download the CoreNLP models jar.
Unzip the package produced by the build.
Make sure your machine has permission to create, read, and write Kinesis streams and DynamoDB tables.
Make sure all jars are on your classpath and run:

java -Xmx1536m -cp {class_path} com.chyikwei.app.KinesisNerApplication

(Note: the process will use about 1 GB of RAM.)

Put some data into the stream. The sample format is JSON with uuid, title, and text fields. Example:

{
  "uuid": "04947df8-0e9e-4471-a2f9-9af509fb5801",
  "title": "news title",
  "text": "news text"
}
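
As a rough illustration, a record in this shape could be assembled in Java with only the standard library. The NewsRecord class and its methods are hypothetical helpers, not part of kinesis-ner, and how the resulting string is put on the stream (for example with the AWS SDK or CLI) is outside this sketch.

```java
import java.util.UUID;

// Hypothetical helper (not part of kinesis-ner): builds one JSON record
// with the uuid, title, and text fields shown in the sample.
final class NewsRecord {

    // Escapes the characters that must be escaped inside a JSON string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the 4-digit hex form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Serializes the three fields as a single-line JSON object.
    static String toJson(String uuid, String title, String text) {
        return "{\"uuid\": \"" + escape(uuid) + "\", "
             + "\"title\": \"" + escape(title) + "\", "
             + "\"text\": \"" + escape(text) + "\"}";
    }

    public static void main(String[] args) {
        // A random UUID gives each record a unique id.
        String json = toJson(UUID.randomUUID().toString(), "news title", "news text");
        System.out.println(json);
    }
}
```

In a real producer a JSON library would usually replace the hand-written escaping; it is spelled out here only to keep the sketch dependency-free.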

Check the entities extracted by CoreNLP. They are stored in the DynamoDB table ddb-news-entities.
Clean up the AWS resources (Kinesis stream, DynamoDB tables) after testing. (The settings for stream and table names are here)
