Integration of the Geneea API with Keboola Connection.
This is a Docker container for running general-purpose NLP analysis jobs in Keboola Connection (KBC). Automatically built Docker images are available in the Docker Hub Registry.
The supported NLP analysis types are: sentiment, entities, tags and relations.
To build this container manually, run:
git clone https://github.com/Geneea/keboola-nlp-analysis.git
cd keboola-nlp-analysis
sudo docker build --no-cache -t geneea/keboola-nlp-analysis .
This container can be run from the Registry using:
sudo docker run \
--volume=/home/ec2-user/data:/data \
--rm \
geneea/keboola-nlp-analysis:latest
Note: the host path in --volume (/home/ec2-user/data) needs to be adjusted to your own data directory; it is mounted as /data inside the container.
Sample configuration, mapped to /data/config.json:
{
"storage": {
"input": {
"tables": [
{
"destination": "source.csv"
}
]
}
},
"parameters": {
"user_key": "<ENTER API KEY HERE>",
"columns": {
"id": ["date", "subject"],
"title": ["subject"],
"text": ["body_1", "body_2"]
},
"analysis_types": ["sentiment", "entities", "tags", "relations"],
"language": "cs",
"domain": "news",
"correction": "basic",
"diacritization": "auto",
"use_beta": false
}
}
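Before running a job, it can help to check the configuration against the input table. The following Python sketch is a hypothetical pre-flight helper (check_config is not part of this component): it only verifies that the requested analysis_types are among the four supported types listed above and that every column named under parameters.columns exists in the input table header. It does not validate language, domain, correction, diacritization, use_beta or the API key, whose accepted values are not documented here.

```python
from typing import Dict, List

# Analysis types listed as supported in this README.
SUPPORTED_ANALYSIS_TYPES = {"sentiment", "entities", "tags", "relations"}


def check_config(config: Dict, input_header: List[str]) -> List[str]:
    """Return human-readable problems found in a parsed config.json dict.

    Hypothetical helper: it checks only the analysis types and the column
    mappings (id/title/text) against the input table header.
    """
    problems: List[str] = []
    params = config.get("parameters", {})

    # Every requested analysis type must be one of the documented ones.
    for analysis in params.get("analysis_types", []):
        if analysis not in SUPPORTED_ANALYSIS_TYPES:
            problems.append(f"unsupported analysis type: {analysis}")

    # Every column referenced in parameters.columns must exist in the input CSV.
    known = set(input_header)
    for role, names in params.get("columns", {}).items():
        for name in names:
            if name not in known:
                problems.append(f"column '{name}' ({role}) not found in input table")
    return problems


if __name__ == "__main__":
    # The "columns" and "analysis_types" parts of the sample configuration above.
    sample = {
        "parameters": {
            "columns": {
                "id": ["date", "subject"],
                "title": ["subject"],
                "text": ["body_1", "body_2"],
            },
            "analysis_types": ["sentiment", "entities", "tags", "relations"],
        }
    }
    # Hypothetical input header that lacks the body_2 column.
    print(check_config(sample, ["date", "subject", "body_1"]))
    # prints: ["column 'body_2' (text) not found in input table"]
```

In practice the dict would come from json.load on /data/config.json and the header from the first row of the input CSV.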
The results of the NLP analysis are written into four tables.
- analysis-result-documents.csv with document-level results in the following columns:
  - all id columns from the input table (used as primary keys)
  - language: detected language of the document, as an ISO 639-1 language code
  - sentimentValue: detected sentiment of the document, from the interval [-1.0; 1.0]
  - sentimentPolarity: detected sentiment of the document (-1, 0 or 1)
  - sentimentLabel: sentiment of the document as a label (negative, neutral or positive)
  - sentimentDetailedLabel: sentiment of the document as a detailed label
  - usedChars: the number of characters used by this document
- analysis-result-sentences.csv with sentence-level results in the following columns:
  - all id columns from the input table (used as primary keys)
  - index: zero-based index of the sentence in the document (primary key)
  - segment: text segment where the sentence is located
  - text: the sentence text
  - sentimentValue: detected sentiment of the sentence, from the interval [-1.0; 1.0]
  - sentimentPolarity: detected sentiment of the sentence (-1, 0 or 1)
  - sentimentLabel: sentiment of the sentence as a label (negative, neutral or positive)
  - sentimentDetailedLabel: sentiment of the sentence as a detailed label

  There are multiple rows per document. All id columns plus the index column are part of the primary key.
- analysis-result-entities.csv with entity-level results in the following columns:
  - all id columns from the input table (used as primary keys)
  - type: type of the found entity, e.g. person, organization or tag (primary key)
  - text: disambiguated and standardized form of the entity, e.g. John Smith, Keboola, safe carseat (primary key)
  - score: relevance score of the entity, e.g. 0.8
  - entityUid: unique ID of the entity, may be empty
  - sentimentValue: detected sentiment of the entity, from the interval [-1.0; 1.0]
  - sentimentPolarity: detected sentiment of the entity (-1, 0 or 1)
  - sentimentLabel: sentiment of the entity as a label (negative, neutral or positive)
  - sentimentDetailedLabel: sentiment of the entity as a detailed label

  There are multiple rows per document. All id columns plus the type and text columns are part of the primary key. Note that the table also contains topic tags, marked as tag in the type column.
- analysis-result-relations.csv with relation-level results in the following columns:
  - all id columns from the input table (used as primary keys)
  - type: type of the found relation, VERB or ATTR (primary key)
  - name: textual name of the relation, e.g. buy or smart (primary key)
  - negated: negation flag of the relation, true or false (primary key)
  - subject: possible subject of the relation (primary key)
  - object: possible object of the relation (primary key)
  - subjectType: type of the relation's subject
  - objectType: type of the relation's object
  - subjectUid: unique ID of the relation's subject
  - objectUid: unique ID of the relation's object
  - sentimentValue: detected sentiment of the relation, from the interval [-1.0; 1.0]
  - sentimentPolarity: detected sentiment of the relation (-1, 0 or 1)
  - sentimentLabel: sentiment of the relation as a label (negative, neutral or positive)
  - sentimentDetailedLabel: sentiment of the relation as a detailed label

  There are multiple rows per document. All id columns plus the type, name, negated, subject and object columns are part of the primary key.
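The four result tables are plain CSV files that all carry the id columns of the input table, so they can be combined per document. The following Python sketch uses only the standard library csv module; the helper names (read_table, sentences_by_document, split_tags, affirmative_triples) are hypothetical. It assumes the CSV headers are exactly the column names listed above, that index, usedChars and score hold numbers, and that negated is written as the text true or false in some letter case. id_columns should be the same list as parameters.columns.id in the configuration.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple, Union

Row = Dict[str, str]
DocKey = Tuple[str, ...]


def read_table(source: Union[str, Iterable[str]]) -> List[Row]:
    """Read a result table from a file path or any iterable of CSV lines."""
    if isinstance(source, str):
        with open(source, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    return list(csv.DictReader(source))


def doc_key(row: Row, id_columns: List[str]) -> DocKey:
    """Composite document key built from the input table's id columns."""
    return tuple(row[c] for c in id_columns)


def total_used_chars(document_rows: List[Row]) -> int:
    """Sum usedChars over the rows of analysis-result-documents.csv."""
    return sum(int(r["usedChars"]) for r in document_rows)


def sentences_by_document(sentence_rows: List[Row], id_columns: List[str]) -> Dict[DocKey, List[Row]]:
    """Group sentence rows per document, ordered by their zero-based index."""
    grouped: Dict[DocKey, List[Row]] = defaultdict(list)
    for r in sentence_rows:
        grouped[doc_key(r, id_columns)].append(r)
    for rows in grouped.values():
        rows.sort(key=lambda row: int(row["index"]))
    return dict(grouped)


def split_tags(entity_rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate topic tags (type == "tag") from other entities.

    Tags are returned sorted by descending relevance score.
    """
    tags = [r for r in entity_rows if r["type"] == "tag"]
    others = [r for r in entity_rows if r["type"] != "tag"]
    # Assumption: an empty score is treated as 0.0.
    tags.sort(key=lambda row: float(row["score"] or 0.0), reverse=True)
    return tags, others


def affirmative_triples(relation_rows: List[Row]) -> List[Tuple[str, str, str]]:
    """(subject, name, object) for every relation whose negated flag is not true."""
    # Assumption: negated is serialized as true/false; compared case-insensitively.
    return [
        (r["subject"], r["name"], r["object"])
        for r in relation_rows
        if r["negated"].strip().lower() != "true"
    ]
```

For example, total_used_chars(read_table("analysis-result-documents.csv")) gives the number of characters used by all documents of a job. Because csv.DictReader keys each row by header name, a header containing the same name twice keeps only the last value; this would matter if an id column shared its name with a result column, e.g. an id column called subject next to the subject column of the relations table.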