Python-based implementation of named entity recognition using spaCy.

mpss2019fn1/spacy_ner

Spacy NER

This script converts a text of plain words (separated by whitespace) into a text of phrases. A phrase may be an ordinary word, as in the input, or, more importantly, a named entity recognized by spaCy. A named entity may span several plain words, which are then merged into a single word using underscores.

Example: New York City -> New_York_City
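The merging step can be sketched as follows. This is a minimal illustration and not the repository's actual code: the function names `merge_entities` and `merge_with_spacy` are hypothetical, and the sketch assumes that entities are given as token index ranges with an exclusive end, which is how spaCy exposes them through `doc.ents`.

```python
from typing import Iterable, List, Tuple


def merge_entities(tokens: List[str], spans: Iterable[Tuple[int, int]]) -> str:
    """Join the tokens of each (start, end) span with underscores.

    `end` is exclusive. Tokens outside any span are kept unchanged.
    """
    span_end_by_start = {start: end for start, end in spans}
    phrases = []
    i = 0
    while i < len(tokens):
        end = span_end_by_start.get(i)
        if end is not None and end > i:
            # Multi-word entity: merge its tokens into one phrase.
            phrases.append("_".join(tokens[i:end]))
            i = end
        else:
            phrases.append(tokens[i])
            i += 1
    return " ".join(phrases)


def merge_with_spacy(text: str, model: str = "en_core_web_sm") -> str:
    """Hypothetical wrapper: let spaCy find the entities, then merge them."""
    import spacy  # imported lazily so merge_entities works without spaCy

    nlp = spacy.load(model)
    doc = nlp(text)
    tokens = [token.text for token in doc]
    spans = [(ent.start, ent.end) for ent in doc.ents]
    return merge_entities(tokens, spans)
```

For instance, `merge_entities(["I", "love", "New", "York", "City"], [(2, 5)])` yields `I love New_York_City`. Which spans spaCy actually reports depends on the model that is loaded.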

Requirements

  • Python 3
  • spaCy (pip3 install spacy)
  • spaCy English model (python3 -m spacy download en_core_web_sm)

Usage

The tool can be used in two different ways:

  1. Convert one big input file:
python3 spacy_ner.py file --input={PATH_TO_CLEANED_TEXT} --output={PATH_TO_OUTPUT=stdout}
  2. Convert multiple input files:
python3 spacy_ner.py dir --source={PATH_TO_SOURCE_DIR} --target={PATH_TO_TARGET_DIR}
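The two modes map naturally onto `argparse` subcommands. The sketch below mirrors the flags shown above but is an assumption about how such a command line could be parsed, not the script's actual implementation; `build_parser` is a hypothetical name, and treating a missing `--output` as "write to stdout" is inferred from the `=stdout` hint in the usage line.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a `file` mode and a `dir` mode (illustrative only)."""
    parser = argparse.ArgumentParser(prog="spacy_ner.py")
    modes = parser.add_subparsers(dest="mode", required=True)

    # Mode 1: convert one big input file.
    file_mode = modes.add_parser("file", help="convert one input file")
    file_mode.add_argument("--input", required=True, help="path to the cleaned text")
    file_mode.add_argument("--output", default=None, help="output path; stdout if omitted")

    # Mode 2: convert every file in a source directory.
    dir_mode = modes.add_parser("dir", help="convert multiple input files")
    dir_mode.add_argument("--source", required=True, help="directory with input files")
    dir_mode.add_argument("--target", required=True, help="directory for converted files")

    return parser
```

Calling `build_parser().parse_args(["file", "--input=corpus.txt"])` would give a namespace with `mode` set to `file`, `input` set to `corpus.txt`, and `output` left as `None`.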
