spacy-to-naf is a spaCy wrapper that converts text or NAF input to NAF.
The converter minimally extracts a tokenized text layer, and can additionally extract terms, deps, entities and chunks layers.
Install spaCy and spacy-to-naf:
pip install spacy
pip install spacy-to-naf
Download a spaCy model, e.g. 'en_core_web_sm':
python -m spacy download en_core_web_sm
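Before constructing the converter, it can be useful to confirm that the model package is importable. The following is a minimal sketch using only the standard library. The helper name model_is_installed is hypothetical and is not part of spaCy or spacy-to-naf; it assumes that a model downloaded as above is installed as a regular Python package under the same name.

```python
from importlib.util import find_spec


def model_is_installed(name: str) -> bool:
    """Return True if a top-level package called `name` can be found.

    Hypothetical helper: it only checks that the package is importable,
    not that spaCy can actually load it as a pipeline.
    """
    return find_spec(name) is not None


if __name__ == "__main__":
    # Prints True only when the model package is present in this environment.
    print(model_is_installed("en_core_web_sm"))
```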
Specify the spaCy model and the NAF layers to create (the text layer is always created).
from spacy_to_naf.converter import Converter
converter = Converter('en_core_web_sm', add_terms=True, add_deps=True, add_entities=True, add_chunks=True)
The input may be a directory of NAF files, a directory of text files, or a text string.
To convert text to a file 'example.naf' in the current directory:
text = "The cat sat on the mat. Felix was his name."
naf = converter.run(text, 'example.naf', '.')
The converter additionally returns a NafParser object for further processing.
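Since NAF is an XML format, the written file can also be inspected without the NafParser object. The sketch below uses the standard library's ElementTree to list the tokens in the text layer. It assumes the usual NAF layout, in which tokens are stored as wf elements without an XML namespace; the function name list_word_forms and the inline sample are illustrative only and are not actual spacy-to-naf output.

```python
import xml.etree.ElementTree as ET
from typing import List


def list_word_forms(naf_xml: str) -> List[str]:
    """Return the text of every <wf> element found in a NAF document string.

    Assumption: word forms are un-namespaced <wf> elements.
    """
    root = ET.fromstring(naf_xml)
    return [wf.text or "" for wf in root.iter("wf")]


# Hand-written sample for illustration, not real converter output.
SAMPLE_NAF = """<NAF>
  <text>
    <wf id="w1" sent="1">The</wf>
    <wf id="w2" sent="1">cat</wf>
    <wf id="w3" sent="1">sat</wf>
  </text>
</NAF>"""

if __name__ == "__main__":
    print(list_word_forms(SAMPLE_NAF))
```

To read a file such as 'example.naf' instead of a string, ET.parse('example.naf').getroot() can replace the ET.fromstring call.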
To process text files from a 'text_in' directory to a 'naf_out' directory:
converter.convert_text_files('text_in', 'naf_out')
Note that input text files are expected to end in '.txt'.
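To see in advance which files in a directory meet the '.txt' requirement, a small check can be run first. This is a sketch under assumptions: find_text_inputs is a hypothetical helper, not part of spacy-to-naf, and it matches the extension case-sensitively, which may or may not mirror the converter's own behaviour.

```python
from pathlib import Path
from typing import List


def find_text_inputs(directory: str) -> List[Path]:
    """Return the regular files in `directory` whose name ends in '.txt'.

    Hypothetical helper; subdirectories are not searched.
    """
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix == ".txt"
    )


if __name__ == "__main__":
    folder = Path("text_in")
    if folder.is_dir():
        for path in find_text_inputs(str(folder)):
            print(path.name)
```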
To process NAF files from 'naf_in' to 'naf_out':
converter.convert_naf_files('naf_in', 'naf_out')
Each output file carries the same name as its input file, apart from the extension.
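The naming rule can be illustrated with a short sketch. The helper expected_output_path is hypothetical, and the '.naf' output extension is an assumption based on the 'example.naf' file name used earlier; adjust it if the converter writes a different extension.

```python
from pathlib import Path


def expected_output_path(input_file: str, out_dir: str,
                         extension: str = ".naf") -> Path:
    """Map an input file to the output path it would be expected to produce.

    Keeps the file name, swaps the extension, and places it in `out_dir`.
    The default extension is an assumption, not confirmed library behaviour.
    """
    return Path(out_dir) / Path(input_file).with_suffix(extension).name


if __name__ == "__main__":
    print(expected_output_path("text_in/report.txt", "naf_out"))
```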