Use spaCy for NLP and output to the FoLiA XML format.

proycon/spacy2folia

spaCy-to-FoLiA

Convert spaCy output to FoLiA XML documents. Existing FoLiA documents are also supported as input.

Installation

$ pip install spacy2folia

You also need to install the spaCy models you want to use, for example:

$ python -m spacy download en_core_web_sm
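
Models downloaded this way are installed as regular Python packages, so a script can check for one before trying to load it. The following is a minimal sketch using only the standard library; the helper name `model_installed` is hypothetical and not part of spacy2folia:

```python
import importlib.util


def model_installed(name: str) -> bool:
    """Return True if a package with this name is importable.

    Pipelines fetched with `python -m spacy download` are installed as
    ordinary Python packages, so an import lookup is enough here.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    model = "en_core_web_sm"
    if not model_installed(model):
        print(f"Model missing, run: python -m spacy download {model}")
```

The check only confirms that a package with that name exists; it does not verify that the model is compatible with the installed spaCy version.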

Usage Example

Using the command line tool on an input file named test.txt:

$ spacy2folia --model en_core_web_sm test.txt

This results in a document test.folia.xml in the current working directory.

You can also invoke the command line tool on one or more FoLiA documents as input:

$ spacy2folia --model en_core_web_sm document.folia.xml

The output file will be written to the current working directory, so it may overwrite the input if the input is in that same directory!
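
One way to avoid overwriting the input is to run the tool from a separate output directory. The sketch below wraps the command line call shown above with `subprocess`; the helper names `build_command` and `convert_into` are hypothetical, and it relies only on the `--model` option and the working-directory behaviour described in this README:

```python
import subprocess
from pathlib import Path
from typing import List


def build_command(model: str, input_file: Path) -> List[str]:
    """Build the spacy2folia call shown above, with an absolute input path."""
    return ["spacy2folia", "--model", model, str(input_file.resolve())]


def convert_into(outdir: Path, model: str, input_file: Path) -> None:
    """Run spacy2folia with `outdir` as its working directory."""
    if outdir.resolve() == input_file.resolve().parent:
        raise ValueError("output directory is the input's directory")
    outdir.mkdir(parents=True, exist_ok=True)
    # The tool writes to its current working directory, so cwd picks the target.
    subprocess.run(build_command(model, input_file), cwd=outdir, check=True)
```

For example, `convert_into(Path("out"), "en_core_web_sm", Path("document.folia.xml"))` would leave the original file untouched and place the result under `out/`. The input path is made absolute because the command no longer runs from the directory the script was started in.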

Usage from Python:

import spacy
from spacy2folia import spacy2folia

text = "Input text goes here"

nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
foliadoc = spacy2folia.convert(doc, "example", paragraphs=True)
foliadoc.save("/tmp/output.folia.xml")
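
When converting several text files, the model only needs to be loaded once. The sketch below builds on the example above and assumes that the second argument to `convert` is the FoLiA document ID, which is not stated in this README. Since such IDs must be valid XML names, the hypothetical helper `make_doc_id` derives a conservative ASCII-only ID from the file name; `convert_files` is likewise an illustrative name:

```python
import re
from pathlib import Path


def make_doc_id(path: Path) -> str:
    """Derive an XML-safe identifier from a file name (ASCII subset only)."""
    doc_id = re.sub(r"[^A-Za-z0-9_.-]", "_", path.stem)
    # An XML ID may not start with a digit, hyphen or period.
    if not re.match(r"[A-Za-z_]", doc_id):
        doc_id = "_" + doc_id
    return doc_id


def convert_files(paths, model="en_core_web_sm", outdir=Path(".")):
    """Convert plain-text files to FoLiA, loading the model only once."""
    # Imported here so that make_doc_id stays usable without spaCy installed.
    import spacy
    from spacy2folia import spacy2folia

    nlp = spacy.load(model)
    for path in map(Path, paths):
        doc = nlp(path.read_text(encoding="utf-8"))
        # Assumption: the second argument is used as the document ID.
        foliadoc = spacy2folia.convert(doc, make_doc_id(path), paragraphs=True)
        foliadoc.save(str(outdir / (path.stem + ".folia.xml")))
```

With this helper, `make_doc_id(Path("2020 report.txt"))` gives `_2020_report`, and `make_doc_id(Path("test.txt"))` gives `test`.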

Usage from Python with FoLiA input:

import spacy
import folia.main as folia
from spacy2folia import spacy2folia

foliadoc = folia.Document(file="/tmp/input.folia.xml")
nlp = spacy.load("en_core_web_sm")
spacy2folia.convert_folia(foliadoc, nlp)
foliadoc.save("/tmp/output.folia.xml")
