Segment texts into sentences and verbal phrases

uhh-lt/verby

Segment text using spaCy

This project segments the text sent to it into sentences and verbal phrases. For now, only German is supported! The primary aim is to provide a simple way of splitting text into verbal phrases as proposed in Vauth et al. (2021). Sentence splitting is provided in addition.

If you use this in your academic work, please cite our paper:

@inproceedings{vauthAutomatedEventAnnotation2021,
  title = {Automated {{Event Annotation}} in {{Literary Texts}}},
  booktitle = {{{CHR}} 2021: {{Computational Humanities Research Conference}}},
  author = {Vauth, Michael and Hatzel, Hans Ole and Gius, Evelyn and Biemann, Chris},
  date = {2021-11-17/2021-11-19},
  series = {{{CEUR Workshop Proceedings}}},
  volume = {2989},
  pages = {333--345},
  location = {Amsterdam, The Netherlands},
  url = {http://ceur-ws.org/Vol-2989/short_paper18.pdf},
  eventtitle = {{{CHR}} 2021: {{Computational Humanities Research Conference}}}
}

Building the Docker Image

In the project's top-level directory, run:

docker build -t verby .

This builds a Docker image that can be run with:

docker run -p 8000:80 verby

The -p option maps port 80 in the container to port 8000 on the host, so you can access the API on port 8000 from your host.

HTTP API

After starting the server, either via Docker or in a development setup, you should be able to post your segmentation requests.

Using the CLI tool httpie:

http POST 127.0.0.1:8000/segment text="Ich gehe auf einem Wagen, oder wie manche sagen einem Auto, spazieren. Du gehst nachhause."

Or from Python code:

import requests

# The text to segment (German only for now)
text = "Ich gehe auf einem Wagen, oder wie manche sagen einem Auto, spazieren. Du gehst nachhause."
response = requests.post("http://127.0.0.1:8000/segment", json={"text": text})
print(response.json())
# Prints: {'verbal_phrases': [[[0, 30], [60, 69]], [[31, 47]], [[71, 90]]], 'sentences': [[0, 70], [71, 90]]}

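The response is a JSON object with the character offsets of sentences and verbal phrases. Each verbal phrase is a list of [start, end] spans because verbal phrases may be discontinuous: in the example above, the first phrase consists of the two spans [0, 30] and [60, 69], which are separated by the insertion.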
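To turn the offsets back into text, they can be applied as string slices. The following is a minimal sketch that reuses the example response shown above instead of calling the server. It assumes the offsets index Python string characters with an exclusive end, which is consistent with the example output. The helper name `spans_to_text` is hypothetical and not part of verby.

```python
# The example text and the response printed above, copied verbatim.
text = (
    "Ich gehe auf einem Wagen, oder wie manche sagen einem Auto, "
    "spazieren. Du gehst nachhause."
)
response = {
    "verbal_phrases": [[[0, 30], [60, 69]], [[31, 47]], [[71, 90]]],
    "sentences": [[0, 70], [71, 90]],
}


def spans_to_text(text, spans):
    """Slice `text` for every [start, end] pair (hypothetical helper).

    Assumes end offsets are exclusive, as the example response suggests.
    """
    return [text[start:end] for start, end in spans]


# One string per sentence.
sentences = spans_to_text(text, response["sentences"])

# One list of strings per verbal phrase; discontinuous phrases yield
# more than one string.
phrases = [spans_to_text(text, phrase) for phrase in response["verbal_phrases"]]

for sentence in sentences:
    print(sentence)
for parts in phrases:
    print(parts)
```

Keeping the parts of a discontinuous phrase as separate strings, rather than joining them, preserves the information that the phrase was interrupted.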

Development Server

To run a development server, execute:

fastapi dev web.py

Library Usage

If you would prefer using verby as a library rather than via HTTP, you can use this sample code as a starting point.

import verby

nlp = verby.pipeline.build_pipeline("de")

doc = nlp("Sie lassen alle die krank sind nachhause gehen.")
for phrase in doc._.verbal_phrases:
    for span in phrase:
        print(span.start_char, span.end_char)
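If you want the library output in the same nested-offset shape as the HTTP response, a small conversion helper may be enough. The sketch below is an illustration, not part of verby: the name `phrases_to_offsets` is hypothetical, and it only assumes that each span exposes `start_char` and `end_char`, as in the sample code above. Stand-in objects are used so the block runs without verby installed.

```python
from types import SimpleNamespace


def phrases_to_offsets(phrases):
    """Convert phrases (iterables of span-like objects) into nested
    [start, end] character-offset lists (hypothetical helper)."""
    return [
        [[span.start_char, span.end_char] for span in phrase]
        for phrase in phrases
    ]


# Stand-in spans for demonstration only; real spans would come from
# doc._.verbal_phrases as in the sample code above.
demo_phrases = [
    [
        SimpleNamespace(start_char=0, end_char=30),
        SimpleNamespace(start_char=60, end_char=69),
    ],
    [SimpleNamespace(start_char=31, end_char=47)],
]

demo_offsets = phrases_to_offsets(demo_phrases)
print(demo_offsets)
```

With a real pipeline, the call would presumably be `phrases_to_offsets(doc._.verbal_phrases)`.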
