HPI-DeepLearning/sentence-boundary-detection-nn

 
 


Sentence Boundary Detection using Deep Neural Networks

We try to detect sentence boundaries using deep learning. This project was created as part of the "Practical Applications of Multimedia Retrieval" seminar at the Hasso-Plattner-Institute, Potsdam, Germany.

Setup Demo

We built a Python-based demo using Caffe.

##### Prerequisites:

  1. Clone this repository.
  2. Install Python 2.7 and the packages listed in requirements.txt:

pip install -r requirements.txt

  3. Use the NLTK downloader to download the averaged_perceptron_tagger and punkt models:

python -m nltk.downloader

  4. Set up Caffe as described here.
  5. Add the repository's python directory to your Python path:

export PYTHONPATH=/path/to/sentence-boundary-detection-nn/python:$PYTHONPATH

  6. Download the pretrained Google word vectors (GoogleNews-vectors-negative300.bin.gz) from here, or use this url directly, and extract the archive into the sentence-boundary-detection-nn/python/demo_data directory.
  7. Place your trained models in a demo data folder, for example sentence-boundary-detection-nn/python/demo_data, with the following structure:
  • lexical_models: contains all pretrained lexical models you want to use, each in a separate directory. Each model needs:
    • an .ini file
    • a .caffemodel file
    • a net.prototxt file
  • text_data: contains all text files that should be available as prediction input
  • audio_models: contains all pretrained audio models, each in a separate directory. Each model needs the same files as described for lexical models
  • audio_examples: contains all audio files that should be available during the demo, each in a separate directory containing its ctm, energy and pitch files
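
Before starting the demo, it can help to check that the demo data folder matches the layout described above. The following is a minimal sketch, not part of the repository: the function names `check_model_dir` and `check_demo_data` are hypothetical, and it assumes that the network definition is literally named `net.prototxt` and that the `.ini` and `.caffemodel` files can be recognised by their suffix. It does not inspect the contents of `text_data` or `audio_examples`, because the exact file names there are not specified here.

```python
import os

# Sub-folders of the demo data directory, as listed in the README.
MODEL_DIRS = ("lexical_models", "audio_models")
OTHER_DIRS = ("text_data", "audio_examples")


def check_model_dir(model_dir):
    """Return the problems found in a single model directory."""
    files = os.listdir(model_dir)
    problems = []
    # Each model needs an .ini file and a .caffemodel file (matched by suffix).
    for suffix in (".ini", ".caffemodel"):
        if not any(name.endswith(suffix) for name in files):
            problems.append("%s: no %s file" % (model_dir, suffix))
    # Assumption: the network definition is named exactly net.prototxt.
    if "net.prototxt" not in files:
        problems.append("%s: no net.prototxt" % model_dir)
    return problems


def check_demo_data(root):
    """Return a list of problems found under `root` (empty if none)."""
    problems = []
    for name in MODEL_DIRS + OTHER_DIRS:
        if not os.path.isdir(os.path.join(root, name)):
            problems.append("missing directory: %s" % name)
    for name in MODEL_DIRS:
        base = os.path.join(root, name)
        if not os.path.isdir(base):
            continue
        # Every sub-directory is treated as one model.
        for entry in sorted(os.listdir(base)):
            model_dir = os.path.join(base, entry)
            if os.path.isdir(model_dir):
                problems.extend(check_model_dir(model_dir))
    return problems


if __name__ == "__main__":
    # Example path; adjust it to wherever your demo data lives.
    for problem in check_demo_data("python/demo_data"):
        print(problem)
```

An empty result means that all four sub-folders exist and that every model directory contains the three required files.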

##### Start up

Change into the repository directory and run the following command. It should work out of the box unless you are using a custom demo_data folder:

python web_demo/web.py

Optionally, you can specify the locations of the word vector file and the demo data; otherwise, default values are used. For further information, run:

python web_demo/web.py -h
