A Python module for interfacing with the Java sentence splitter Loomchild. This package is intended for use in Bifixer and/or Bitextor.
System dependencies to build and use this package are Maven and Java.
This package can be installed with pip from PyPI:
pip install loomchild-segment
Splitting a text into sentences:
from loomchild.segmenter import LoomchildSegmenter
segmenter = LoomchildSegmenter(lang)
# segmenting a single line:
segments = segmenter.get_segmentation(input_line)
print("\n".join(segments))
# segmenting a document (i.e. multiple line breaks in the input)
segments = segmenter.get_document_segmentation(input_text)
print("\n".join(segments))
A command-line tool is provided to work with base64-encoded documents.
cat b64encoded_input | py-segment -l $LANG > b64encoded_output
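The input for the command above can be prepared in Python. The following is a minimal sketch, not part of this package: it assumes that each document is UTF-8 text encoded as a single base64 line, which this README does not state explicitly. The helper names `encode_document` and `decode_document` are hypothetical.

```python
import base64


def encode_document(text: str) -> str:
    """Encode a whole document as one line of base64 (UTF-8 is an assumption)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_document(line: str) -> str:
    """Decode one base64 line back into the original text."""
    return base64.b64decode(line.strip()).decode("utf-8")


if __name__ == "__main__":
    document = "First sentence. Second sentence.\nA new paragraph."
    encoded = encode_document(document)
    # The encoded form contains no line breaks, so it fits on one input line.
    print(encoded)
    # Round trip: decoding returns the original document unchanged.
    print(decode_document(encoded) == document)
```

Writing the printed line to a file gives something that could be piped to py-segment as shown above; if the assumption holds for the output as well, `decode_document` can be applied to each output line.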