A Python framework for developing and experimenting with NLP pipelines

duncanka/NLPypline

The NLPypline framework

NLPypline is a Python framework for rapidly developing NLP pipelines. It provides much of the infrastructure that many different kinds of pipelines share: reading in data, passing data (with modifications) from stage to stage, featurizing data for each model, training and testing models, decoding the outputs of structured prediction models, evaluating outputs, writing outputs back out, cross-validating the entire pipeline, and more.

The package is designed for experimentation: pipelines can be coded up quickly, and stages or models can be swapped in and out easily, provided that later stages have not been written to rely on the particulars of earlier ones. In that sense, it is like a much lighter-weight version of UIMA for Python, tailored more for rapid exploration than for careful system design.
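To make the stage-swapping idea concrete, here is a minimal, generic sketch of a staged pipeline in plain Python 3. It is not NLPypline's actual API: the names `Stage`, `Pipeline`, `WhitespaceTokenizer`, `LowercaseTokenizer`, and `TokenCounter` are all hypothetical and chosen only for illustration. Each stage receives a document, returns a copy with its own additions, and the next stage builds on that.

```python
# Hypothetical sketch of a staged pipeline; none of these names come from NLPypline.

class Stage:
    """One pipeline step: takes a document dict, returns an annotated copy."""

    def process(self, doc):
        raise NotImplementedError


class WhitespaceTokenizer(Stage):
    def process(self, doc):
        # Add a "tokens" field without mutating the input document.
        return {**doc, "tokens": doc["text"].split()}


class LowercaseTokenizer(Stage):
    def process(self, doc):
        # Drop-in alternative: same output field, different behavior.
        return {**doc, "tokens": doc["text"].lower().split()}


class TokenCounter(Stage):
    def process(self, doc):
        # Relies only on the "tokens" field, not on which tokenizer produced it.
        return {**doc, "n_tokens": len(doc["tokens"])}


class Pipeline:
    def __init__(self, stages):
        self.stages = list(stages)

    def run(self, docs):
        results = []
        for doc in docs:
            for stage in self.stages:
                doc = stage.process(doc)
            results.append(doc)
        return results


docs = [{"text": "Rain caused the Flood"}]
first = Pipeline([WhitespaceTokenizer(), TokenCounter()]).run(docs)
second = Pipeline([LowercaseTokenizer(), TokenCounter()]).run(docs)
```

Because `TokenCounter` depends only on the `tokens` field, either tokenizer can be swapped in without touching it. A later stage that assumed, say, lowercased tokens would lose that freedom, which is the caveat noted above.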

The framework depends on several Python packages, including:

  • Gflags
  • NLTK
  • NumPy/SciPy
  • Scikit-learn
  • python-crfsuite
  • Cython (for src/nlpypline/util/streams.pyx)
  • mock
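
As a quick sanity check before running anything, the following sketch reports which of the packages listed above cannot be found in the current environment. The import names (`gflags`, `sklearn`, `pycrfsuite`, and so on) are the usual module names for these distributions, but the helper `missing_dependencies` and the `REQUIRED_MODULES` table are assumptions made for this example and are not part of NLPypline.

```python
import importlib.util

# Display name -> importable module name. This mapping is an assumption based on
# the usual module names for these packages, not something shipped with NLPypline.
REQUIRED_MODULES = {
    "Gflags": "gflags",
    "NLTK": "nltk",
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Scikit-learn": "sklearn",
    "python-crfsuite": "pycrfsuite",
    "Cython": "Cython",
    "mock": "mock",
}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the display names of packages whose modules cannot be located."""
    return [
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_dependencies()
if missing:
    print("Missing packages: " + ", ".join(missing))
else:
    print("All listed dependencies were found.")
```

The check only confirms that each module can be located; it does not verify versions or that the Cython extension has been built.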

NLPypline was developed for the Causeway project and has so far been used only there. Aspects of the framework may be too closely tailored to that project, and it is missing important pieces of functionality, but I would love to make it more broadly useful. Please contact me if you are interested in using NLPypline, and I will happily help you get up and running with it. (Proper documentation will be written up if enough people express interest.)
