regunathb edited this page Apr 11, 2013 · 2 revisions

Sift is a set of libraries for extracting useful information from unstructured data. It employs techniques commonly used in Natural Language Processing, such as stemming, sentiment analysis, and word segmentation.

The Sift libraries are organized into projects, each of which provides a set of capabilities. For example:

  • tagcloud - a library for generating tag clouds, which may be written out as image files or as JSON files.
  • runtime - a processing API inspired by MapReduce that uses data structures similar to those of Twitter Storm.
  • batch - a Trooper batch-based execution container for running the 'runtime' and 'tagcloud' libraries.
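
To make the tag cloud idea concrete, here is a minimal sketch of how word frequencies could be turned into weighted tags and serialized as JSON. It does not use Sift's API: the function names (`build_tag_cloud`, `tag_cloud_json`), the stop-word list, and the JSON field names are all assumptions made for illustration, and the JSON layout that the tagcloud library actually produces may differ.

```python
import json
import re
from collections import Counter

# Hypothetical stop-word list for illustration; not taken from Sift.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "for"}


def build_tag_cloud(text, max_tags=5):
    """Return a list of tag dicts, most frequent first.

    Each tag carries its raw count and a weight relative to the most
    frequent tag (1.0 for the top tag).
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    top = counts.most_common(max_tags)
    if not top:
        return []
    highest = top[0][1]
    return [
        {"tag": tag, "count": count, "weight": round(count / highest, 2)}
        for tag, count in top
    ]


def tag_cloud_json(text, max_tags=5):
    """Serialize the tag cloud as a JSON string (assumed layout)."""
    return json.dumps(build_tag_cloud(text, max_tags), indent=2)


if __name__ == "__main__":
    sample = "Reviews mention battery, battery life and the battery charger."
    print(tag_cloud_json(sample, max_tags=3))
```

A renderer could then map each tag's `weight` to a font size when drawing the cloud as an image.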

