yooper edited this page Aug 16, 2016 · 14 revisions

PHP Text Analysis

Want to process text using PHP? Well, you picked the right library for the task.

PHP Text Analysis provides a variety of tools for:

  • Analysis
    • Date Analysis - use to extract dates from a given corpus
    • Frequency Distribution - provides you with the basic tools to do simple analysis and is used as a base for many other algorithms
    • Keyword Extraction (RAKE) - use the RAKE algorithm to rapidly automate keyword extraction
  • Collections - data structures for managing documents during analysis
  • Collocation - helps you find terms that co-occur more often than would be expected by chance.
  • Console - a command line interface for performing base indexing and text mining analysis with PHP
  • Entity Extraction - helps you find entities such as people, places and dates
  • Downloaders - Downloads 3rd party data files from the web
  • Filters - A set of tools for normalizing the terms and tokens before data analysis begins
  • Phonetics - Phonetic algorithms for fixing data. Helpful when you need to perform record linkage tasks with PHP
  • Ngrams - PHP code for generating n-grams from a given set of tokens or terms
  • Stemmers - Several stemmers are available for normalizing the data sets prior to further analysis
  • Tokenizers - A common set of tokenizers is available for breaking up the corpus into tokens or sentences
  • Utilities - helper utilities for manipulating text data
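
To make a few of the ideas above concrete, here is a minimal sketch in plain PHP of tokenizing, building a frequency distribution, and generating n-grams. It deliberately does not use the PHP Text Analysis API: the function names (`sketch_tokenize`, `sketch_frequency`, `sketch_ngrams`) are hypothetical and exist only for illustration, and the tokenizer assumes ASCII input.

```php
<?php
// Illustrative only: plain PHP, NOT the PHP Text Analysis API.
// All function names below are hypothetical.

/** Split text into lowercase word tokens (assumes ASCII input). */
function sketch_tokenize(string $text): array
{
    // Split on any run of characters that is not a letter, digit or apostrophe.
    return preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

/** Count how often each token occurs, most frequent first. */
function sketch_frequency(array $tokens): array
{
    $counts = array_count_values($tokens);
    arsort($counts); // sort by count, descending, keeping token => count pairs
    return $counts;
}

/** Build n-grams (as space-joined strings) from a list of tokens. */
function sketch_ngrams(array $tokens, int $n = 2): array
{
    $ngrams = [];
    $total = count($tokens);
    for ($i = 0; $i + $n <= $total; $i++) {
        $ngrams[] = implode(' ', array_slice($tokens, $i, $n));
    }
    return $ngrams;
}

$tokens  = sketch_tokenize('The quick brown fox jumps over the lazy dog. The dog sleeps.');
$freq    = sketch_frequency($tokens);
$bigrams = sketch_ngrams($tokens, 2);
```

In this sketch `$freq` maps each token to its count and `$bigrams` holds every pair of adjacent tokens. The library's own Tokenizers, Frequency Distribution and Ngrams components cover the same ground with more options, so check their wiki pages for the real class and function names.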

Beyond Analysis

PHP Text Analysis is a lightweight Information Retrieval and NLP library built using PHP. In addition to the analysis tools, PHP Text Analysis can be used to create a search engine that supports simple and advanced query types. This is especially useful when your data models have raw text that must be searchable.

  • Adapters
  • Engines
  • Indexes
  • Query
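
As a rough picture of what making raw text searchable involves, the sketch below builds a toy inverted index in plain PHP and answers simple all-terms-must-match queries. It is an assumption-laden illustration, not the library's Adapters, Engines, Indexes or Query classes: the function names (`sketch_terms`, `sketch_build_index`, `sketch_search`) and the sample documents are hypothetical.

```php
<?php
// Illustrative only: a toy inverted index in plain PHP.
// This is NOT how PHP Text Analysis implements its indexes or queries;
// every name here is hypothetical.

/** Lowercase the text and split it into alphanumeric terms (ASCII only). */
function sketch_terms(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

/** Map each term to the list of document ids that contain it. */
function sketch_build_index(array $documents): array
{
    $index = [];
    foreach ($documents as $docId => $text) {
        // array_unique ensures each document appears once per term.
        foreach (array_unique(sketch_terms($text)) as $term) {
            $index[$term][] = $docId;
        }
    }
    return $index;
}

/** Return ids of documents that contain every term in the query. */
function sketch_search(array $index, string $query): array
{
    $result = null;
    foreach (sketch_terms($query) as $term) {
        $postings = $index[$term] ?? [];
        // Intersect the running result with this term's posting list.
        $result = $result === null
            ? $postings
            : array_values(array_intersect($result, $postings));
    }
    return $result ?? []; // an empty query matches nothing
}

$index = sketch_build_index([
    'a' => 'PHP makes text analysis simple',
    'b' => 'Search engines index raw text',
    'c' => 'Stemmers normalize text before analysis',
]);

$hits = sketch_search($index, 'text analysis');
```

Here `$hits` contains the ids of the documents that mention both "text" and "analysis". A real engine would add normalization (filters, stemmers), ranking and richer query types on top of this basic structure.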