PHP Stanford NLP Datastore

Stores NLP data from Stanford CoreNLP server.

What does it do?

It analyses a text using Stanford CoreNLP server, then stores the result.

Which data gets stored?

  • OpenIE: these are "Subject-Relation-Object" triples. The concept is similar to "Subject-Verb-Object" triples.
http://stanfordnlp.github.io/CoreNLP/openie.html
  • Named-Entities: if a word is a "Named Entity", such as a Location, Name or Time, the entity and its type are stored.
http://stanfordnlp.github.io/CoreNLP/ner.html
  • Coreference: if a word refers to a word in another sentence, that reference is stored.
http://stanfordnlp.github.io/CoreNLP/coref.html
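
To make the three kinds of data above more concrete, here is a minimal sketch of how each could be represented as a plain PHP array. The key names and the sample sentence are assumptions for illustration only; they are not the actual columns of "datastore.db". The helper `formatTriple()` is hypothetical as well.

```php
<?php
// Hypothetical data shapes. The key names are illustrative assumptions,
// not the real schema of "datastore.db".

// OpenIE: one "Subject-Relation-Object" triple.
$triple = [
    'subject'  => 'Barack Obama',
    'relation' => 'was born in',
    'object'   => 'Hawaii',
];

// Named entity: a word plus its entity type.
$entity = ['word' => 'Hawaii', 'type' => 'LOCATION'];

// Coreference: a mention in one sentence that refers to a word in another.
$coref = [
    'mention'          => 'He',
    'mentionSentence'  => 2,
    'refersTo'         => 'Barack Obama',
    'refersToSentence' => 1,
];

// Hypothetical helper: render a triple as a readable string.
function formatTriple(array $t)
{
    return sprintf('(%s; %s; %s)', $t['subject'], $t['relation'], $t['object']);
}

echo formatTriple($triple), PHP_EOL; // prints "(Barack Obama; was born in; Hawaii)"
```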

How does it work?

  • You submit a text.
  • The text is analyzed by the Stanford CoreNLP server.
  • Results are stored in a file-based SQLite database. The database file is called "datastore.db".
https://sqlite.org/
https://github.com/sqlitebrowser
  • The results are displayed on screen
  • There is also a search form to find data
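
The steps above can be sketched in a few small functions. This is not the package's own code: the package uses the CoreNLP Adapter, Guzzle and Doctrine DBAL, while this sketch uses plain PDO and only builds the request URL. The function names, the `triples` table, the annotator list and the server address are all assumptions for illustration. The `properties` query parameter and the `sentences` / `openie` keys follow the CoreNLP server's JSON output as I understand it.

```php
<?php
// Build the URL for a CoreNLP server request. The text itself would be
// sent as the POST body. Base URL and annotators are assumptions.
function buildCoreNlpUrl($baseUrl, array $annotators)
{
    $properties = json_encode([
        'annotators'   => implode(',', $annotators),
        'outputFormat' => 'json',
    ]);
    return rtrim($baseUrl, '/') . '/?properties=' . rawurlencode($properties);
}

// Collect OpenIE triples from a decoded CoreNLP JSON response.
function extractTriples(array $response)
{
    $triples = [];
    $sentences = isset($response['sentences']) ? $response['sentences'] : [];
    foreach ($sentences as $sentence) {
        $openie = isset($sentence['openie']) ? $sentence['openie'] : [];
        foreach ($openie as $t) {
            $triples[] = [$t['subject'], $t['relation'], $t['object']];
        }
    }
    return $triples;
}

// Store triples in a hypothetical "triples" table.
function storeTriples(PDO $pdo, array $triples)
{
    $pdo->exec('CREATE TABLE IF NOT EXISTS triples (subject TEXT, relation TEXT, object TEXT)');
    $stmt = $pdo->prepare('INSERT INTO triples (subject, relation, object) VALUES (?, ?, ?)');
    foreach ($triples as $triple) {
        $stmt->execute($triple);
    }
}

// Search subjects and objects for a term, as a search form might do.
function searchTriples(PDO $pdo, $term)
{
    $stmt = $pdo->prepare(
        'SELECT subject, relation, object FROM triples WHERE subject LIKE ? OR object LIKE ?'
    );
    $like = '%' . $term . '%';
    $stmt->execute([$like, $like]);
    return $stmt->fetchAll(PDO::FETCH_NUM);
}
```

To try the storage and search parts without touching "datastore.db", an in-memory database can be used, for example `new PDO('sqlite::memory:')`. This requires the pdo_sqlite extension.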

This package depends on Stanford CoreNLP Server

http://stanfordnlp.github.io/CoreNLP/index.html#download

This package also depends on PHP-Stanford-CoreNLP-Adapter

https://github.com/DennisDeSwart/php-stanford-corenlp-adapter

Note: since this package contains a full version of the CoreNLP Adapter, you can use all of its features with this package.

Installation

This package depends on these packages:

http://stanfordnlp.github.io/CoreNLP/index.html#download
https://github.com/DennisDeSwart/php-stanford-corenlp-adapter
https://github.com/doctrine/dbal
https://github.com/guzzle/guzzle

Install procedure using the ZIP files

  • Install Stanford CoreNLP Server. Check the "php-stanford-corenlp-adapter" package for an installation walkthrough
  • Download and unpack the files from this package.
  • Copy the files to your web server directory, usually "htdocs" or "/var/www".
  • Run a Composer update to install the dependencies

Install as part of another project

  • Install Stanford CoreNLP Server. Check the "php-stanford-corenlp-adapter" package for an installation walkthrough
  • Add the following lines to your main project's "composer.json" require section:
    {
        "require": {
            "dennis-de-swart/php-stanford-nlp-datastore": "*"
        }
    }
  • Run a Composer update to install the dependencies
  • Copy these files from "/vendor/dennis-de-swart/php-stanford-nlp-datastore" to your web server directory, usually "htdocs" or "/var/www":
    - datastore.db
    - bootstrap.php
  • Example code for your main project:
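    // Load the constants and the database connection
    require_once __DIR__.'/bootstrap.php';

    // Text to analyse (illustrative sample)
    $yourText = 'The Golden Gate Bridge was designed by Joseph Strauss.';

    // Start the CoreNLP Adapter and analyse the text
    $coreNLP = new CorenlpAdapter();
    $coreNLP->getOutput($yourText);
    print_r($coreNLP->serverMemory); // result from the CoreNLP Adapter

    // Save the result to the database ($db is expected to come from bootstrap.php)
    $datastore = new Datastore($db->conn);
    $datastore->storeNLP($coreNLP);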

Requirements

  • PHP 5.6 or higher: it also works on PHP 7
  • Java SE Runtime Environment, version 1.8
  • Stanford CoreNLP Server 3.7.0
  • Windows or Linux/Unix 64-bit OS, 8 GB or more memory recommended.
  • Composer for PHP
    https://getcomposer.org/
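
The PHP side of these requirements can be checked with a short script before installing. This is only a sketch: the helper names are hypothetical, and the extension list in the usage note is an assumption about what the dependencies typically need, not something this package documents.

```php
<?php
// Hypothetical helper: does the given PHP version meet the minimum?
function meetsMinimumPhp($version, $minimum = '5.6.0')
{
    return version_compare($version, $minimum, '>=');
}

// Hypothetical helper: return the names of required extensions that are not loaded.
function missingExtensions(array $required)
{
    return array_values(array_filter($required, function ($name) {
        return !extension_loaded($name);
    }));
}
```

For example, `meetsMinimumPhp(PHP_VERSION)` checks the running interpreter, and `missingExtensions(['pdo_sqlite', 'json'])` lists which of those two extensions, assumed here to be needed, are absent.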

SQLite Browser

If you need a SQLite browser, see:

http://sqlitebrowser.org/

Important notes

  • Starting the CoreNLP server for the first time takes a while because it loads a large amount of data.
  • After the first startup, the server will be much faster.
  • In my experience, the Stanford CoreNLP server runs best with 8 GB of memory or more. Start the server with "-mx8g" instead of "-mx4g".
  • Use version 3.7.0 of the server; this gives you the best and quickest results.
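
Because the server can be slow to come up, a script may want to wait until it accepts connections before submitting text. Below is a minimal sketch using PHP's built-in `fsockopen()`. The function name `waitForServer()` is hypothetical, and the host and port in the usage note are assumptions (9000 is the CoreNLP server's usual default port). An open port only shows that something is listening; it does not prove that all models have finished loading.

```php
<?php
// Hypothetical helper: poll a TCP port until it accepts a connection
// or the attempts run out. Returns true on success, false otherwise.
function waitForServer($host, $port, $maxAttempts, $delaySeconds)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $errno = 0;
        $errstr = '';
        // "@" suppresses the warning PHP raises when the connection is refused.
        $socket = @fsockopen($host, $port, $errno, $errstr, 2.0);
        if ($socket !== false) {
            fclose($socket);
            return true;
        }
        if ($attempt < $maxAttempts) {
            sleep($delaySeconds); // wait before the next attempt
        }
    }
    return false;
}
```

A call such as `waitForServer('localhost', 9000, 30, 2)` would try for roughly a minute before giving up.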

Example output

See

  • "datastore_result_a.PNG"
  • "datastore_result_b.PNG"
  • "datastore_result_search.PNG"

and "example.db", which shows what a populated database looks like.

Any questions?

Let me know. You can create an issue on GitHub. Any bugs will be fixed ASAP.