Skip to content

DaisukeBekki/lightblue

Repository files navigation

README

What is this repository for?

  • lightblue is a multi-lingual CCG parser with DTS representations
  • Copyright owner: Daisuke Bekki and Bekki Laboratory

Installing lightblue

Prerequisite 1: Haskell Stack

In Linux:

$ wget -qO- https://get.haskellstack.org/ | sh

In Mac:

$ brew install haskell-stack

See https://docs/haskellstack.org/en/stable/README/#how-to-install for details.

Prerequisite 2: Installation of Japanese morphological analyzer

One of the following Japanese morphological analyzers must be installed before executing lightblue.

Prerequisite 3: Installation of English morphological analyzer

The following English morphological analyzers must be installed before executing lightblue.

Download lightblue

Do the following in the directory under which you'd like to install lightblue.

$ git clone --depth=1 https://github.com/DaisukeBekki/lightblue.git

This operation will create, under the current directory, a new directory lightblue. Henceforth we will refer to the full path to this directory as <lightblue>.

Configuration and Installation

You need to add the environment variable LIGHTBLUE and set its value as <lightblue>. You may add the line export LIGHTBLUE=<lightblue> to .bashrc, .bash.profile, .bash_profile, or whatever configuration file for your shell. Then move to <lightblue> and do the following:

$ cd <lightblue>
$ stack build

Set the permission of the shell scripts lightblue to executable.

$ chmod 755 lightblue

Running lightblue

Quick Start

To parse a Japanese sentence and get a parsing result in a text format, execute:

$ echo 太郎がパンを食べた。 | ./lightblue jp parse -s text

To parse an English sentence and get a parsing result, execute:

$ echo John loves Mary. | ./lightblue en parse -s text

To see a parsing result in HTML formal, execute (choose your browser):

$ echo 太郎がパンを食べた。 | ./lightblue jp parse -s html > result.html; firefox result.html

If you have a text file (one sentence per line) <corpusfile>, then you can feed its path to lightblue by:

$ ./lightblue jp parse -s html -f <corpusfile>

To parse a JSeM file and execute inferences therein, then you can feed it to lightblue by:

$ ./lightblue jp jsem -f <jsemfile>

Usage

The syntax of the lightblue command is as follows:

./lightblue <lang> <lang's local options> <command> <command's local options> <global options>
Lang
jp Japanese
en English (no local options)
Local Options for jp Default Description
-m or --ma {juman|jumanpp|kwja} kwja Specify morphological analyzer
--filter {knp|kwja|none} none Specify node filter
Command
parse Parse Sentences and returns parsing results.
jsem Parse a JSeM file and execute inferences.
numeration Shows the list of lexical items prepared for parsing the given sentence
version Print the lightblue version.
stat Print the lightblue statistics.

Each of parse and jsem commands has a set of local options.

Local Options for parse Default Description
-o or --output {tree|postag} tree Specify the output content.
tree: Outputs parse trees and their type check results.
postag: Outputs only lexical items (Use lightblue a part-of-speech tagger)
Local Options for jsem Default Description
--jsemid <text> all Skip JSeM data the JSeM ID of which is not equial to this value.
--nsample <int> -1 Specify a number of JSeM data to process (A negative value means all data)

The global options are common to all commands.

Global Options Default Description
-s or --style {text|tex|xml|html} text Show results in the specified format.
-p or --prover {Wani|Null} Wani Choose a prover.
Wani: Use Wani prover (Daido and Bekki 2020)
None: Use the null prover (that always returns no diagrams).
-f or --file <filepath> Read input texts from
(Specify '-' to use stdin)
-b or --beam <int> 32 Set the beam width to
--nparse <int> 1 Search only N-best parse trees for each sentence (A negative value means all trees)
--ntypecheck <int> 1 Search only N-best diagrams for each type checking of a logical form (A negative value means all diagrams)
--nproof <int> -1 Search only N-best diagrams for each proof search (A negative value means all diagrams)
--maxdepth <int> 5 Set the maximum search depth in proof search (default: 9)
--maxtime <int> 10000 Set the maximum search time in proof search (default: 10000)
--noTypeCheck If specified, show no type checking diagram for each sentence.
--noInference If specified, execute no inference for each discourse.
--time Show the execution time in stderr.
--verbose Show type infer/check logs in stderr.

For developpers

Installing Haskell-mode for Emacs will help.

$ sudo apt install haskell-mode

The following command creates an HTML document at: <lightblue>/haddock/doc/html/lightblue/index.html

$ stack build --haddock

Contact

About

A CCG parser for Japanese with DTS-representations

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages