C&C tools is a suite of software for linguistic analysis of English, including a tokenizer, several taggers and a parser. Boxer is a tool for deep semantic analysis that takes the output of the C&C parser as its input. Together, the C&C tools and Boxer form a pipeline that performs a complete analysis of English text. Here is an example:
$ curl -d 'John loves Mary.' ''
The main entry point to the C&C/Boxer API is
$CANDCAPI is the URL of the API installation. $FORMAT is either raw or json, so the possible entry points include:
The text to analyze must be passed as the POST body of the HTTP request. The command-line options for Boxer are passed as URL parameters. They are listed here:
- Option (values (default)) description
- copula (true, false) the copula will introduce an equality condition
- instantiate (true, false) generate Prolog atoms for all discourse referents
- integrate (true, false) produces one DRS for all input sentences
- modal (true, false) modal DRS-conditions are used
- nn (true, false) resolves noun-noun relations
- resolve (true, false) resolve all anaphoric DRSs and perform merge-reduction
- roles (proto, verbnet) role inventory (proto-roles or VerbNet roles)
- tense (true, false) tense is represented following Kamp & Reyle
- theory (drt, sdrt) Standard DRSs with drt, Segmented DRSs with sdrt
- semantics (drs, pdrs, fol, drg, tacitus, der) The basic (and default) semantic formalism is drs, but other formats are also possible: pdrs (DRSs with labels, following Projective DRT); fol (first-order formula syntax); drg (discourse representation graphs); tacitus (Hobbsian semantics); ccg (input CCG derivation, nicely printed).
Here's an example using the option semantics to get a first-order logic formula:
$ curl -d 'Every man loves a woman' ''
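The same kind of request can be sketched in Python. This is a minimal illustration, not part of the API itself: the function name build_request and the URL http://example.org/pipeline are hypothetical placeholders, so substitute the entry point of your own installation. The sketch only builds the request (text in the POST body, Boxer options in the query string) and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(endpoint_url, text, **options):
    """Build (but do not send) a POST request for a C&C/Boxer entry point."""
    # Boxer options travel as URL parameters; Python booleans become true/false.
    params = {
        name: str(value).lower() if isinstance(value, bool) else value
        for name, value in options.items()
    }
    url = endpoint_url + ("?" + urlencode(params) if params else "")
    # The text to analyze is the raw POST body, as with `curl -d`.
    return Request(url, data=text.encode("utf-8"), method="POST")


# Hypothetical endpoint: replace it with the URL of a real installation.
req = build_request(
    "http://example.org/pipeline",
    "Every man loves a woman",
    semantics="fol",
    resolve=True,
)
print(req.full_url)
```

Passing req to urllib.request.urlopen would perform the actual call and return the response body.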
For a more extensive description of the Boxer options, see the official documentation.
NOTE: the link http://svn.ask.it.usyd.edu.au/trac/candc/wiki/BoxerOptions is dead.
The API can return either raw text or JSON. The raw text version corresponds to the standard output of the C&C pipeline. The JSON version is a simple JSON structure containing both the standard output and the standard error:
{"err": "standard error", "out": "standard output"}
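A client can separate the two streams with a standard JSON parser. The following is a small sketch under the assumption that the response body has exactly the shape shown above; split_response is a hypothetical helper name, not something the API provides.

```python
import json


def split_response(body):
    """Return the (out, err) pair from a JSON response of the API."""
    data = json.loads(body)
    # Fall back to empty strings if a key is missing from the response.
    return data.get("out", ""), data.get("err", "")


out, err = split_response('{"err": "standard error", "out": "standard output"}')
print(out)
print(err)
```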
It is possible to access the individual tools separately by using the following URLs:
The tokenizer t takes plain text as input. The parser candc takes tokenized text as input, i.e. a list of words separated by whitespace. boxer takes the Prolog output of the C&C parser as input. For convenience, combinations of intermediate pipeline steps are also included in the API:
These call, respectively, the tokenizer/parser combination and the parser/Boxer combination.
To see the version of C&C/Boxer used by the API:
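The data flow between the individual tools can be sketched as a simple chain in which each tool receives the output of the previous one. In this illustration run_steps, fake_post and the example.org URLs are all hypothetical: fake_post is a stand-in that performs no HTTP call, and a real client would replace it with a function that POSTs the data to the given URL and returns the response body.

```python
def run_steps(text, step_urls, post):
    """Feed `text` through each step in order; `post(url, data)` runs one step."""
    data = text
    for url in step_urls:
        # The output of one tool is the input of the next one.
        data = post(url, data)
    return data


def fake_post(url, data):
    # Stand-in for a real HTTP POST: wraps the data in the tool name.
    tool = url.rsplit("/", 1)[-1]
    return f"{tool}({data})"


# Hypothetical URLs for the tokenizer, the parser and Boxer.
steps = [
    "http://example.org/t",
    "http://example.org/candc",
    "http://example.org/boxer",
]
result = run_steps("John loves Mary.", steps, fake_post)
print(result)
```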
The Discourse Representation Graph (DRG) is a semantic formalism described in the paper V. Basile, J. Bos: Towards Generating Text from Discourse Representation Structures. The C&C/Boxer API provides an entry point that generates a PNG image of the DRG of a given text:
The URL accepts the same GET parameters as the pipeline entry point and returns a raw PNG file.
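Because the response is a raw PNG file, a client may want to check the body before writing it to disk. The sketch below relies only on the standard eight-byte PNG signature; is_png and save_png are hypothetical helper names, and the response bytes are assumed to have been fetched already.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def is_png(body):
    """Return True if the response bytes start with the PNG signature."""
    return body.startswith(PNG_SIGNATURE)


def save_png(body, path):
    """Write the response bytes to `path`, refusing anything that is not a PNG."""
    if not is_png(body):
        raise ValueError("response is not a PNG image")
    with open(path, "wb") as handle:
        handle.write(body)
    return path
```

An error message returned by the server instead of an image would fail the signature check and raise ValueError rather than produce a corrupt file.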