C&C tools is a suite of software for linguistic analysis of the English language, including a tokenizer, several taggers and a parser. Boxer is a tools for deep semantic analysis that takes in input the output of the C&C parser. Together, the C&C tools and Boxer form a pipeline toolchain to perform a complete analysis on English text. Here is an example:
$ curl -d 'John loves Mary.' 'http://127.0.0.1:7778/raw/pipeline'
sem(1,[1001:[tok:'John',pos:'NNP',lemma:'John',namex:'I-PER'],1002:[tok:loves,pos:'VBZ',lemma:love,namex:'O'],1003:[tok:'Mary',pos:'NNP',lemma:'Mary',namex:'I-PER'],1004:[tok:'.',pos:'.',lemma:'.',namex:'O']],merge(drs([[]:B,[]:C],[[1003]:named(B,mary,per,0),[1001]:named(C,john,per,0)]),drs([[]:D],[[]:rel(D,B,patient,0),[]:rel(D,C,agent,0),[1002]:pred(D,love,v,0)]))).
The main entry point to the C&C/Boxer API is
$CANDCAPI/$FORMAT/pipeline
$CANDCAPI is the URL of the API installation. $FORMAT is either raw or json, so possible entry point include:
http://my.installation.of.candcapi.net/raw/pipeline
http://my.installation.of.candcapi.net/json/pipeline
The text to analyze must be passed as POST to the HTTP request. The command line options for Boxer are passed as URL paramerers. Here are listed:
- Option (values (default)) description
- copula (true, false) the copula will introduce an equality condition
- instantiate (true, false) generate Prolog atoms for all discourse referents
- integrate (true, false) produces one DRS for all input sentences
- modal (true, false) modal DRS-conditions are used
- nn (true, false) resolves noun-noun relations
- resolve (true, false) resolve all anaphoric DRSs and perform merge-reduction
- roles (proto, verbnet) role inventory (proto-roles or VerbNet roles
- tense (true, false) tense is represented following Kamp & Reyle
- theory (drt, sdrt) Standard DRSs with drt, Segmented DRSs with sdrt
- semantics (drs,pdrs,fol,drg,tacitus,der] The basic (and default) formalism of semantics is drs, but other formats are also possible: pdrs (DRSs with labels, following Projective DRT); fol (first-order formula syntax); drg (discourse representation graphs); tacitus (Hobbsian semantics); ccg (input CCG derivation, nicely printed).
Here's an example using the option semantics to get a first-order logic formula:
$ curl -d 'Every man loves a woman' 'http://127.0.0.1:7778/raw/pipeline?semantics=fol'
fol(1,not(some(A,and(n1man(A),not(some(B,some(C,and(r1patient(B,C),and(r1agent(B,A),and(v1love(B),n1woman(C))))))))))).
For a more extensive description of the options of Boxer see the
official documentation.
NOTE the link http://svn.ask.it.usyd.edu.au/trac/candc/wiki/BoxerOptions is dead.
The API can return either raw text or JSON. The raw text version corresponds to the standard output of the C&C pipeline. The JSON version is a simple JSON structure containing both the standard output and the standard error:
{"err": "standard error", "out": "standard output"}
It is possible to access the single tools separately by using the folliowing URLs:
$CANDCAPI/$FORMAT/t
$CANDCAPI/$FORMAT/candc
$CANDCAPI/$FORMAT/boxer
The tokenizer t takes in input a normal text. The parser candc takes in input a tokenized text, i.e. a list of words separated by whitespace. boxer takes in input the Prolog output of the C&C parser. For convenience, also the combination of intermediate steps of the pipeline are included in the API:
$CANDCAPI/$FORMAT/tcandc
$CANDCAPI/$FORMAT/candcboxer
respectively, the call the combination tokenizer/parser and parser/Boxer.
To see the version af C&C/Boxer used by the API:
$CANDCAPI/$FORMAT/version
Discourse Representation Graph is a semantic formalism described in the paper V. Basile, J. Bos: Towards Generating Text from Discourse Representation Structures. The C&C/Boxer API provides an entry point to generate a PNG image of the DRG of a given text:
$CANDCAPI/drg
The URL accepts the same GET parameter as pipeline and returns a raw PNG file.