Bill Baumgartner edited this page Apr 19, 2023 · 5 revisions

Managing annotation projects with the Elasticsearch-to-BRAT CLI

Show statistics of the annotation data for a given project

Use the following command to print statistics (mainly sentence counts) for the annotation project identified by the [biolink_association] parameter.

docker run --rm -v [path_to_repo]:/home/input ucdenverccp/elastic2brat:0.4 stats -b [biolink_association]

where,

  • [path_to_repo] is the local base path to this translator-relation-extraction git repository
  • [biolink_association] is the Biolink association for the annotation project, e.g. bl_chemical_to_gene
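
As a convenience, the stats command can be wrapped in a small shell function. This is only a sketch: the function name elastic2brat_stats and the DOCKER override variable are assumptions made for illustration and are not part of the elastic2brat image. The docker arguments are exactly those shown above.

```shell
# Hypothetical wrapper around the stats command (not shipped with elastic2brat).
# Usage: elastic2brat_stats <path_to_repo> <biolink_association>
# Set DOCKER=echo to print the docker arguments instead of running them.
elastic2brat_stats() {
    repo_path="${1:-}"
    association="${2:-}"

    # Both arguments are required by the stats command.
    if [ -z "$repo_path" ] || [ -z "$association" ]; then
        echo "usage: elastic2brat_stats <path_to_repo> <biolink_association>" >&2
        return 1
    fi

    # Mount the repository at /home/input, as the CLI expects.
    "${DOCKER:-docker}" run --rm \
        -v "${repo_path}:/home/input" \
        ucdenverccp/elastic2brat:0.4 \
        stats -b "$association"
}
```

For example, assuming the repository is checked out at the placeholder path /path/to/translator-relation-extraction, calling `elastic2brat_stats /path/to/translator-relation-extraction bl_chemical_to_gene` runs the same command as above. Setting `DOCKER=echo` first prints the arguments for review without starting a container.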

Create a new annotation batch

Use the following command to create a new batch of sentences to be annotated. The command will query an Elasticsearch instance to retrieve appropriate sentences for the annotation project based on the specified Biolink association.

docker run --rm -v [path_to_repo]:/home/input ucdenverccp/elastic2brat:0.4 batch -b [biolink_association] -a [annotator] -n [sentence_count] -u [elastic_url] -p [elastic_port] -k [elastic_api_key] -s [subject-idf-threshold] -o [object-idf-threshold] -g [sentences-per-page]

where,

  • [path_to_repo] is the local base path to this translator-relation-extraction git repository
  • [biolink_association] is the Biolink association for the annotation project, e.g. bl_chemical_to_gene
  • [annotator] is the name/key for the annotator assigned to the batch being created. This value is used as a directory name, so it must be a single token (no spaces)
  • [sentence_count] is the number of sentences to include in this batch
  • [elastic_url] is the URL of the Elasticsearch server (do not include http://)
  • [elastic_port] is the port for the Elasticsearch server
  • [elastic_api_key] is the API key for the Elasticsearch server
  • [subject-idf-threshold] is an OPTIONAL threshold for filtering the subject entities based on inverse document frequency. By default, no filtering is performed.
  • [object-idf-threshold] is an OPTIONAL threshold for filtering the object entities based on inverse document frequency. By default, no filtering is performed.
  • [sentences-per-page] is an OPTIONAL parameter that sets how many sentences are included in each BRAT file (default = 20), i.e., how many sentences are presented to the annotator on each page in the BRAT UI.
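
The batch command has many parameters, so a wrapper that checks the constraints noted above (single-token annotator, no http:// in the URL) and adds the optional flags only when they are set may be useful. The following is a minimal sketch under stated assumptions: the function name elastic2brat_batch and the environment variables SUBJECT_IDF, OBJECT_IDF, SENTENCES_PER_PAGE and DOCKER are hypothetical names chosen for illustration. Only the docker arguments come from the command above.

```shell
# Hypothetical wrapper around the batch command (not shipped with elastic2brat).
# Usage: elastic2brat_batch <path_to_repo> <biolink_association> <annotator> \
#            <sentence_count> <elastic_url> <elastic_port> <elastic_api_key>
# Optional environment variables, passed only when non-empty:
#   SUBJECT_IDF -> -s, OBJECT_IDF -> -o, SENTENCES_PER_PAGE -> -g
# Set DOCKER=echo to print the docker arguments (including the API key)
# instead of running them.
elastic2brat_batch() {
    if [ "$#" -ne 7 ]; then
        echo "usage: elastic2brat_batch <path_to_repo> <biolink_association> <annotator> <sentence_count> <elastic_url> <elastic_port> <elastic_api_key>" >&2
        return 1
    fi
    repo_path="$1"
    association="$2"
    annotator="$3"
    sentence_count="$4"
    elastic_url="$5"
    elastic_port="$6"
    elastic_api_key="$7"

    # The annotator key becomes a directory name: reject empty values and spaces.
    case "$annotator" in
        "" | *" "*)
            echo "annotator must be a single token (no spaces)" >&2
            return 1
            ;;
    esac

    # The URL must be given without http:// (rejecting https:// too is an assumption).
    case "$elastic_url" in
        http://* | https://*)
            echo "do not include http:// in the Elasticsearch URL" >&2
            return 1
            ;;
    esac

    # Required arguments, in the same order as the documented command.
    set -- run --rm -v "${repo_path}:/home/input" ucdenverccp/elastic2brat:0.4 \
        batch -b "$association" -a "$annotator" -n "$sentence_count" \
        -u "$elastic_url" -p "$elastic_port" -k "$elastic_api_key"

    # Optional arguments are appended only when the caller has set them.
    if [ -n "${SUBJECT_IDF:-}" ]; then set -- "$@" -s "$SUBJECT_IDF"; fi
    if [ -n "${OBJECT_IDF:-}" ]; then set -- "$@" -o "$OBJECT_IDF"; fi
    if [ -n "${SENTENCES_PER_PAGE:-}" ]; then set -- "$@" -g "$SENTENCES_PER_PAGE"; fi

    "${DOCKER:-docker}" "$@"
}
```

With `DOCKER=echo` set, a call such as `elastic2brat_batch /path/to/repo bl_chemical_to_gene alice 100 es.example.org 9200 "$API_KEY"` (all values are placeholders) prints the assembled arguments, so they can be checked before a real run. Leaving the optional variables unset keeps the CLI defaults described above.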