bioCADDIE Pilot Design

The NDS bioCADDIE pilot consists of the following components:

  • Indri, Lucene, and ElasticSearch indexes and index creation scripts/configuration files.
  • Java utility classes to convert the bioCADDIE and PubMed data to TREC text format.
  • The ir-tools evaluation framework, which supports both Indri and Lucene indexes and implements the RM3 and Rocchio expansion models.
  • A set of scripts to run the baseline and expansion models, sweeping parameter combinations. The scripts can be used with GNU parallel or Kubernetes for parallelization.
  • Java utility classes to generate RM3 and BM25+Rocchio queries from Indri and Lucene indexes.
  • A set of scripts to generate trec_eval output for model comparison.
  • Java utility class and wrapper scripts to perform leave-one-query-out cross-validation for parameter estimation, optimizing for multiple metrics.
  • R scripts to compare retrieval models using a one-tailed t-test.
  • A prototype ElasticSearch plugin that implements BM25+Rocchio expansion.
  • A prototype ElasticSearch plugin that implements the repository prior calculation.
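
As a rough illustration of the leave-one-query-out cross-validation step listed above, the following Python sketch shows the general idea: for each held-out query, pick the parameter setting with the best mean metric over the remaining queries, then record that setting's score on the held-out query. This is not the pilot's Java implementation. The function name `leave_one_query_out`, the parameter labels, and the metric values are all hypothetical, and the sketch optimizes a single metric only.

```python
from statistics import mean


def leave_one_query_out(scores):
    """Leave-one-query-out parameter selection (illustrative sketch).

    scores maps a parameter setting to {query_id: metric_value}.
    Returns (held_out, chosen): the held-out metric per query and the
    parameter setting selected for that query from the other queries.
    """
    params = list(scores)
    queries = sorted(next(iter(scores.values())))
    held_out, chosen = {}, {}
    for q in queries:
        # Train on every query except the held-out one.
        train = [t for t in queries if t != q]
        # Select the setting with the highest mean metric on the training queries.
        best = max(params, key=lambda p: mean(scores[p][t] for t in train))
        chosen[q] = best
        held_out[q] = scores[best][q]
    return held_out, chosen


# Hypothetical per-query metric values for two parameter settings.
example_scores = {
    "mu=500": {"q1": 0.30, "q2": 0.40, "q3": 0.20},
    "mu=2500": {"q1": 0.35, "q2": 0.30, "q3": 0.40},
}

held_out, chosen = leave_one_query_out(example_scores)
for q in sorted(held_out):
    print(q, chosen[q], held_out[q])
```

The per-query held-out scores produced this way are what a paired significance test would then compare across retrieval models.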