Developer Guide

The clinical trial parser library contains tools for translating clinical trial eligibility criteria. For example, it has scripts for downloading data and for running the CFG and IE parsers. The library does not include publicly available data, except for 20 clinical trials that illustrate the functionality of its modules.

CFG Parser

Installation steps:

Install Go from https://golang.org/dl/
Set GOPATH so that the cloned project is in $GOPATH/src/github.com/facebookresearch/Clinical-Trial-Parser
Run ./script/cfg_parse.sh in the project root directory. The script will write the parsed relations to cfg_parsed_clinical_trials.tsv.
Program parameters can be changed by editing either the command-line arguments in cfg_parse.sh or the config parameters in cfg.conf.
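
The GOPATH requirement in the steps above is easy to get wrong, so a small pre-flight check can help. The following is a minimal sketch, not part of the library: the helper names are hypothetical, and it assumes GOPATH holds a single directory rather than a list of paths.

```python
import subprocess
from pathlib import Path

# Import path taken from the installation step above.
PROJECT_IMPORT_PATH = "github.com/facebookresearch/Clinical-Trial-Parser"


def expected_project_dir(gopath: str) -> Path:
    """Return where the clone should live for the given GOPATH."""
    return Path(gopath) / "src" / PROJECT_IMPORT_PATH


def is_in_gopath(project_root: str, gopath: str) -> bool:
    """Check that project_root is the directory GOPATH expects."""
    return Path(project_root).resolve() == expected_project_dir(gopath).resolve()


def run_cfg_parser(project_root: str) -> None:
    """Run the demo script from the project root; raises if the script fails."""
    subprocess.run(["./script/cfg_parse.sh"], cwd=project_root, check=True)
```

A caller would typically read GOPATH from os.environ, call is_in_gopath on the current directory, and invoke run_cfg_parser only if the check passes.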

cfg_parse.sh demonstrates how the CFG parser could be used. Applications should write their own driver module.
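
An application that consumes the parser output might start by loading the TSV file. The sketch below makes no assumption about the column layout or about whether the first row is a header, since neither is described here; it simply yields each row as a list of strings. The function name is hypothetical.

```python
import csv
from typing import Iterator, List


def read_tsv_rows(path: str) -> Iterator[List[str]]:
    """Yield each non-empty line of a tab-separated file as a list of fields."""
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE keeps quote characters as literal text in the fields.
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if row:
                yield row
```

For example, `for row in read_tsv_rows("cfg_parsed_clinical_trials.tsv")` would iterate over the parsed relations, with the path adjusted to wherever the script writes its output.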

Quality improvements:

The CFG parser does not parse all ordinal and numerical criteria, and it may parse some criteria incorrectly. Errors can be fixed and new capabilities added by:

Updating the grammar production rules to cover new criteria patterns. It is also good practice to add new test cases to interpreter_test.go.
Updating existing variables or adding new ones in variables.csv
Updating existing units or adding new ones in units.csv

IE Parser

Installation steps:

Install Python from https://www.python.org/
Install Natural Language Toolkit from https://www.nltk.org/
Install Go from https://golang.org/dl/
Install PyText, which can be done using Anaconda3:
- Install Anaconda3 from https://docs.anaconda.com/anaconda/install/mac-os/
- Install PyText with pip install pytext-nlp
- ONNX and Torch may need to be upgraded with conda install onnx -c conda-forge and conda install pytorch torchvision -c pytorch
- Note that PyText has an issue that affects some users
Unzip word_embeddings.vec.gz in data/embedding
Download the MeSH vocabulary using mesh.sh
Run ./script/ie_parse.sh in the project root directory. The script will write medical terms and matched concepts to ie_parsed_clinical_trials.tsv.
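
The unzip step above can be done with any gzip tool. As one option, the following Python sketch decompresses the archive next to itself and leaves the original .gz file in place. The function name is hypothetical; the only path assumed is the one named in the step above.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src: str) -> Path:
    """Decompress src (ending in .gz) next to itself and return the new path."""
    src_path = Path(src)
    if src_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src_path.name}")
    dst_path = src_path.with_suffix("")  # drops only the trailing .gz
    with gzip.open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams, so large files are fine
    return dst_path
```

Calling `gunzip("data/embedding/word_embeddings.vec.gz")` from the project root would produce data/embedding/word_embeddings.vec.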

The library includes a pre-trained NER binary. Drivers and config files are provided for illustrative purposes in src/cmd and src/resources/config. Applications may write their own driver modules.

Quality improvements:

The NER model can be improved by adding new training samples
The NEL module can be improved by
- Better processing of the extracted NER terms
- Incorporating a vocabulary that has a high match rate with the eligibility criteria terms
- Adding synonyms to concepts or new synonyms to the custom MeSH files
- Implementing term clustering to increase NEL recall
RE with negation extraction can be implemented
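
To illustrate why term processing and synonyms affect the NEL match rate, here is a toy sketch of exact-match linking after normalization. It is not the library's NEL implementation: the function names, the vocabulary, and the concept ids are all hypothetical placeholders, and the normalization only handles ASCII letters and digits.

```python
import re
from typing import Dict, List, Optional


def normalize(term: str) -> str:
    """Lowercase and replace runs of non-alphanumeric characters with one space."""
    return re.sub(r"[^a-z0-9]+", " ", term.lower()).strip()


def build_index(concepts: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every normalized name or synonym to its concept id."""
    index = {}
    for concept_id, names in concepts.items():
        for name in names:
            index[normalize(name)] = concept_id
    return index


def link(term: str, index: Dict[str, str]) -> Optional[str]:
    """Return the concept id for an extracted term, or None if unmatched."""
    return index.get(normalize(term))


# Hypothetical toy vocabulary; the ids are placeholders, not MeSH identifiers.
concepts = {
    "C1": ["Myocardial Infarction"],
    "C2": ["Diabetes Mellitus, Type 2"],
}
index = build_index(concepts)
print(link("heart attack", index))  # None: no synonym registered yet

concepts["C1"].append("heart attack")  # add a synonym to the concept
index = build_index(concepts)
print(link("Heart-attack", index))  # C1
```

The second lookup succeeds only because the synonym was added and the hyphenated, capitalized term normalizes to the same key, which is the effect the first and third items in the list above aim for.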

Data

The library includes example scripts aact.sh and ingest.sh for downloading and ingesting clinical trials. The scripts are provided for convenience; applications will most likely need to modify them or use other means to do the same. For example, ingest.sh samples only a few trials. An obvious place to start is to change the 'where' clauses. Note that these scripts use a PostgreSQL database.
