DoubleTransfer at MEDIQA 2019: Multi-Source Transfer Learning for Natural Language Understanding in the Medical Domain
This PyTorch package implements DoubleTransfer for the MEDIQA 2019 competition, as described in:
Yichong Xu, Xiaodong Liu, Chunyuan Li, Hoifung Poon and Jianfeng Gao
DoubleTransfer at MEDIQA 2019: Multi-Source Transfer Learning for Natural Language Understanding in the Medical Domain
The BioNLP workshop, ACL 2019.
arXiv version
Please cite the above paper if you use this code.
The results produced by this package in the MEDIQA 2019 competition are reported below.
| Task | Score (%) | Rank |
|---|---|---|
| Question Answering (QA) | 78.0 (Accuracy), 81.91 (Precision) | 1st |
| Medical Natural Language Inference (MedNLI) | 93.8 | 3rd |
| Recognizing Question Entailment (RQE) | 66.2 | 7th |
- Pull the Docker image:
> docker pull yichongx/doubletransfer_mediqa2019
- Run the Docker container:
> docker run -it --rm --runtime nvidia yichongx/doubletransfer_mediqa2019 bash
If you are using Docker for the first time, please refer to https://docs.docker.com/
- Download the data using the links on the MEDIQA 2019 website.
- Prepare the MNLI data and the pretrained BERT models:
> ./download.sh
- Preprocess the data with the BERT and SciBERT vocabularies:
> ./prepro.sh
- Train a model using train.py:
> python train.py --train_datasets mednli,rqe,mediqa,medquad --save_last --save_best --mediqa_score adjusted --mediqa_score_offset -2.0 --freeze_bert_first --batch_size 16 --max_seq_len 384 --data_dir ../data/mediqa_processed/mt_dnn_mediqa_384_v2/ --init_checkpoint /path/to/pretrained/model/ --float_medquad --external_datasets mnli --mtl_opt 0 --output_dir /output/path
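The training command above fine-tunes on several datasets at once (`--train_datasets mednli,rqe,mediqa,medquad`, plus `--external_datasets mnli`). Because this code builds on MT-DNN, the sketch below illustrates, as an assumption, the general MT-DNN multi-task ordering: each task's data is packed into its own mini-batches, then all batches are pooled and shuffled every epoch, so each update step uses a batch from a single task. It is not taken from train.py; how this package handles external datasets, `--mtl_opt`, or task sampling is not shown here, and `pack_batches`, `mixed_task_schedule`, and the dataset sizes are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def pack_batches(examples: List[int], batch_size: int) -> List[List[int]]:
    """Split one task's example indices into consecutive mini-batches."""
    return [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]


def mixed_task_schedule(
    task_sizes: Dict[str, int], batch_size: int, seed: int = 0
) -> List[Tuple[str, List[int]]]:
    """One epoch of an MT-DNN-style training order (hypothetical sketch).

    Each task is batched separately, so a batch never mixes tasks;
    all batches are then pooled and shuffled together.
    """
    rng = random.Random(seed)
    schedule: List[Tuple[str, List[int]]] = []
    for task, n in task_sizes.items():
        indices = list(range(n))
        rng.shuffle(indices)  # shuffle examples within the task
        for batch in pack_batches(indices, batch_size):
            schedule.append((task, batch))
    rng.shuffle(schedule)  # interleave batches from different tasks
    return schedule


if __name__ == "__main__":
    # Hypothetical dataset sizes, not the real MEDIQA/MNLI sizes.
    sizes = {"mednli": 40, "rqe": 20, "mediqa": 12, "medquad": 30}
    for task, batch in mixed_task_schedule(sizes, batch_size=16)[:5]:
        print(task, len(batch))
```

Keeping each batch within one task lets a task-specific output layer be applied to the whole batch, while shuffling across tasks prevents the shared encoder from seeing one task for long stretches.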
See run.sh for example commands.
- To ensemble predictions:
> python ensemble_preds.py /path/to/file1/ /path/to/file2/
All input files passed on the command line are ensembled.
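The prediction file format and the combination rule used by ensemble_preds.py are not documented here. A common choice is soft voting: average each example's class probabilities across runs and take the argmax. The sketch below assumes that rule, and also assumes each run has already been loaded into a dict mapping a hypothetical example id to a list of class probabilities; `average_predictions` is a hypothetical name, not the script's API.

```python
from typing import Dict, List


def average_predictions(runs: List[Dict[str, List[float]]]) -> Dict[str, int]:
    """Soft-voting ensemble (hypothetical sketch): average per-class scores, then argmax.

    Each run maps an example id to a list of class probabilities.
    Only ids present in every run are ensembled.
    """
    if not runs:
        raise ValueError("need at least one run")
    shared_ids = set(runs[0]).intersection(*runs[1:])
    ensembled: Dict[str, int] = {}
    for uid in sorted(shared_ids):
        n_classes = len(runs[0][uid])
        avg = [sum(run[uid][c] for run in runs) / len(runs) for c in range(n_classes)]
        ensembled[uid] = max(range(n_classes), key=avg.__getitem__)
    return ensembled


if __name__ == "__main__":
    run_a = {"q1": [0.9, 0.1], "q2": [0.4, 0.6]}
    run_b = {"q1": [0.6, 0.4], "q2": [0.7, 0.3]}
    print(average_predictions([run_a, run_b]))  # {'q1': 0, 'q2': 0}
```

Here the averaged scores for q2 are about [0.55, 0.45], so the ensemble predicts class 0 even though run_a alone would pick class 1.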
This code is based on the original MT-DNN code: https://github.com/namisan/mt-dnn
Related: MultiTask-MRC, MT-DNN