Indication as Prior Knowledge for Multimodal Disease Classification in Chest Radiographs with Transformers
Code for the paper:
Jacenków, G., O’Neil, A.Q. and Tsaftaris, S.A., 2022, March. Indication as Prior Knowledge for Multimodal Disease Classification in Chest Radiographs with Transformers. In 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI). IEEE.
The project was developed in Python 3.7.9 and PyTorch 1.8.2 with CUDA 11.1 acceleration. We use the MMF framework, which provides boilerplate code for data loaders and common visual-linguistic models. Please follow the MMF website for instructions on how to install the framework.
We use the MIMIC-CXR dataset for our experiments. Due to the licensing agreement, we cannot share the images or radiology reports. Please refer to the PhysioNet website for details on how to acquire the dataset. We recommend downloading the MIMIC-CXR-JPG dataset to avoid additional pre-processing of DICOM images.
We expect the images in JPG format to be placed in the `dataset/mimic_cxr/subjects` folder; please see the folder for an example. All textual information, i.e., the corresponding radiology reports, ground-truth labels, and subject and study IDs, is stored as `jsonl` files in the `dataset/mimic_cxr/annotations` folder.
We expect three files to run the experiments: `test.jsonl`, `training.jsonl`, and `val.jsonl`. Please see the example files.
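As a rough illustration of how the annotation files can be consumed, the sketch below reads one of the `jsonl` files line by line. The field names used here (`study_id`, `subject_id`, `indication`, `labels`) are hypothetical placeholders for illustration and may not match the keys in the actual files.

```python
import json
from pathlib import Path

ANNOTATIONS_DIR = Path("dataset/mimic_cxr/annotations")

def read_jsonl(path):
    """Yield one annotation record per line of a .jsonl file."""
    with open(path, "r") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)

# Inspect the first record of the validation split.
# NOTE: the keys below (study_id, subject_id, indication, labels)
# are assumed names, not confirmed fields of the released files.
for record in read_jsonl(ANNOTATIONS_DIR / "val.jsonl"):
    print(record.get("study_id"), record.get("subject_id"))
    print("Indication:", record.get("indication"))
    print("Labels:", record.get("labels"))
    break
```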
We provide several tools to ease the preprocessing and training steps:
* `image_preprocessing.py` - resizes all images to 224 x 224 pixels (a minimal sketch of this step is shown after the list).
* `mimic_cxr_preprocess.py` - extracts the indication/history fields from the full radiology reports.
* `precalculate_embeddings.py` - precalculates BioWordVec embeddings to avoid a bottleneck during training.
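The following is a minimal sketch of the image resizing step, assuming Pillow is available and that the JPG images live under `dataset/mimic_cxr/subjects`; the actual `image_preprocessing.py` script may differ (e.g., in interpolation or output handling).

```python
from pathlib import Path
from PIL import Image

IMAGE_DIR = Path("dataset/mimic_cxr/subjects")  # assumed location of the JPG images
TARGET_SIZE = (224, 224)

def resize_images(image_dir: Path, size=TARGET_SIZE) -> None:
    """Resize every JPG under image_dir to `size`, overwriting the originals."""
    for jpg_path in image_dir.rglob("*.jpg"):
        with Image.open(jpg_path) as image:
            resized = image.convert("RGB").resize(size, Image.BILINEAR)
        resized.save(jpg_path, quality=95)

if __name__ == "__main__":
    resize_images(IMAGE_DIR)
```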
Please follow the commands below to run the experiments (training and evaluation).
pip install -r requirements.txt # run once, after installing the MMF framework.
pip install -e . # run once, install the project as a Python package.
./run.sh # run training and evaluation protocol.
The evaluation protocol will store predictions as `json` files. Please use the Jupyter notebooks to evaluate the predictions with the aforementioned files.
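For a quick sanity check outside the notebooks, the sketch below shows one way such predictions could be scored against the annotations. The file name `predictions.json` and the keys `study_id`, `scores`, and `labels` are assumptions for illustration only and do not reflect the actual notebook code.

```python
import json
import numpy as np
from sklearn.metrics import roc_auc_score

# NOTE: "predictions.json", "study_id", "scores", and "labels" are
# hypothetical placeholders used only for illustration.
with open("predictions.json") as handle:
    predictions = {p["study_id"]: p["scores"] for p in json.load(handle)}

ground_truth = {}
with open("dataset/mimic_cxr/annotations/test.jsonl") as handle:
    for line in handle:
        record = json.loads(line)
        ground_truth[record["study_id"]] = record["labels"]

# Align studies present in both files and compute a macro-averaged AUROC.
studies = sorted(set(predictions) & set(ground_truth))
y_score = np.array([predictions[s] for s in studies])
y_true = np.array([ground_truth[s] for s in studies])
print("Macro AUROC:", roc_auc_score(y_true, y_score, average="macro"))
```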