This is the code repository for the manuscript "Combining chest X-rays and EHR data using machine learning to diagnose acute respiratory failure".
- dataset/ contains data loaders
- model/ contains model loaders
- checkpoint/ is where model checkpoints are saved
Follow the official instructions to download the MIMIC-IV, MIMIC-CXR, and CheXpert datasets.
Note: code to extract an AHRF cohort from MIMIC will be available soon.
Config file
Pre-specified arguments can be set in config.json:
Required arguments:
- csv_file: Path to metadata file.
- checkpoint: Path to file location where model checkpoints will be saved.
- labels: name(s) of the label columns in the metadata file. Multiple labels are separated by "|" (e.g., CHF|Pneumonia|COPD).
- rotate degrees: rotation range, in degrees, for random-rotation image augmentation.
- disk: if disk = 1, all images are loaded into memory before training. Otherwise, images are read from disk during training.
- mask: set mask = 1 to use a masked loss (i.e., when some labels are missing). All missing labels in the metadata file should be set to -1.
- early_stop: set early_stop = 1 to use early stopping. Otherwise, the model trains for 3 epochs.
- pretrain: whether to use a pretrained initialization. If pretrain is "yes", ImageNet initialization is used unless pretrain_file is specified. Otherwise, pretrain should be set to "random".
- pretrain_file: path to a pretrained model (e.g., a model pretrained on MIMIC-CXR and CheXpert).
- pretrain_classes: number of labels the pretrained model was trained on.
- freeze_all: 1 or 0; whether to freeze all DenseNet layers except the classifier.
- loader_names: list of split names (e.g., ["train", "valid", "test"]). Including "test" is optional.
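The mask option is described only by its behavior. One common way to implement such a masked loss is to compute a per-label binary cross-entropy and skip every position whose label is -1. The plain-Python sketch below assumes that formulation and averages over observed labels only; the names parse_labels and masked_bce are hypothetical and not taken from the repository, whose training code may differ.

```python
import math
from typing import List, Sequence

MISSING = -1  # value used for missing labels in the metadata file (per this README)


def parse_labels(field: str) -> List[str]:
    """Split a "|"-separated labels entry such as "CHF|Pneumonia|COPD"."""
    return field.split("|")


def masked_bce(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over observed labels only (assumed formulation).

    Positions labeled -1 are skipped, so they add nothing to the loss
    and are not counted in the average.
    """
    total, count = 0.0, 0
    for p, y in zip(probs, labels):
        if y == MISSING:
            continue  # missing label: no contribution
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    return total / count if count else 0.0  # all labels missing -> zero loss
```

For example, masked_bce([0.9, 0.2, 0.7], [1, -1, 0]) averages only the first and third terms; the second prediction is ignored because its label is missing.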
Training a model
The example command below trains a model using train.py. Each run requires that a model_name and model_type be specified. These are pre-specified in the config file along with other parameters (described in further detail below). Models are saved in the directory checkpoint/model_type/model_name.
Optional arguments:
- gpu: GPU IDs to train on. Default is 0 and 1.
- budget: number of hyperparameter combinations to try. Default is 50.
- repeats: number of seed initializations to try. Default is 3.
- save_every: for pretraining on MIMIC-CXR and CheXpert; number of iterations between saved checkpoints. Default is None, which saves after every epoch.
- save_best_num: for pretraining on MIMIC-CXR and CheXpert; number of top checkpoints to keep, ranked by AUROC on the validation set. Default is 1.
- optimizer: optimizer to use. Default is "sgd"; "adam" can also be chosen for pretraining on MIMIC-CXR and CheXpert.
python train.py --model_type example_model_type --model_name example_model_name
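For reference, the documented arguments and their defaults can be mirrored with Python's standard argparse module. This is a hypothetical sketch, not the repository's actual parser: the space-separated form of --gpu and the restricted choices for --optimizer are assumptions, and build_parser and checkpoint_dir are illustrative names.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the train.py arguments described in this README."""
    parser = argparse.ArgumentParser(description="Sketch of train.py arguments")
    # Required: identify the run
    parser.add_argument("--model_type", required=True)
    parser.add_argument("--model_name", required=True)
    # Optional arguments with the documented defaults
    parser.add_argument("--gpu", type=int, nargs="+", default=[0, 1])  # assumed form: --gpu 0 1
    parser.add_argument("--budget", type=int, default=50)       # hyperparameter combinations
    parser.add_argument("--repeats", type=int, default=3)       # seed initializations
    parser.add_argument("--save_every", type=int, default=None)  # None = save every epoch
    parser.add_argument("--save_best_num", type=int, default=1)  # top checkpoints by AUROC
    parser.add_argument("--optimizer", choices=["sgd", "adam"], default="sgd")
    return parser


def checkpoint_dir(args: argparse.Namespace) -> str:
    """Directory where a run's checkpoints go: checkpoint/model_type/model_name."""
    return os.path.join("checkpoint", args.model_type, args.model_name)
```

Parsing the example command's arguments with build_parser().parse_args([...]) leaves every optional argument at its default, so budget is 50 and optimizer is "sgd".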
Pretraining
To train a model on MIMIC-CXR and CheXpert, use the save_every, save_best_num, and optimizer arguments. The following command starts from an ImageNet-initialized model:
python train.py --model_type example_model_type --model_name example_model_name --save_every 4800 --save_best_num 10 --optimizer adam
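The save_best_num behavior (keep only the N checkpoints with the highest validation AUROC, evaluated every save_every iterations) can be handled with a bounded min-heap. The sketch below is an assumption about how such bookkeeping might work, not the repository's code; the class TopKCheckpoints is hypothetical, and computing AUROC and deleting files are left to the caller.

```python
import heapq
from typing import List, Optional, Tuple


class TopKCheckpoints:
    """Track the k checkpoints with the highest validation AUROC."""

    def __init__(self, k: int = 1) -> None:
        self.k = k
        # Min-heap of (auroc, step, path): the weakest kept checkpoint sits at index 0.
        self._heap: List[Tuple[float, int, str]] = []

    def offer(self, auroc: float, step: int, path: str) -> Optional[str]:
        """Register a newly saved checkpoint.

        Returns the path of a checkpoint that should now be deleted
        (an evicted one, or the new one if it did not make the cut),
        or None if nothing needs deleting.
        """
        item = (auroc, step, path)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, item)
            return None
        if item > self._heap[0]:
            evicted = heapq.heapreplace(self._heap, item)  # pop weakest, push new
            return evicted[2]
        return path

    def best(self) -> List[Tuple[float, int, str]]:
        """Kept checkpoints, highest AUROC first."""
        return sorted(self._heap, reverse=True)
```

With --save_best_num 10 and --save_every 4800, a trainer would create TopKCheckpoints(k=10) and call offer after each evaluation every 4800 iterations. Ties on AUROC are broken in favor of the later step because tuples compare element by element.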
Fine-tuning a model on the AHRF cohort
To train a DenseNet model after pretraining on MIMIC-CXR and/or CheXpert, specify the path to the pretrained model (pretrain_file) and the number of classes in the pretrained model (pretrain_classes) in the config file.
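Assuming config.json is a flat JSON object whose keys match the argument names listed under Config file (the actual layout may differ, for example if settings are grouped per model_type/model_name), the fine-tuning settings could be updated programmatically as in this sketch. The helper names set_finetune_options, load_config, and save_config are hypothetical, and any path or class count passed in is a placeholder for your own pretrained model's values.

```python
import json


def set_finetune_options(config: dict, pretrain_file: str, pretrain_classes: int,
                         freeze_all: int = 0) -> dict:
    """Return a copy of config that points at a pretrained MIMIC-CXR/CheXpert model."""
    updated = dict(config)  # leave the caller's dict untouched
    updated["pretrain"] = "yes"                     # use a pretrained initialization
    updated["pretrain_file"] = pretrain_file        # replaces the ImageNet default
    updated["pretrain_classes"] = pretrain_classes  # label count of the pretrained model
    updated["freeze_all"] = freeze_all              # 1 = train only the classifier
    return updated


def load_config(path: str) -> dict:
    with open(path) as f:
        return json.load(f)


def save_config(config: dict, path: str) -> None:
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
```

For example, set_finetune_options(load_config("config.json"), "path/to/pretrained_model", n) returns the updated settings for a pretrained model with n labels, which save_config can write back to disk.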