MLD3/Combining-chest-X-rays-and-EHR-data-ARF

Overview

This is the code repository for the manuscript "Combining chest X-rays and EHR data using machine learning to diagnose acute respiratory failure".

Directory structure

  • dataset/ contains data loaders

  • model/ contains model loaders

  • checkpoint/ is where model checkpoints are saved

Download and Process the Data

Follow the directions to download the MIMIC-IV, MIMIC-CXR, and CheXpert datasets.

Note: code to extract an AHRF cohort from MIMIC will be available soon.

Running the code

Config file

Pre-specified arguments can be set in config.json (a minimal example sketch follows the list of required arguments below):

Required arguments:

  • csv_file: Path to metadata file.
  • checkpoint: Path to file location where model checkpoints will be saved.
  • labels: column name of the classes in the metadata file. Multiple labels can be separated by "|" (e.g., CHF|Pneumonia|COPD).
  • rotate degrees: degrees of rotation to use for random rotation during image augmentation.
  • disk: if disk = 1, all images are loaded into memory before training; otherwise, images are fetched from disk during training.
  • mask: set mask = 1 to use a masked loss (i.e., if there are missing labels). All missing labels in the metadata file should be set to -1.
  • early_stop: set early_stop = 1 to use the early stopping criterion; otherwise, the model will train for 3 epochs.
  • pretrain: whether or not to use a pretrained initialization. If pretrain is "yes", ImageNet initialization is used unless a pretrain_file is specified; otherwise, pretrain should be "random".
  • pretrain_file: file path to a pretrained model (i.e., a model pretrained on MIMIC-CXR and CheXpert).
  • pretrain_classes: number of labels the pretrained model was trained with.
  • freeze_all: 1 or 0; whether or not to freeze all layers of the DenseNet except the classifier.
  • loader_names: list of split names (i.e., ["train", "valid", "test"]). You do not have to include "test".
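
For reference, a minimal config.json might look like the sketch below. All values are illustrative placeholders, and the exact key spellings (for example, the rotation argument) are assumptions based on the descriptions above rather than values taken from the repository:

{
    "csv_file": "metadata.csv",
    "checkpoint": "checkpoint/",
    "labels": "CHF|Pneumonia|COPD",
    "rotate": 15,
    "disk": 1,
    "mask": 0,
    "early_stop": 1,
    "pretrain": "yes",
    "pretrain_file": "",
    "pretrain_classes": 14,
    "freeze_all": 0,
    "loader_names": ["train", "valid", "test"]
}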

Training a model

The following example code will train a model using train.py. Each run requires that a model_name and model_type be specified. These are pre-specified in the config file along with other parameters (described in further detail below). Models will be saved in the directory checkpoint/model_type/model_name.

Other non-required arguments are:

  • gpu: specify the GPU numbers to train on; default is 0 and 1.
  • budget: number of hyperparameter combinations to try. Default is 50.
  • repeats: number of seed initializations to try. Default is 3.
  • save_every: for pretraining on MIMIC-CXR and CheXpert. Number of iterations to complete before saving a checkpoint. Default is None, which saves after every epoch.
  • save_best_num: for pretraining on MIMIC-CXR and CheXpert. Number of top checkpoints to save (based on best AUROC performance on the validation set). Default is 1.
  • optimizer: optimizer to use. Default is "sgd", but "adam" can also be chosen for pretraining on MIMIC-CXR and CheXpert.
python train.py --model_type example_model_type --model_name example_model_name 

Pretraining

To train a model on MIMIC-CXR and CheXpert, you'll want to use the save_every, save_best_num, and optimizer arguments. The following will train an ImageNet-initialized model:

python train.py --model_type example_model_type --model_name example_model_name --save_every 4800 --save_best_num 10 --optimizer adam  

Fine-tuning a model on the AHRF cohort

To fine-tune a DenseNet model after pretraining on MIMIC-CXR/CheXpert, you'll need to specify the file location of the pretrained model in the config file, as well as the number of classes in the pretrained model, as sketched below.
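
As a rough sketch (key names follow the config descriptions above; the checkpoint path and class count are hypothetical placeholders), the pretraining-related entries in config.json could be set along these lines:

{
    "pretrain": "yes",
    "pretrain_file": "checkpoint/pretrain_run/best_model.pt",
    "pretrain_classes": 14,
    "freeze_all": 0
}

With these entries in place, fine-tuning can then presumably be launched with the same train.py command shown under "Training a model".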
