- Requirements
- Environment setup
- Linux System
- Preparing the dataset
- Train and Testing
This software requires a Linux system: Ubuntu 22.04 or Ubuntu 20.04 (other versions are not tested) and Python 3.9 (other versions are not supported). It requires 16 GB of memory and 20 GB of disk storage (we recommend 32 GB of memory), and it analyzes a single image in about 5 seconds on an Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz. The following Linux packages are required to run this software:
build-essential
zlib1g-dev
libncurses5-dev
libgdbm-dev
libnss3-dev
libssl-dev
libreadline-dev
libffi-dev
libsqlite3-dev
libsm6
libxrender1
wget
git
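On Ubuntu, these packages can be installed with apt, for example:
sudo apt-get update
sudo apt-get install -y build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libreadline-dev libffi-dev libsqlite3-dev libsm6 libxrender1 wget git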
The required Python packages are listed below; they can also be found in requirements.txt.
numpy>=1.22.3
seaborn>=0.11.2
sklearn>=0.0
scikit-learn>=1.1.1
torchvision>=0.13.0
pandas>=1.4.2
albumentations
torch
pymongo
tqdm
pingouin
- Open a terminal in the system, or press Ctrl+Alt+F1 to switch to a command-line console.
- Clone this repository to your home directory.
git clone https://github.com/drpredict/DeepDR_Plus.git
- Change the current directory to the source directory
cd DeepDR_Plus
- Install the required Python packages
python3 -m pip install --user -r requirements.txt
Supported Image File Format
JPG, PNG, and TIF formats are supported and tested. Other formats that OpenCV supports should work too. The input images must be 3-channel color fundus images whose shorter edge is larger than 448 pixels.
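As a quick sanity check, a short script along these lines can verify that an input file meets the requirements above (the path is a placeholder and the helper is not part of this repository):

```python
import cv2  # OpenCV

def check_fundus_image(path: str) -> bool:
    """Return True if the file is a 3-channel color image whose shorter edge exceeds 448 pixels."""
    img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
    if img is None:
        print(f"{path}: not readable by OpenCV")
        return False
    if img.ndim != 3 or img.shape[2] != 3:
        print(f"{path}: expected a 3-channel color image, got shape {img.shape}")
        return False
    if min(img.shape[:2]) <= 448:
        print(f"{path}: shorter edge is {min(img.shape[:2])}, must be larger than 448")
        return False
    return True

print(check_fundus_image("data_fund/sample.jpg"))  # hypothetical image path
```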
Training data should be provided as a CSV file containing the following columns (patient covariables and/or the image path); a minimal example of assembling such a file is sketched after the list:
- image: the path of fundus images (used in fundus model)
- t1: time of the last exam before the event of interest
- t2: time of the first exam after the event of interest
- e: censored or not; True for not censored (event observed), False for censored (no event observed)
- age: age at baseline (used in metadata model and combined model)
- gender: gender (used in metadata model and combined model)
- hba1c: hemoglobin A1c at baseline (used in metadata model and combined model)
- sbp: systolic blood pressure at baseline (used in metadata model and combined model)
- dbp: diastolic blood pressure at baseline (used in metadata model and combined model)
- bmi: body mass index at baseline (used in metadata model and combined model)
- ldl: low-density lipoprotein cholesterol at baseline (used in metadata model and combined model)
- hdl: high-density lipoprotein cholesterol at baseline (used in metadata model and combined model)
- t2d_dur: duration of T2D at baseline (used in metadata model and combined model)
- fundus_score: fundus score (only used in combined model, generated by fundus model)
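For illustration only, a minimal training CSV with these columns could be assembled with pandas as follows; every value, unit, and file path here is made up and not taken from the repository:

```python
import pandas as pd

# One illustrative row; a real file contains one row per image/patient.
row = {
    "image": "data_fund/sample.jpg",  # hypothetical fundus image path (fundus model)
    "t1": 2.0,                        # last exam before the event of interest (dataset's time unit)
    "t2": 3.0,                        # first exam after the event of interest
    "e": True,                        # True = event observed (not censored)
    "age": 55, "gender": 1, "hba1c": 7.2, "sbp": 130, "dbp": 80,
    "bmi": 24.5, "ldl": 2.8, "hdl": 1.1, "t2d_dur": 6.0,
    "fundus_score": 0.42,             # only needed for the combined model
}
pd.DataFrame([row]).to_csv("data_covar/train_example.csv", index=False)
```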
The training and testing data are stored in the data_fund and data_covar directories. Samples of the training and testing data are also included in this repository.
We adopted an open-source implementation of MoCo-v2 for pre-training. To pre-train the network, enter the MoCo-v2 directory and run the following command:
python main_moco.py
Optionally, you may need to change the configuration parameters stored in config.py.
The trained models will be saved in the MoCo-v2/models/resnet50_bs32_queue16384_wd0.0001_t0.2_cos directory. We choose the model with the lowest evaluation loss as the pre-trained model.
For easy use of the code, we provide a simple command-line tool in train.py. All options are listed in the help documents; refer to trainer.py for more details. The following instructions can be used to train the models:
- The hyper-parameters are set with environment variables (an example invocation follows this list):
  - load_pretrain: the pre-trained model path to be loaded for fine-tuning
  - batch_size: the training batch size
  - epochs: the number of training epochs
  - image_size: the input image resolution
  - lr: the learning rate
  - device: the device to be used for training
  - num_workers: the number of workers for data loading
  - model: the model name; ResNet-18 and ResNet-50 are supported
- Training fundus model:
  - Run python train_eval_fund.py with proper hyper-parameter settings.
  - The evaluation results are saved in logs/ as a pickle dump; see trainer.py for more details.
  - To run with a pre-trained model, invoke
    load_pretrain=MoCo-v2/models/resnet50_bs32_queue16384_wd0.0001_t0.2_cos/599.pth python train_eval_fund.py
    and change the model dump path as needed.
- Training metadata model or combined model:
  - To train the model, first prepare the dataset as a CSV file containing normalized features as well as event information.
  - The feature names to use for training are provided as command-line arguments.
  - E.g. run
    python train_eval_covar.py age gender hba1c sbp dbp bmi ldl hdl t2d_dur
    with proper hyper-parameter settings to train the metadata model with AGE, GENDER, HBA1C, SBP, DBP, BMI, LDL, HDL, and T2D_dur as covariables.
  - To run the combined model, extract the scores from the fundus model, add them to the CSV file (a sketch of this step follows these instructions), and invoke
    python train_eval_covar.py age gender hba1c sbp dbp bmi ldl hdl t2d_dur fund_score
    with the fundus score included in the command-line arguments.
  - The evaluation results are saved in logs/ as a pickle dump; see trainer.py for more details (a loading sketch also follows these instructions).
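Once a run finishes, the pickle dump in logs/ can be inspected from Python. This is a minimal sketch assuming a standard pickle file; the exact file name depends on the run, so the path below is a placeholder, and the structure of the saved object is defined in trainer.py:

```python
import pickle

# Placeholder path; substitute the actual dump written to logs/ by your run.
with open("logs/eval_results.pkl", "rb") as f:
    results = pickle.load(f)

# The layout of the results object is defined in trainer.py; inspect it interactively.
print(type(results))
print(results)
```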
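For the combined model, the scores produced by the fundus model have to be appended to the covariate CSV under the column name passed on the command line (fund_score in the example above). A minimal pandas sketch, assuming the scores were exported to a separate CSV keyed by image path; all file names and the merge key are illustrative:

```python
import pandas as pd

covar = pd.read_csv("data_covar/train.csv")  # hypothetical covariate CSV
scores = pd.read_csv("fundus_scores.csv")    # hypothetical export with columns "image" and "fund_score"

# Attach the fundus score to each row and write the CSV used by the combined model.
combined = covar.merge(scores, on="image", how="left")
combined.to_csv("data_covar/train_with_fund_score.csv", index=False)
```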