
Add datasets D4, D5 and models M4, M5 #23

Merged
merged 11 commits into main from jv/add_D4_and_M4 on Dec 13, 2023

Conversation

@valosekj (Member) commented Nov 22, 2023

This PR adds dataset D4 and model M4.

Dataset D4

  • Five images used for inter-rater variability (sub-007_ses-headNormal_T2w.nii.gz, sub-010_ses-headUp_T2w.nii.gz, sub-amu02_T2w.nii.gz, sub-barcelona01_T2w.nii.gz, sub-brnoUhb03_T2w.nii.gz) were moved from the training dataset to the test dataset. I want to apply model M4 to these five images and compare the M4 predictions with manual segmentations from 4 raters; the images therefore cannot be in the training set (the model would be biased).
  • The D3 model was applied to five new randomly chosen images from the spine-generic dataset (sub-mgh01_T2w.nii.gz, sub-mgh02_T2w.nii.gz, sub-stanford02_T2w.nii.gz, sub-stanford05_T2w.nii.gz, sub-ucdavis03_T2w.nii.gz); the predictions were QCed, manually corrected, and added to the training dataset.
  • The D4 training dataset comprises 33 images, and the test dataset comprises 5 images. For details, see D4.tsv.
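As a hedged illustration, moving the five inter-rater subjects out of the training set could be scripted as below. The dataset root and the nnUNet-style imagesTr/imagesTs layout are assumptions for this sketch, not the actual repo layout:

```python
from pathlib import Path
import shutil

# Hypothetical sketch: move the five inter-rater subjects from the
# nnUNet-style training folders to the test folders.
TEST_SUBJECTS = [
    "sub-007_ses-headNormal_T2w",
    "sub-010_ses-headUp_T2w",
    "sub-amu02_T2w",
    "sub-barcelona01_T2w",
    "sub-brnoUhb03_T2w",
]

def move_to_test(root: Path, subjects=TEST_SUBJECTS):
    """Move matching image/label files from the *Tr to the *Ts folders."""
    moved = []
    for src_dir, dst_dir in [("imagesTr", "imagesTs"), ("labelsTr", "labelsTs")]:
        (root / dst_dir).mkdir(exist_ok=True)
        for sub in subjects:
            for f in (root / src_dir).glob(f"{sub}*"):
                target = root / dst_dir / f.name
                shutil.move(str(f), str(target))
                moved.append(target.name)
    return moved
```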

Model M4

  • Dataset D4 is composed of 38 images, 33 of which are used for training.
  • I am currently training 5 folds of an nnUNet 3d_fullres model for 2000 epochs:
nnUNetv2_plan_and_preprocess -d 011 --verify_dataset_integrity -c 3d_fullres
CUDA_VISIBLE_DEVICES=1 nnUNetv2_train 011 3d_fullres 0 -tr nnUNetTrainer_2000epochs
CUDA_VISIBLE_DEVICES=2 nnUNetv2_train 011 3d_fullres 1 -tr nnUNetTrainer_2000epochs
CUDA_VISIBLE_DEVICES=3 nnUNetv2_train 011 3d_fullres 2 -tr nnUNetTrainer_2000epochs
CUDA_VISIBLE_DEVICES=1 nnUNetv2_train 011 3d_fullres 3 -tr nnUNetTrainer_2000epochs
CUDA_VISIBLE_DEVICES=2 nnUNetv2_train 011 3d_fullres 4 -tr nnUNetTrainer_2000epochs

@valosekj changed the title from "Add dataset D4 and model M4" to "Add datasets D4, D5 and models M4, M5" on Dec 1, 2023
@valosekj (Member, Author) commented Dec 1, 2023

Dataset D5

During the M4 model training on the D4 dataset, I noticed zero and NaN pseudo Dice values for some levels:

2023-11-30 13:49:15.888252: Pseudo dice [0.0, 0.6834, 0.6912, 0.5411, 0.5132, 0.3478, 0.2165, 0.0, 0.0, nan, nan]

I checked all the D4 labels (using the check_voxels.py script) and found that some of them were wrong: some contained the value 1 (probably a legacy from the earlier binary labels), and some levels were mislabeled. I corrected these labels. I also made sure that the labels contain only levels 2 to 8 (we do not have enough subjects with levels 9 and above).
I also removed sub-004_ses-headUp (difficult to label) and sub-008_ses-headUp (wrong FOV, covering only C1-C4). The final dataset is called D5.
The D5 training dataset comprises 31 images, and the test dataset comprises 5 images. For details, see D5.tsv.
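For reference, the kind of check check_voxels.py performs can be sketched as follows. This is a hypothetical illustration, assuming the label maps are loaded as integer numpy arrays (e.g., via nibabel); the actual script lives in the repo:

```python
import numpy as np

# Hypothetical sketch of the kind of check done by check_voxels.py:
# flag any voxel value that is neither background (0) nor one of the
# expected rootlet levels 2-8.
EXPECTED_LEVELS = set(range(2, 9))

def check_label(label: np.ndarray):
    """Return the set of unexpected voxel values in a level-wise label map."""
    values = set(np.unique(label).astype(int).tolist())
    return values - (EXPECTED_LEVELS | {0})

# Example: a label map that still contains a stray binary value 1
# and a mislabeled level 10.
bad = np.array([[0, 1, 2], [3, 10, 8]])
```

Running `check_label(bad)` would flag `{1, 10}`: the leftover binary value and the mislabeled level.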

Model M5

  • Dataset D5 (~/duke/projects/ml_spinal_rootlets/datasets/Dataset012_M5) is composed of 36 images, 31 of which are used for training.
  • I am currently training 5 folds of an nnUNet 3d_fullres model for 2000 epochs:

(3 models on gpu2, 2 models on gpu3)

nnUNetv2_plan_and_preprocess -d 012 --verify_dataset_integrity -c 3d_fullres
CUDA_VISIBLE_DEVICES=2 nnUNetv2_train 012 3d_fullres 0 -tr nnUNetTrainer_2000epochs
CUDA_VISIBLE_DEVICES=2 nnUNetv2_train 012 3d_fullres 1 -tr nnUNetTrainer_2000epochs
CUDA_VISIBLE_DEVICES=2 nnUNetv2_train 012 3d_fullres 2 -tr nnUNetTrainer_2000epochs
CUDA_VISIBLE_DEVICES=3 nnUNetv2_train 012 3d_fullres 3 -tr nnUNetTrainer_2000epochs
CUDA_VISIBLE_DEVICES=3 nnUNetv2_train 012 3d_fullres 4 -tr nnUNetTrainer_2000epochs

@valosekj (Member, Author) commented Dec 6, 2023

Model M5 training has finished; the model is saved on duke (~/duke/projects/ml_spinal_rootlets/models/Dataset012_M5_2023-12-06).

[Training-progress plots for fold_0 through fold_4 omitted.]

fold_2 and fold_3 look like the best! This is also confirmed by the level-wise pseudo Dice:

[Animated figure "Kapture 2023-12-06 at 11 24 43" showing the level-wise pseudo Dice omitted.]

(figures generated using plot_nnunet_training_log.py)

Commit: "…tract epoch number and pseudo dice and plot them"

This is useful for comparing multi-class training (because nnUNet plots only the mean Dice across classes).
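The per-level curves boil down to parsing the `Pseudo dice [...]` lines out of the nnUNet training log. A minimal sketch of that step (the real plot_nnunet_training_log.py is in the repo; only the log-line format quoted above is assumed here):

```python
import re

# Hedged sketch: extract the per-class pseudo Dice values from an
# nnUNet training log. float("nan") is produced for levels with no
# foreground in the validation split.
LINE_RE = re.compile(r"Pseudo dice \[([^\]]*)\]")

def parse_pseudo_dice(log_text: str):
    """Return one list of per-class pseudo Dice values per matching log line."""
    epochs = []
    for match in LINE_RE.finditer(log_text):
        values = [float(v) for v in match.group(1).split(",")]
        epochs.append(values)
    return epochs
```

The resulting lists can then be plotted per class (e.g., with matplotlib) instead of only the mean.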
@valosekj valosekj marked this pull request as ready for review December 6, 2023 16:38
@valosekj valosekj merged commit 94b1458 into main Dec 13, 2023
@valosekj valosekj deleted the jv/add_D4_and_M4 branch December 13, 2023 12:56
@valosekj (Member, Author) commented Jan 11, 2024

@naga-karthik pointed me to this comment about fold_all.

TL;DR: fold_all means training the model on all training data (i.e., there is no validation split). In contrast, training individual folds (as done above) performs cross-validation (i.e., the training data is split into train/val splits).
The advantage of fold_all is that it uses more data (both the train and val splits) for training and might thus provide better performance!
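To make the contrast concrete, here is a small illustration (not nnUNet's actual splitting code) of how much data each setting trains on with the 31 D5 training images:

```python
# Illustration only: with n_images training images and 5-fold
# cross-validation, each fold trains on roughly 80% of the data,
# while fold_all trains on all images and keeps no validation split.
def fold_sizes(n_images: int, n_folds: int = 5):
    """Return (train, val) sizes for each cross-validation fold."""
    base, extra = divmod(n_images, n_folds)
    val_sizes = [base + (1 if i < extra else 0) for i in range(n_folds)]
    return [(n_images - v, v) for v in val_sizes]

# fold_all, by contrast: train on all n_images, validation set is empty.
```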

So, I retrained the model with fold_all:

CUDA_VISIBLE_DEVICES=2 nnUNetv2_train 012 3d_fullres all -tr nnUNetTrainer_2000epochs

And indeed, the fold_all model performs best on the test data (i.e., data not included during train/val):

[Results table image omitted.]

The table shows the mean ± SD Dice score (across the five test subjects) for individual rootlet levels (C2-C8).
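The per-level scores in such a table could be computed as in the sketch below. This is a hypothetical illustration; the actual evaluation script is not shown in this PR:

```python
import numpy as np

# Hypothetical sketch: Dice score for each rootlet level between a
# ground-truth and a predicted label map, both integer numpy arrays.
def dice_per_level(gt: np.ndarray, pred: np.ndarray, levels=range(2, 9)):
    """Return {level: Dice}; NaN when the level is absent from both maps."""
    scores = {}
    for lvl in levels:
        g, p = gt == lvl, pred == lvl
        denom = g.sum() + p.sum()
        scores[lvl] = 2.0 * np.logical_and(g, p).sum() / denom if denom else float("nan")
    return scores
```

Averaging each level's score across the test subjects would then give the mean ± SD per rootlet level.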
