Skip to content

Latest commit

 

History

History

datasets

ImgX Datasets

A TFDS-based python package for data set building.

Current supported data sets are listed below. Use the following commands to (re)build all data sets.

make build_dataset

Male pelvic MR

Description

This data set from Li et al. 2022 contains 589 T2-weighted labeled images which are split for training, validation and testing respectively.

Download and Build

Use the following commands at the root of this repository (i.e. under ImgX/) to automatically download and build the data set, which will be built under ~/tensorflow_datasets folder. Optionally, add flag --overwrite to rebuild/overwrite the data set.

tfds build imgx/datasets/male_pelvic_mr

AMOS CT

Description

This data set from Ji et al. 2022 contains 500 CT labeled images which has been split into 200, 100, and 200 images for training, validation, and test sets. But test set labels were not released, therefore validation is further split into 10 and 90 images for validation and test sets.

Download and Build

Use the following commands at the root of this repository (i.e. under ImgX/) to automatically download and build the data set, which will be built under ~/tensorflow_datasets folder. Optionally, add flag --overwrite to rebuild/overwrite the data set.

tfds build imgx/datasets/amos_ct

Muscle Ultrasound

Description

This data set from Marzola et al. 2021 contains 3910 labeled images, which has been split into 2531, 666, and 713 images for training, validation, and test sets.

Download and Build

Use the following commands at the root of this repository (i.e. under ImgX/) to automatically download and build the data set, which will be built under ~/tensorflow_datasets folder. Optionally, add flag --overwrite to rebuild/overwrite the data set.

tfds build imgx/datasets/muscle_us

Brain MR

Description

This data set from Baid et al. 2021 contains 1251 labeled images which are split for training, validation and testing respectively.

Download and Build

Manual Download

This data set requires manual data downloading from Kaggle. using kaggle API. The authentication token shall be obtained and stored under ~/.kaggle/kaggle.json.

Then, execute the following commands to download and unzip files. Afterward, return to ImgX/ folder (/app/ImgX for docker).

mkdir -p ~/tensorflow_datasets/downloads/manual/BraTS2021_Kaggle/BraTS2021_Training_Data/
cd ~/tensorflow_datasets/downloads/manual/BraTS2021_Kaggle/BraTS2021_Training_Data/
kaggle datasets download -d dschettler8845/brats-2021-task1
unzip brats-2021-task1.zip
tar xf BraTS2021_Training_Data.tar
rm BraTS2021_00495.tar
rm BraTS2021_00621.tar
rm BraTS2021_Training_Data.tar
rm brats-2021-task1.zip

This way under BraTS2021_Kaggle/ exist folders per sample. For example, files corresponding to uid BraTS2021_01666 should be located at ~/tensorflow_datasets/downloads/manual/BraTS2021_Kaggle/BraTS2021_Training_Data/BraTS2021_01666/ under which there are five files:

  • BraTS2021_01666_flair.nii.gz,
  • BraTS2021_01666_t1.nii.gz,
  • BraTS2021_01666_t1ce.nii.gz,
  • BraTS2021_01666_t2.nii.gz,
  • BraTS2021_01666_seg.nii.gz.

Automatic Build

Use the following commands at the root of this repository (i.e. under ImgX/) to automatically build the data set, which will be built under ~/tensorflow_datasets folder. Optionally, add flag --overwrite to rebuild/overwrite the data set.

tfds build imgx/datasets/brats2021_mr

Automated Cardiac Diagnosis Challenge (ACDC)

Description

This data set from Bernard et al. 2018 contains 150 samples. Samples are split into 100 and 50 for training and test sets. Each sample contains

  • a 4D image (a sequence of 3D MR images)
  • a 3D image and corresponding segmentation label for end-diastolic (ED) frame
  • a 3D image and corresponding segmentation label for end-systolic (ES) frame

Download and Build

Use the following commands at the root of this repository (i.e. under ImgX/) to automatically download and build the data set, which will be built under ~/tensorflow_datasets folder. Optionally, add flag --overwrite to rebuild/overwrite the data set.

tfds build imgx/datasets/acdc_mr