Merge pull request #159 from rvandewater/development
New features and cleanup from new version release
rvandewater authored Oct 21, 2024
2 parents 7d8c591 + 4dd915e commit 0aa632c
Showing 81 changed files with 3,276 additions and 942 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/ci.yml
@@ -25,19 +25,19 @@ jobs:
- uses: actions/checkout@v3
- uses: conda-incubator/setup-miniconda@v2
with:
activate-environment: yaib_updated
activate-environment: yaib
environment-file: environment.yml
auto-activate-base: false
- name: Lint with flake8
run: |
# stop the build if there are Python syntax errors or undefined names
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
# the GitHub editor is 127 chars wide
flake8 . --count --max-complexity=14 --max-line-length=127 --statistics
flake8 . --count --max-complexity=30 --max-line-length=127 --statistics
# - name: Test with pytest
# run: python -m pytest ./tests/recipes
# If we want to test running the tool later on
# - name: Setup package
# run: pip install -e .
# - name: Test command line tool
# run: python -m icu_benchmarks.run --help
# run: python -m icu_benchmarks.run --help
3 changes: 2 additions & 1 deletion .gitignore
@@ -127,4 +127,5 @@ wandb/
.vscode/launch.json
yaib_logs/
*.ckpt
*.csv
*.csv
!demo_data/*/*/attrition.csv
52 changes: 27 additions & 25 deletions README.md
@@ -14,8 +14,8 @@

[//]: # (TODO: add coverage once we have some tests )

Yet another ICU benchmark (YAIB) provides a framework for doing clinical machine learning experiments on Intensive Care Unit (
ICU) EHR data.
Yet another ICU benchmark (YAIB) provides a framework for doing clinical machine learning experiments on Intensive Care Unit
(ICU) EHR data.

We support the following datasets out of the box:

@@ -43,35 +43,33 @@ We provide five common tasks for clinical prediction by default:
| 5 | Length of Stay (LoS) | Hourly (within 7D) | Regression |

New tasks can be easily added.
For the purposes of getting started right away, we include the eICU and MIMIC-III demo datasets in our repository.
To get started right away, we include the eICU and MIMIC-III demo datasets in our repository.

The following repositories may be relevant as well:

- [YAIB-cohorts](https://github.com/rvandewater/YAIB-cohorts): Cohort generation for YAIB.
- [YAIB-models](https://github.com/rvandewater/YAIB-models): Pretrained models for YAIB.
- [ReciPys](https://github.com/rvandewater/ReciPys): Preprocessing package for YAIB pipelines.

For all YAIB related repositories, please see: https://github.com/stars/rvandewater/lists/yaib.
For all YAIB-related repositories, please see: https://github.com/stars/rvandewater/lists/yaib.

# 📄Paper

To reproduce the benchmarks in our paper, we refer to: the [ML reproducibility document](PAPER.md).
To reproduce the benchmarks in our paper, we refer to the [ML reproducibility document](PAPER.md).
If you use this code in your research, please cite the following publication:

```
@article{vandewaterYetAnotherICUBenchmark2023,
title = {Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML},
shorttitle = {Yet Another ICU Benchmark},
url = {http://arxiv.org/abs/2306.05109},
language = {en},
urldate = {2023-06-09},
publisher = {arXiv},
author = {Robin van de Water and Hendrik Schmidt and Paul Elbers and Patrick Thoral and Bert Arnrich and Patrick Rockenschaub},
month = jun,
year = {2023},
note = {arXiv:2306.05109 [cs]},
keywords = {Computer Science - Machine Learning},
@inproceedings{vandewaterYetAnotherICUBenchmark2024,
title = {Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML},
shorttitle = {Yet Another ICU Benchmark},
booktitle = {The Twelfth International Conference on Learning Representations},
author = {van de Water, Robin and Schmidt, Hendrik Nils Aurel and Elbers, Paul and Thoral, Patrick and Arnrich, Bert and Rockenschaub, Patrick},
year = {2024},
month = oct,
urldate = {2024-02-19},
langid = {english},
}
```

This paper can also be found on arXiv: [2306.05109](https://arxiv.org/abs/2306.05109).
@@ -182,17 +180,16 @@ load
existing cache files.

```
icu-benchmarks train \
icu-benchmarks \
-d demo_data/mortality24/mimic_demo \
-n mimic_demo \
-t BinaryClassification \
-tn Mortality24 \
-m LGBMClassifier \
-hp LGBMClassifier.min_child_samples=10 \
--generate_cache
--generate_cache \
--load_cache \
--seed 2222 \
-s 2222 \
-l ../yaib_logs/ \
--tune
```
@@ -224,13 +221,14 @@ wandb agent <sweep_id>
> Note: You will need to have a wandb account and be logged in to run the above commands.
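
For reference, logging in before launching the agent (a sketch assuming the standard Weights & Biases CLI) looks like:

```
wandb login    # authenticate once with your W&B API key
wandb agent <sweep_id>
```
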
## Evaluate
## Evaluate or Finetune

It is possible to evaluate a model trained on another dataset. In this case, the source dataset is the demo data from MIMIC and
the target is the eICU demo:
It is possible to evaluate a model trained on another dataset without any additional training.
In this case, the source dataset is the demo data from MIMIC and the target is the eICU demo:

```
icu-benchmarks evaluate \
icu-benchmarks \
--eval \
-d demo_data/mortality24/eicu_demo \
-n eicu_demo \
-t BinaryClassification \
@@ -241,9 +239,11 @@ icu-benchmarks evaluate \
-s 2222 \
-l ../yaib_logs \
-sn mimic \
--source-dir ../yaib_logs/mimic_demo/Mortality24/LGBMClassifier/2022-12-12T15-24-46/fold_0
--source-dir ../yaib_logs/mimic_demo/Mortality24/LGBMClassifier/2022-12-12T15-24-46/repetition_0/fold_0
```

> A similar syntax is used for finetuning, where a model is loaded and then retrained. To run finetuning, replace `--eval` with `-ft`.
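
For illustration, a finetuning run on the eICU demo could be assembled from the flags shown above (a sketch only; the source directory reuses the example timestamp from the evaluation command):

```
icu-benchmarks \
    -ft \
    -d demo_data/mortality24/eicu_demo \
    -n eicu_demo \
    -t BinaryClassification \
    -tn Mortality24 \
    -m LGBMClassifier \
    -s 2222 \
    -l ../yaib_logs \
    -sn mimic \
    --source-dir ../yaib_logs/mimic_demo/Mortality24/LGBMClassifier/2022-12-12T15-24-46/repetition_0/fold_0
```
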
## Models

We provide several existing machine learning models that are commonly used for multivariate time-series data.
@@ -275,6 +275,8 @@ We appreciate contributions to the project. Please read the [contribution guidel
request.

# Acknowledgements
This project has been developed partially under the funding of “Gemeinsamer Bundesausschuss (G-BA) Innovationsausschuss” in the framework of “CASSANDRA - Clinical ASSist AND aleRt Algorithms” (project number 01VSF20015). We would like to acknowledge the work of Alisher Turubayev, Anna Shopova, Fabian Lange, Mahmut Kamalak, Paul Mattes, and Victoria Ayvasky for adding PyTorch Lightning, Weights and Biases compatibility, and several optional imputation methods to a later version of the benchmark repository.

We do not own any of the datasets used in this benchmark. This project uses heavily adapted components of
the [HiRID benchmark](https://github.com/ratschlab/HIRID-ICU-Benchmark/). We thank the authors for providing this codebase and
7 changes: 0 additions & 7 deletions configs/experiments/LGBM_Mortality.gin

This file was deleted.

9 changes: 0 additions & 9 deletions configs/experiments/LSTM_Mortality.gin

This file was deleted.

18 changes: 18 additions & 0 deletions configs/prediction_models/BRFClassifier.gin
@@ -0,0 +1,18 @@
# Settings for ImbLearn Balanced Random Forest Classifier.

# Common settings for ML models
include "configs/prediction_models/common/MLCommon.gin"

# Train params
train_common.model = @BRFClassifier

model/hyperparameter.class_to_tune = @BRFClassifier
model/hyperparameter.n_estimators = [50, 100, 250, 500, 750, 1000, 1500]
model/hyperparameter.max_depth = [3, 5, 10, 15]
model/hyperparameter.min_samples_split = (2, 5, 10)
model/hyperparameter.min_samples_leaf = (1, 2, 4)
model/hyperparameter.max_features = ['sqrt', 'log2', 1.0]
model/hyperparameter.bootstrap = [True, False]
model/hyperparameter.class_weight = [None, 'balanced']


15 changes: 15 additions & 0 deletions configs/prediction_models/CBClassifier.gin
@@ -0,0 +1,15 @@
# Settings for Catboost classifier.

# Common settings for ML models
include "configs/prediction_models/common/MLCommon.gin"

# Train params
train_common.model = @CBClassifier

model/hyperparameter.class_to_tune = @CBClassifier
model/hyperparameter.learning_rate = (1e-4, 0.5, "log")
model/hyperparameter.num_trees = [50, 100, 250, 500, 750, 1000, 1500]
model/hyperparameter.depth = [3, 5, 10, 15]
model/hyperparameter.scale_pos_weight = [1, 5, 10, 25, 50, 75, 99, 100, 1000]
model/hyperparameter.border_count = [5, 10, 20, 50, 100, 200]
model/hyperparameter.l2_leaf_reg = [1, 3, 5, 7, 9]
6 changes: 3 additions & 3 deletions configs/prediction_models/GRU.gin
@@ -9,11 +9,11 @@ train_common.model = @GRUNet
# Optimizer params
optimizer/hyperparameter.class_to_tune = @Adam
optimizer/hyperparameter.weight_decay = 1e-6
optimizer/hyperparameter.lr = (1e-5, 3e-4)
optimizer/hyperparameter.lr = (1e-6, 1e-4, "log")

# Encoder params
model/hyperparameter.class_to_tune = @GRUNet
model/hyperparameter.num_classes = %NUM_CLASSES
model/hyperparameter.hidden_dim = (32, 256, "log-uniform", 2)
model/hyperparameter.layer_dim = (1, 3)
model/hyperparameter.hidden_dim = (32, 512, "log")
model/hyperparameter.layer_dim = (1, 10)

2 changes: 1 addition & 1 deletion configs/prediction_models/LGBMClassifier.gin
@@ -11,6 +11,6 @@ model/hyperparameter.colsample_bytree = (0.33, 1.0)
model/hyperparameter.max_depth = (3, 7)
model/hyperparameter.min_child_samples = 1000
model/hyperparameter.n_estimators = 100000
model/hyperparameter.num_leaves = (8, 128, "log-uniform", 2)
model/hyperparameter.num_leaves = (8, 128, "log", 2)
model/hyperparameter.subsample = (0.33, 1.0)
model/hyperparameter.subsample_freq = 1
8 changes: 4 additions & 4 deletions configs/prediction_models/RFClassifier.gin
@@ -8,11 +8,11 @@ train_common.model = @RFClassifier

model/hyperparameter.class_to_tune = @RFClassifier
model/hyperparameter.n_estimators = (10, 50, 100, 200, 500)
model/hyperparameter.max_depth = (None, 5, 10, 20)
model/hyperparameter.max_depth = (5, 10, 20)
model/hyperparameter.min_samples_split = (2, 5, 10)
model/hyperparameter.min_samples_leaf = (1, 2, 4)
model/hyperparameter.max_features = ('sqrt', 'log2', None)
model/hyperparameter.bootstrap = (True, False)
model/hyperparameter.class_weight = (None, 'balanced')
model/hyperparameter.max_features = ['sqrt', 'log2', None]
model/hyperparameter.bootstrap = [True, False]
model/hyperparameter.class_weight = [None, 'balanced']


14 changes: 14 additions & 0 deletions configs/prediction_models/RUSBClassifier.gin
@@ -0,0 +1,14 @@
# Settings for ImbLearn RUSBoost Classifier.

# Common settings for ML models
include "configs/prediction_models/common/MLCommon.gin"

# Train params
train_common.model = @RUSBClassifier

model/hyperparameter.class_to_tune = @RUSBClassifier
model/hyperparameter.n_estimators = (10, 50, 100, 200, 500)
model/hyperparameter.learning_rate = (0.005, 1, "log")
model/hyperparameter.sampling_strategy = "auto"


6 changes: 3 additions & 3 deletions configs/prediction_models/TCN.gin
@@ -9,12 +9,12 @@ train_common.model = @TemporalConvNet
# Optimizer params
optimizer/hyperparameter.class_to_tune = @Adam
optimizer/hyperparameter.weight_decay = 1e-6
optimizer/hyperparameter.lr = (1e-5, 3e-4)
optimizer/hyperparameter.lr = (1e-6, 3e-4)

# Encoder params
model/hyperparameter.class_to_tune = @TemporalConvNet
model/hyperparameter.num_classes = %NUM_CLASSES
model/hyperparameter.max_seq_length = %HORIZON
model/hyperparameter.num_channels = (32, 256, "log-uniform", 2)
model/hyperparameter.kernel_size = (2, 32, "log-uniform", 2)
model/hyperparameter.num_channels = (32, 256, "log")
model/hyperparameter.kernel_size = (2, 128, "log")
model/hyperparameter.dropout = (0.0, 0.4)
14 changes: 7 additions & 7 deletions configs/prediction_models/Transformer.gin
@@ -8,17 +8,17 @@ train_common.model = @Transformer

optimizer/hyperparameter.class_to_tune = @Adam
optimizer/hyperparameter.weight_decay = 1e-6
optimizer/hyperparameter.lr = (1e-5, 3e-4)
optimizer/hyperparameter.lr = (1e-6, 1e-4)

# Encoder params
model/hyperparameter.class_to_tune = @Transformer
model/hyperparameter.ff_hidden_mult = 2
model/hyperparameter.l1_reg = 0.0
model/hyperparameter.ff_hidden_mult = (2, 4, 6, 8)
model/hyperparameter.l1_reg = (0.0, 1.0)
model/hyperparameter.num_classes = %NUM_CLASSES
model/hyperparameter.hidden = (32, 256, "log-uniform", 2)
model/hyperparameter.heads = (1, 8, "log-uniform", 2)
model/hyperparameter.hidden = (32, 512, "log")
model/hyperparameter.heads = (1, 8, "log")
model/hyperparameter.depth = (1, 3)
model/hyperparameter.dropout = (0.0, 0.4)
model/hyperparameter.dropout_att = (0.0, 0.4)
model/hyperparameter.dropout = 0 # no improvement (0.0, 0.4)
model/hyperparameter.dropout_att = (0.0, 1.0)


17 changes: 17 additions & 0 deletions configs/prediction_models/XGBClassifier.gin
@@ -0,0 +1,17 @@
# Settings for XGBoost classifier.

# Common settings for ML models
include "configs/prediction_models/common/MLCommon.gin"

# Train params
train_common.model = @XGBClassifier

model/hyperparameter.class_to_tune = @XGBClassifier
model/hyperparameter.learning_rate = (0.01, 0.1, "log")
model/hyperparameter.n_estimators = [50, 100, 250, 500, 750, 1000, 1500, 2000]
model/hyperparameter.max_depth = [3, 5, 10, 15]
model/hyperparameter.scale_pos_weight = [1, 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 99, 100, 1000]
model/hyperparameter.min_child_weight = [1, 0.5]
model/hyperparameter.max_delta_step = [0, 1, 2, 3, 4, 5, 10]
model/hyperparameter.colsample_bytree = [0.1, 0.25, 0.5, 0.75, 1.0]
model/hyperparameter.eval_metric = "aucpr"
6 changes: 4 additions & 2 deletions configs/prediction_models/common/DLCommon.gin
@@ -3,7 +3,9 @@
# Imports to register the models
import gin.torch.external_configurables
import icu_benchmarks.models.wrappers
import icu_benchmarks.models.dl_models
import icu_benchmarks.models.dl_models.rnn
import icu_benchmarks.models.dl_models.transformer
import icu_benchmarks.models.dl_models.tcn
import icu_benchmarks.models.utils

# Do not generate features from dynamic data
@@ -12,7 +14,7 @@ base_regression_preprocessor.generate_features = False

# Train params
train_common.optimizer = @Adam
train_common.epochs = 1000
train_common.epochs = 50
train_common.batch_size = 64
train_common.patience = 10
train_common.min_delta = 1e-4
2 changes: 1 addition & 1 deletion configs/prediction_models/common/DLTuning.gin
Original file line number Diff line number Diff line change
@@ -2,4 +2,4 @@
tune_hyperparameters.scopes = ["model", "optimizer"]
tune_hyperparameters.n_initial_points = 5
tune_hyperparameters.n_calls = 30
tune_hyperparameters.folds_to_tune_on = 2
tune_hyperparameters.folds_to_tune_on = 5
6 changes: 5 additions & 1 deletion configs/prediction_models/common/MLCommon.gin
@@ -3,7 +3,11 @@
# Imports to register the models
import gin.torch.external_configurables
import icu_benchmarks.models.wrappers
import icu_benchmarks.models.ml_models
import icu_benchmarks.models.ml_models.sklearn
import icu_benchmarks.models.ml_models.lgbm
import icu_benchmarks.models.ml_models.xgboost
import icu_benchmarks.models.ml_models.imblearn
import icu_benchmarks.models.ml_models.catboost
import icu_benchmarks.models.utils

# Patience for early stopping
6 changes: 3 additions & 3 deletions configs/prediction_models/common/MLTuning.gin
@@ -1,5 +1,5 @@
# Hyperparameter tuner settings for classical Machine Learning.
tune_hyperparameters.scopes = ["model"]
tune_hyperparameters.n_initial_points = 10
tune_hyperparameters.n_calls = 50
tune_hyperparameters.folds_to_tune_on = 3
tune_hyperparameters.n_initial_points = 5
tune_hyperparameters.n_calls = 30
tune_hyperparameters.folds_to_tune_on = 5
5 changes: 2 additions & 3 deletions configs/tasks/BinaryClassification.gin
@@ -19,11 +19,10 @@ DLPredictionWrapper.loss = @cross_entropy

# SELECTING PREPROCESSOR
preprocess.preprocessor = @base_classification_preprocessor
preprocess.modality_mapping = %modality_mapping
preprocess.vars = %vars
preprocess.use_static = True

# SELECTING DATASET
PredictionDataset.vars = %vars
PredictionDataset.ram_cache = True

include "configs/tasks/common/Dataloader.gin"

4 changes: 2 additions & 2 deletions configs/tasks/DatasetImputation.gin
@@ -22,6 +22,6 @@ preprocess.file_names = {
preprocess.preprocessor = @base_imputation_preprocessor

preprocess.vars = %vars
ImputationDataset.vars = %vars
ImputationDataset.ram_cache = True

include "configs/tasks/common/Dataloader.gin"
