Merge pull request #247 from khanlab/download-models-v2
Download models inside workflow, update docs accordingly
akhanf authored Aug 3, 2023
2 parents dbf047f + f805568 commit 994103d
Showing 10 changed files with 111 additions and 43 deletions.
1 change: 0 additions & 1 deletion .github/workflows/push_container.yml
@@ -43,7 +43,6 @@ jobs:
           ghcr.io/${{ github.repository }}
         flavor: |
           latest=auto
-          suffix=_synthseg
     - name: Build and push Docker images
       uses: docker/build-push-action@ad44023a93711e3deb337508980b4b5e9bcdc5dc
6 changes: 3 additions & 3 deletions Dockerfile
@@ -4,11 +4,11 @@ MAINTAINER alik@robarts.ca

 COPY . /src/

-#pre-download the models here:
-ENV HIPPUNFOLD_CACHE_DIR=/opt/hippunfold_cache
+# avoid pre-downloading the models to make for lighter container
+# ENV HIPPUNFOLD_CACHE_DIR=/opt/hippunfold_cache

 #install hippunfold and imagemagick (for reports)
-RUN pip install /src && hippunfold_download_models && \
+RUN pip install --no-cache-dir /src && \
     apt install -y graphviz && \
     wget https://imagemagick.org/archive/binaries/magick && \
     mv magick /usr/bin && chmod a+x /usr/bin/magick
6 changes: 5 additions & 1 deletion README.md
@@ -20,12 +20,16 @@ This is especially useful for:

## NEW: Version 1.3.0 release

Major changes include the addition of unfolded space registration to a reference atlas harmonized across seven ground-truth histology samples. This method allows shifting in unfolded space, providing even better intersubject alignment.

*Note: this replaces the default workflow; however, you can revert to the legacy workflow (disabling unfolded space registration) by setting `--atlas bigbrain` or `--no-unfolded-reg`.*

Read more in our [preprint](https://www.biorxiv.org/content/10.1101/2023.03.30.534978v1)

Also new is the ability to specify an **experimental** UNet model that is contrast-agnostic, built using [synthseg](https://github.com/BBillot/SynthSeg) and trained on more detailed segmentations. This generally produces more detailed results but has not been extensively tested yet.
## Workflow

The overall workflow can be summarized in the following steps:
19 changes: 17 additions & 2 deletions docs/contributing/contributing.md
@@ -104,6 +104,7 @@ trained models. For Khan lab's members, the following line must be added to the ba

export HIPPUNFOLD_CACHE_DIR="/project/6050199/akhanf/opt/hippunfold_trained_models"


Note: make sure to reload your bash profile if needed (`source ~/.bash_profile`).

5. For an easier execution in Graham, it's recommended to also install
@@ -185,8 +186,10 @@ If poetry is not installed, please refer to the [installation documentation](htt

The trained model files we use for hippunfold are large and thus are not
included directly in this github repository, and instead are downloaded
-from Zenodo releases. If you are using the docker/singularity
-container, `docker://khanlab/hippunfold`, they are pre-downloaded there, in `/opt/hippunfold_cache`.
+from Zenodo releases.

### For HippUnfold versions earlier than 1.3.0 (< 1.3.0):
If you are using the docker/singularity container, `docker://khanlab/hippunfold`, they are pre-downloaded there, in `/opt/hippunfold_cache`.

If you are not using this container, you will need to download the models before running hippunfold, by running:

@@ -196,6 +199,18 @@ This console script (installed when you install hippunfold) downloads all the mo
which on Linux is typically `~/.cache/hippunfold`. To override this, you can set the `HIPPUNFOLD_CACHE_DIR` environment
variable before running `hippunfold_download_models` and `hippunfold`.

### NEW: For HippUnfold versions 1.3.0 and later (>= 1.3.0):
With the addition of new models, it was no longer feasible to include every model in the container, so a change was made to
**not include** any models in the docker/singularity containers. In these versions, the `hippunfold_download_models` command
is removed, and models are simply downloaded as part of the workflow. As before, all models are stored in the system cache directory,
which is typically `~/.cache/hippunfold`; to override this, you can set the `HIPPUNFOLD_CACHE_DIR` environment variable before running `hippunfold`.
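
For reference, the lookup described here (implemented in `get_model_tar()` in `hippunfold/workflow/rules/nnunet.smk`, later in this diff) can be sketched in plain Python. The `AppDirs` constructor arguments below are an assumption for illustration, not a quote from the codebase:

```
import os
from pathlib import Path

from appdirs import AppDirs  # same library nnunet.smk imports


def model_cache_dir() -> Path:
    """Resolve where model tars are cached: the HIPPUNFOLD_CACHE_DIR
    override if set, else the platform cache dir (~/.cache/hippunfold on Linux)."""
    if "HIPPUNFOLD_CACHE_DIR" in os.environ:
        return Path(os.environ["HIPPUNFOLD_CACHE_DIR"])
    # constructor arguments assumed for illustration
    return Path(AppDirs("hippunfold", "khanlab").user_cache_dir)


print(model_cache_dir())
```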

If you want to pre-download a model (e.g. if your compute nodes do not have internet access), you can simply run the `download_model` rule in HippUnfold, e.g.:

```
hippunfold BIDS_DIR OUTPUT_DIR PARTICIPANT_LEVEL --modality T1w --until download_model -c 1
```
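
To sanity-check the pre-download before going offline, you can list what landed in the cache. A minimal sketch, assuming the default cache location on Linux:

```
import os
from pathlib import Path

cache = Path(
    os.environ.get("HIPPUNFOLD_CACHE_DIR", Path.home() / ".cache" / "hippunfold")
)
for tar in sorted(cache.rglob("*.tar")):
    print(f"{tar.name}: {tar.stat().st_size / 1e9:.1f} GB")
```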


## Overriding Singularity cache directories

2 changes: 1 addition & 1 deletion docs/getting_started/docker.md
@@ -27,7 +27,7 @@ HippUnfold, and can be listed with `--help-snakemake`:

## Running an example

-Download and extract a single-subject BIDS dataset for this test from https://www.dropbox.com/s/mdbmpmmq6fi8sk0/hippunfold_test_data.tar. Here we will also assume you chose to save and extract to the directory `c:\Users\jordan\Downloads\`.
+Download and extract a single-subject BIDS dataset for this test from [hippunfold_test_data.tar](https://www.dropbox.com/s/mdbmpmmq6fi8sk0/hippunfold_test_data.tar). Here we will also assume you chose to save and extract to the directory `c:\Users\jordan\Downloads\`.

This contains a `ds002168/` directory with a single subject that has both T1w and T2w images.
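
If you prefer to script the download rather than use a browser, a rough Python equivalent follows; the `?dl=1` query string (Dropbox direct download) is an assumption about the link above:

```
import tarfile
import urllib.request

# ?dl=1 (Dropbox direct download) is an assumption about the link above
url = "https://www.dropbox.com/s/mdbmpmmq6fi8sk0/hippunfold_test_data.tar?dl=1"
urllib.request.urlretrieve(url, "hippunfold_test_data.tar")

# extract into the current directory, yielding ds002168/
with tarfile.open("hippunfold_test_data.tar") as tf:
    tf.extractall(".")
```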

Expand Down
7 changes: 4 additions & 3 deletions docs/getting_started/installation.md
@@ -47,7 +47,7 @@ The HippUnfold BIDS App is available on DockerHub as versioned releases and de

#### Pros:
- Compatible with non-Linux systems
-- All dependencies+models in a single container
+- All dependencies+models (* See Note 1) in a single container

#### Cons:
- Typically not possible on shared machines
@@ -59,7 +59,7 @@ The HippUnfold BIDS App is available on DockerHub as versioned releases and de
The same docker container can also be used with Singularity (now Apptainer). Instructions can be found below.

#### Pros:
-- All dependencies+models in a single container
+- All dependencies+models (* See Note 1) in a single container
- Container stored as a single file (.sif)

#### Cons:
Expand All @@ -80,5 +80,6 @@ Instructions for this can be found in the **Contributing** documentation page.
- Must use Python virtual environment
- Only compatible on Linux systems with Singularity for external dependencies


## Note 1:
As of version 1.3.0 of HippUnfold, containers are no longer shipped with all the models, and the models are downloaded as part of the workflow. By default, models are placed in `~/.cache/hippunfold` unless you set the `HIPPUNFOLD_CACHE_DIR` environment variable. See [Deep learning nnU-net model files](https://hippunfold.readthedocs.io/en/latest/contributing/contributing.html#deep-learning-nnu-net-model-files) for more information.

15 changes: 14 additions & 1 deletion docs/usage/faq.md
@@ -2,6 +2,8 @@

1. [](run-inference-mem)
2. [](no-input-images)
3. [](container-size)
4. [](model-files)


(run-inference-mem)=
@@ -41,6 +43,17 @@ This can happen if:
- Singularity or docker cannot access your input directory. For Singularity, ensure your [Singularity options](https://docs.sylabs.io/guides/3.1/user-guide/cli/singularity_run.html) are appropriate, in particular `SINGULARITY_BINDPATH`. For docker, ensure you are mounting the correct directory with the `-v` flag described in the [Getting started](https://hippunfold.readthedocs.io/en/latest/getting_started/docker.html) section.
- HippUnfold does not recognize your BIDS-formatted input images. This can occur if, for example, T1w images are labelled with the suffix `_t1w.nii.gz` instead of `_T1w.nii.gz` as per [BIDS specifications](https://bids.neuroimaging.io/specification.html). HippUnfold makes use of [PyBIDS](https://github.com/bids-standard/pybids) to parse the dataset, so we suggest you use the [BIDS Validator](https://bids-standard.github.io/bids-validator/) to ensure your dataset has no errors. Note: You can override BIDS parsing and use custom filenames with the `--path-*` option as described in the [](../usage/useful_options.md#parsing-non-bids-datasets-with-custom-paths) section.


(container-size)=
## Why is the HippUnfold Docker/Singularity/Apptainer container so large?

In addition to some large software dependencies, the container has historically included U-net models for all the possible modalities we trained, with each model taking up 2-4GB. We have addressed this in versions >= 1.3.0 by updating the workflow to download models on the fly (when they have not already been downloaded) and by no longer including any models in the container itself. This drops the container size significantly (<4GB compressed).

(model-files)=
## Why do I end up with large files in `~/.cache/hippunfold` after running HippUnfold?

This folder is where the nnU-net model parameters are stored by default. You can override the location with the `HIPPUNFOLD_CACHE_DIR` environment variable. See [](../contributing/contributing.md#deep-learning-nnu-net-model-files) for more details.
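
If you want to check how much space the cache is using (for example, before deleting models for modalities you no longer run), a small sketch:

```
import os
from pathlib import Path

cache = Path(
    os.environ.get("HIPPUNFOLD_CACHE_DIR", Path.home() / ".cache" / "hippunfold")
)
files = [f for f in cache.rglob("*") if f.is_file()]
total_gb = sum(f.stat().st_size for f in files) / 1e9
print(f"{len(files)} files, {total_gb:.1f} GB in {cache}")
```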



17 changes: 9 additions & 8 deletions hippunfold/config/snakebids.yml
@@ -415,14 +415,14 @@ modality: T2w

 #these will be downloaded to ~/.cache/hippunfold
 nnunet_model:
-  T1w: trained_model.3d_fullres.Task101_hcp1200_T1w.nnUNetTrainerV2.model_best.tar
-  T2w: trained_model.3d_fullres.Task102_hcp1200_T2w.nnUNetTrainerV2.model_best.tar
-  hippb500: trained_model.3d_fullres.Task110_hcp1200_b1000crop.nnUNetTrainerV2.model_best.tar
-  neonateT1w: trained_model.3d_fullres.Task205_hcp1200_b1000_finetuneround2_dhcp_T1w.nnUNetTrainerV2.model_best.tar
-  T1T2w: trained_model.3d_fullres.Task103_hcp1200_T1T2w.nnUNetTrainerV2.model_best.tar
-  synthseg_v0.1: trained_model.3d_fullres.Task102_synsegGenDetailed.nnUNetTrainerV2.model_best.tar
-  synthseg_v0.2: trained_model.3d_fullres.Task203_synthseg.nnUNetTrainerV2.model_best.tar
-  neonateT1w_v2: trained_model.3d_fullres.Task301_dhcp_T1w_synthseg_manuallycorrected.nnUNetTrainer.model_best.tar
+  T1w: 'zenodo.org/record/4508747/files/trained_model.3d_fullres.Task101_hcp1200_T1w.nnUNetTrainerV2.model_best.tar'
+  T2w: 'zenodo.org/record/4508747/files/trained_model.3d_fullres.Task102_hcp1200_T2w.nnUNetTrainerV2.model_best.tar'
+  hippb500: 'zenodo.org/record/5732291/files/trained_model.3d_fullres.Task110_hcp1200_b1000crop.nnUNetTrainerV2.model_best.tar'
+  neonateT1w: 'zenodo.org/record/5733556/files/trained_model.3d_fullres.Task205_hcp1200_b1000_finetuneround2_dhcp_T1w.nnUNetTrainerV2.model_best.tar'
+  neonateT1w_v2: 'zenodo.org/record/8209029/files/trained_model.3d_fullres.Task301_dhcp_T1w_synthseg_manuallycorrected.nnUNetTrainer.model_best.tar'
+  T1T2w: 'zenodo.org/record/4508747/files/trained_model.3d_fullres.Task103_hcp1200_T1T2w.nnUNetTrainerV2.model_best.tar'
+  synthseg_v0.1: 'zenodo.org/record/8184230/files/trained_model.3d_fullres.Task102_synsegGenDetailed.nnUNetTrainerV2.model_best.tar'
+  synthseg_v0.2: 'zenodo.org/record/8184230/files/trained_model.3d_fullres.Task203_synthseg.nnUNetTrainerV2.model_best.tar'

crop_native_box: '256x256x256vox'
crop_native_res: '0.2x0.2x0.2mm'
@@ -556,4 +556,5 @@ skip_inject_template_labels: False
 force_nnunet_model: False
 t1_reg_template: False
 generate_myelin_map: False
+no_unfolded_reg: False
 root: results
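
The `download_model` rule in `nnunet.smk` (below) selects one entry from this `nnunet_model` mapping: the `force_nnunet_model` value when set, otherwise the modality. A minimal Python sketch of that selection, with a trimmed-down stand-in for the config above:

```
# Trimmed-down stand-in for the snakebids config above
config = {
    "modality": "T1w",
    "force_nnunet_model": False,  # or e.g. "synthseg_v0.2"
    "nnunet_model": {
        "T1w": "zenodo.org/record/4508747/files/trained_model.3d_fullres.Task101_hcp1200_T1w.nnUNetTrainerV2.model_best.tar",
        "synthseg_v0.2": "zenodo.org/record/8184230/files/trained_model.3d_fullres.Task203_synthseg.nnUNetTrainerV2.model_best.tar",
    },
}

# force_nnunet_model, when set, overrides the modality-based choice
key = config["force_nnunet_model"] or config["modality"]
print(config["nnunet_model"][key])
```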
80 changes: 58 additions & 22 deletions hippunfold/workflow/rules/nnunet.smk
@@ -1,8 +1,50 @@
 import re
 from appdirs import AppDirs
+from snakemake.remote.HTTP import RemoteProvider as HTTPRemoteProvider
+
+HTTP = HTTPRemoteProvider()

-def get_model_tar(wildcards):
+
+def get_nnunet_input(wildcards):
+    if config["modality"] == "T2w":
+        nii = (
+            bids(
+                root=work,
+                datatype="anat",
+                **config["subj_wildcards"],
+                suffix="T2w.nii.gz",
+                space="corobl",
+                desc="preproc",
+                hemi="{hemi}",
+            ),
+        )
+    elif config["modality"] == "T1w":
+        nii = (
+            bids(
+                root=work,
+                datatype="anat",
+                **config["subj_wildcards"],
+                suffix="T1w.nii.gz",
+                space="corobl",
+                desc="preproc",
+                hemi="{hemi}",
+            ),
+        )
+    elif config["modality"] == "hippb500":
+        nii = bids(
+            root=work,
+            datatype="dwi",
+            hemi="{hemi}",
+            space="corobl",
+            suffix="b500.nii.gz",
+            **config["subj_wildcards"],
+        )
+    else:
+        raise ValueError("modality not supported for nnunet!")
+    return nii
+
+
+def get_model_tar():

if "HIPPUNFOLD_CACHE_DIR" in os.environ.keys():
download_dir = os.environ["HIPPUNFOLD_CACHE_DIR"]
@@ -20,14 +62,7 @@ def get_model_tar(wildcards):
     if local_tar == None:
         print(f"ERROR: {model_name} does not exist in nnunet_model in the config file")

-    dl_path = os.path.abspath(os.path.join(download_dir, local_tar))
-    if os.path.exists(dl_path):
-        return dl_path
-    else:
-        print("ERROR:")
-        print(
-            f" Cannot find downloaded model at {dl_path}, run this first: hippunfold_download_models"
-        )
+    return os.path.abspath(os.path.join(download_dir, local_tar.split("/")[-1]))


def parse_task_from_tar(wildcards, input):
@@ -57,22 +92,23 @@ def parse_trainer_from_tar(wildcards, input):
     return trainer


+rule download_model:
+    input:
+        HTTP.remote(config["nnunet_model"][config["force_nnunet_model"]])
+        if config["force_nnunet_model"]
+        else HTTP.remote(config["nnunet_model"][config["modality"]]),
+    output:
+        model_tar=get_model_tar(),
+    shell:
+        "cp {input} {output}"
+
+
 rule run_inference:
-    """ This rule REQUIRES a GPU -- will need to modify nnUnet code to create an alternate for CPU-based inference
+    """ This rule uses either GPU or CPU.
+    It also runs in an isolated folder (shadow), with symlinks to inputs in that folder, copying over outputs once complete, so temp files are not retained"""
     input:
-        in_img=(
-            bids(
-                root=work,
-                datatype="anat",
-                **config["subj_wildcards"],
-                suffix="{modality}.nii.gz".format(modality=config["modality"]),
-                space="corobl",
-                desc="preproc",
-                hemi="{hemi}",
-            ),
-        ),
-        model_tar=get_model_tar,
+        in_img=get_nnunet_input,
+        model_tar=get_model_tar(),
     params:
         temp_img="tempimg/temp_0000.nii.gz",
         temp_lbl="templbl/temp.nii.gz",
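
Outside of Snakemake, the combined effect of `download_model` and `get_model_tar()` is roughly the following standalone sketch. The `fetch_model` helper is hypothetical, and prepending `https://` to the scheme-less config URLs is an assumption:

```
import os
import urllib.request
from pathlib import Path


def fetch_model(url: str, cache_dir: Path) -> Path:
    """Download a model tar into the cache unless it is already present."""
    dest = cache_dir / url.split("/")[-1]  # same basename logic as get_model_tar()
    if not dest.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        # config URLs are scheme-less; prepending https:// is an assumption
        urllib.request.urlretrieve("https://" + url, dest)
    return dest


cache = Path(
    os.environ.get("HIPPUNFOLD_CACHE_DIR", Path.home() / ".cache" / "hippunfold")
)
print(
    fetch_model(
        "zenodo.org/record/4508747/files/"
        "trained_model.3d_fullres.Task101_hcp1200_T1w.nnUNetTrainerV2.model_best.tar",
        cache,
    )
)
```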
1 change: 0 additions & 1 deletion pyproject.toml
@@ -39,7 +39,6 @@ snakefmt = ">=0.5.0"

 [tool.poetry.scripts]
 hippunfold = "hippunfold.run:main"
-hippunfold_download_models = "hippunfold.download_models:main"

 [build-system]
 requires = ["poetry-core>=1.0.0"]
