Merge pull request #247 from khanlab/download-models-v2
Download models inside workflow, update docs accordingly
akhanf authored Aug 3, 2023
2 parents dbf047f + f805568 commit 994103d
Showing 10 changed files with 111 additions and 43 deletions.
1 change: 0 additions & 1 deletion .github/workflows/push_container.yml
@@ -43,7 +43,6 @@ jobs:
           ghcr.io/${{ github.repository }}
         flavor: |
           latest=auto
-          suffix=_synthseg
     - name: Build and push Docker images
       uses: docker/build-push-action@ad44023a93711e3deb337508980b4b5e9bcdc5dc
6 changes: 3 additions & 3 deletions Dockerfile
@@ -4,11 +4,11 @@ MAINTAINER alik@robarts.ca

 COPY . /src/

-#pre-download the models here:
-ENV HIPPUNFOLD_CACHE_DIR=/opt/hippunfold_cache
+# avoid pre-downloading the models to make for lighter container
+# ENV HIPPUNFOLD_CACHE_DIR=/opt/hippunfold_cache

 #install hippunfold and imagemagick (for reports)
-RUN pip install /src && hippunfold_download_models && \
+RUN pip install --no-cache-dir /src && \
     apt install -y graphviz && \
     wget https://imagemagick.org/archive/binaries/magick && \
     mv magick /usr/bin && chmod a+x /usr/bin/magick
6 changes: 5 additions & 1 deletion README.md
@@ -20,12 +20,16 @@ This is especially useful for:

## NEW: Version 1.3.0 release

Major changes include the addition of unfolded space registration to a reference atlas harmonized across seven ground-truth histology samples. This method allows shifting in unfolded space, providing even better intersubject alignment.

*Note: this replaces the default workflow; however, you can revert to the legacy workflow (disabling unfolded space registration) by setting `--atlas bigbrain` or `--no-unfolded-reg`.*

Read more in our [preprint](https://www.biorxiv.org/content/10.1101/2023.03.30.534978v1)

Also new is the ability to specify an **experimental** UNet model that is contrast-agnostic, built using [synthseg](https://github.com/BBillot/SynthSeg) and trained on more detailed segmentations. This generally produces more detailed results but has not been extensively tested yet.
## Workflow

The overall workflow can be summarized in the following steps:
19 changes: 17 additions & 2 deletions docs/contributing/contributing.md
@@ -104,6 +104,7 @@ trained models. For Khan lab's members, the following line must be added to the ba

export HIPPUNFOLD_CACHE_DIR="/project/6050199/akhanf/opt/hippunfold_trained_models"


Note: make sure to reload your bash profile if needed (`source ~/.bash_profile`).

5. For an easier execution in Graham, it's recommended to also install
@@ -185,8 +186,10 @@ If poetry is not installed, please refer to the [installation documentation](htt

The trained model files we use for hippunfold are large and thus are not
included directly in this github repository, and instead are downloaded
-from Zenodo releases. If you are using the docker/singularity
-container, `docker://khanlab/hippunfold`, they are pre-downloaded there, in `/opt/hippunfold_cache`.
+from Zenodo releases.

### For HippUnfold versions earlier than 1.3.0 (< 1.3.0):
If you are using the docker/singularity container, `docker://khanlab/hippunfold`, they are pre-downloaded there, in `/opt/hippunfold_cache`.

If you are not using this container, you will need to download the models before running hippunfold, by running:

@@ -196,6 +199,18 @@ This console script (installed when you install hippunfold) downloads all the mo
which on Linux is typically `~/.cache/hippunfold`. To override this, you can set the `HIPPUNFOLD_CACHE_DIR` environment
variable before running `hippunfold_download_models` and `hippunfold`.

### NEW: For HippUnfold versions 1.3.0 and later (>= 1.3.0):
With the addition of new models, it was no longer feasible to include every model in the container, so a change was made to
**not include** any models in the docker/singularity containers. In these versions, the `hippunfold_download_models` command
is removed, and models are simply downloaded as part of the workflow. As before, all models are stored in the system cache directory,
which is typically `~/.cache/hippunfold`; to override this, you can set the `HIPPUNFOLD_CACHE_DIR` environment variable before running `hippunfold`.
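
For reference, the lookup described here (implemented in `get_model_tar()` in `hippunfold/workflow/rules/nnunet.smk`, later in this diff) can be sketched in plain Python. The `AppDirs` constructor arguments below are an assumption for illustration, not a quote from the codebase:

```
import os
from pathlib import Path

from appdirs import AppDirs  # same library nnunet.smk imports


def model_cache_dir() -> Path:
    """Resolve where model tars are cached: the HIPPUNFOLD_CACHE_DIR
    override if set, else the platform cache dir (~/.cache/hippunfold on Linux)."""
    if "HIPPUNFOLD_CACHE_DIR" in os.environ:
        return Path(os.environ["HIPPUNFOLD_CACHE_DIR"])
    # constructor arguments assumed for illustration
    return Path(AppDirs("hippunfold", "khanlab").user_cache_dir)


print(model_cache_dir())
```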

If you want to pre-download a model (e.g. if your compute nodes do not have internet access), you can simply run the `download_model` rule in HippUnfold, e.g.:

```
hippunfold BIDS_DIR OUTPUT_DIR PARTICIPANT_LEVEL --modality T1w --until download_model -c 1
```
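
To sanity-check the pre-download before going offline, you can list what landed in the cache. A minimal sketch, assuming the default cache location on Linux:

```
import os
from pathlib import Path

cache = Path(
    os.environ.get("HIPPUNFOLD_CACHE_DIR", Path.home() / ".cache" / "hippunfold")
)
for tar in sorted(cache.rglob("*.tar")):
    print(f"{tar.name}: {tar.stat().st_size / 1e9:.1f} GB")
```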


## Overriding Singularity cache directories

2 changes: 1 addition & 1 deletion docs/getting_started/docker.md
@@ -27,7 +27,7 @@ HippUnfold, and can be listed with `--help-snakemake`:

## Running an example

-Download and extract a single-subject BIDS dataset for this test from https://www.dropbox.com/s/mdbmpmmq6fi8sk0/hippunfold_test_data.tar. Here we will also assume you chose to save and extract to the directory `c:\Users\jordan\Downloads\`.
+Download and extract a single-subject BIDS dataset for this test from [hippunfold_test_data.tar](https://www.dropbox.com/s/mdbmpmmq6fi8sk0/hippunfold_test_data.tar). Here we will also assume you chose to save and extract to the directory `c:\Users\jordan\Downloads\`.

This contains a `ds002168/` directory with a single subject that has both T1w and T2w images.
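
If you prefer to script the download rather than use a browser, a rough Python equivalent follows; the `?dl=1` query string (Dropbox direct download) is an assumption about the link above:

```
import tarfile
import urllib.request

# ?dl=1 (Dropbox direct download) is an assumption about the link above
url = "https://www.dropbox.com/s/mdbmpmmq6fi8sk0/hippunfold_test_data.tar?dl=1"
urllib.request.urlretrieve(url, "hippunfold_test_data.tar")

# extract into the current directory, yielding ds002168/
with tarfile.open("hippunfold_test_data.tar") as tf:
    tf.extractall(".")
```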

Expand Down
7 changes: 4 additions & 3 deletions docs/getting_started/installation.md
@@ -47,7 +47,7 @@ The HippUnfold BIDS App is available on DockerHub as versioned releases and de

#### Pros:
- Compatible with non-Linux systems
-- All dependencies+models in a single container
+- All dependencies+models (* See Note 1) in a single container

#### Cons:
- Typically not possible on shared machines
@@ -59,7 +59,7 @@ The HippUnfold BIDS App is available on DockerHub as versioned releases and de
The same docker container can also be used with Singularity (now Apptainer). Instructions can be found below.

#### Pros:
-- All dependencies+models in a single container
+- All dependencies+models (* See Note 1) in a single container
- Container stored as a single file (.sif)

#### Cons:
Expand All @@ -80,5 +80,6 @@ Instructions for this can be found in the **Contributing** documentation page.
- Must use Python virtual environment
- Only compatible on Linux systems with Singularity for external dependencies


## Note 1:
As of version 1.3.0 of HippUnfold, containers are no longer shipped with all the models, and the models are downloaded as part of the workflow. By default, models are placed in `~/.cache/hippunfold` unless you set the `HIPPUNFOLD_CACHE_DIR` environment variable. See [Deep learning nnU-net model files](https://hippunfold.readthedocs.io/en/latest/contributing/contributing.html#deep-learning-nnu-net-model-files) for more information.

15 changes: 14 additions & 1 deletion docs/usage/faq.md
@@ -2,6 +2,8 @@

1. [](run-inference-mem)
2. [](no-input-images)
3. [](container-size)
4. [](model-files)


(run-inference-mem)=
@@ -41,6 +43,17 @@ This can happen if:
- Singularity or docker cannot access your input directory. For Singularity, ensure your [Singularity options](https://docs.sylabs.io/guides/3.1/user-guide/cli/singularity_run.html) are appropriate, in particular `SINGULARITY_BINDPATH`. For docker, ensure you are mounting the correct directory with the `-v` flag described in the [Getting started](https://hippunfold.readthedocs.io/en/latest/getting_started/docker.html) section.
- HippUnfold does not recognize your BIDS-formatted input images. This can occur if, for example, T1w images are labelled with the suffix `_t1w.nii.gz` instead of `_T1w.nii.gz` as per [BIDS specifications](https://bids.neuroimaging.io/specification.html). HippUnfold makes use of [PyBIDS](https://github.com/bids-standard/pybids) to parse the dataset, so we suggest you use the [BIDS Validator](https://bids-standard.github.io/bids-validator/) to ensure your dataset has no errors. Note: You can override BIDS parsing and use custom filenames with the `--path-*` option as described in the [](../usage/useful_options.md#parsing-non-bids-datasets-with-custom-paths) section.


(container-size)=
## Why is the HippUnfold Docker/Singularity/Apptainer container so large?

In addition to some large software dependencies, the container has historically included U-net models for all the possible modalities we trained, with each model taking up 2-4GB. We have addressed this in versions >= 1.3.0 by updating the workflow to download models on the fly (when they have not already been downloaded) and by no longer including any models in the container itself. This drops the container size significantly (<4GB compressed).

(model-files)=
## Why do I end up with large files in `~/.cache/hippunfold` after running HippUnfold?

This folder is where the nnU-net model parameters are stored by default. You can override the location with the `HIPPUNFOLD_CACHE_DIR` environment variable. See [](../contributing/contributing.md#deep-learning-nnu-net-model-files) for more details.
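
If you want to check how much space the cache is using (for example, before deleting models for modalities you no longer run), a small sketch:

```
import os
from pathlib import Path

cache = Path(
    os.environ.get("HIPPUNFOLD_CACHE_DIR", Path.home() / ".cache" / "hippunfold")
)
files = [f for f in cache.rglob("*") if f.is_file()]
total_gb = sum(f.stat().st_size for f in files) / 1e9
print(f"{len(files)} files, {total_gb:.1f} GB in {cache}")
```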



17 changes: 9 additions & 8 deletions hippunfold/config/snakebids.yml
@@ -415,14 +415,14 @@ modality: T2w

 #these will be downloaded to ~/.cache/hippunfold
 nnunet_model:
-  T1w: trained_model.3d_fullres.Task101_hcp1200_T1w.nnUNetTrainerV2.model_best.tar
-  T2w: trained_model.3d_fullres.Task102_hcp1200_T2w.nnUNetTrainerV2.model_best.tar
-  hippb500: trained_model.3d_fullres.Task110_hcp1200_b1000crop.nnUNetTrainerV2.model_best.tar
-  neonateT1w: trained_model.3d_fullres.Task205_hcp1200_b1000_finetuneround2_dhcp_T1w.nnUNetTrainerV2.model_best.tar
-  T1T2w: trained_model.3d_fullres.Task103_hcp1200_T1T2w.nnUNetTrainerV2.model_best.tar
-  synthseg_v0.1: trained_model.3d_fullres.Task102_synsegGenDetailed.nnUNetTrainerV2.model_best.tar
-  synthseg_v0.2: trained_model.3d_fullres.Task203_synthseg.nnUNetTrainerV2.model_best.tar
-  neonateT1w_v2: trained_model.3d_fullres.Task301_dhcp_T1w_synthseg_manuallycorrected.nnUNetTrainer.model_best.tar
+  T1w: 'zenodo.org/record/4508747/files/trained_model.3d_fullres.Task101_hcp1200_T1w.nnUNetTrainerV2.model_best.tar'
+  T2w: 'zenodo.org/record/4508747/files/trained_model.3d_fullres.Task102_hcp1200_T2w.nnUNetTrainerV2.model_best.tar'
+  hippb500: 'zenodo.org/record/5732291/files/trained_model.3d_fullres.Task110_hcp1200_b1000crop.nnUNetTrainerV2.model_best.tar'
+  neonateT1w: 'zenodo.org/record/5733556/files/trained_model.3d_fullres.Task205_hcp1200_b1000_finetuneround2_dhcp_T1w.nnUNetTrainerV2.model_best.tar'
+  neonateT1w_v2: 'zenodo.org/record/8209029/files/trained_model.3d_fullres.Task301_dhcp_T1w_synthseg_manuallycorrected.nnUNetTrainer.model_best.tar'
+  T1T2w: 'zenodo.org/record/4508747/files/trained_model.3d_fullres.Task103_hcp1200_T1T2w.nnUNetTrainerV2.model_best.tar'
+  synthseg_v0.1: 'zenodo.org/record/8184230/files/trained_model.3d_fullres.Task102_synsegGenDetailed.nnUNetTrainerV2.model_best.tar'
+  synthseg_v0.2: 'zenodo.org/record/8184230/files/trained_model.3d_fullres.Task203_synthseg.nnUNetTrainerV2.model_best.tar'

crop_native_box: '256x256x256vox'
crop_native_res: '0.2x0.2x0.2mm'
@@ -556,4 +556,5 @@ skip_inject_template_labels: False
 force_nnunet_model: False
 t1_reg_template: False
 generate_myelin_map: False
+no_unfolded_reg: False
 root: results
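
The `download_model` rule in `nnunet.smk` (below) selects one entry from this `nnunet_model` mapping: the `force_nnunet_model` value when set, otherwise the modality. A minimal Python sketch of that selection, with a trimmed-down stand-in for the config above:

```
# Trimmed-down stand-in for the snakebids config above
config = {
    "modality": "T1w",
    "force_nnunet_model": False,  # or e.g. "synthseg_v0.2"
    "nnunet_model": {
        "T1w": "zenodo.org/record/4508747/files/trained_model.3d_fullres.Task101_hcp1200_T1w.nnUNetTrainerV2.model_best.tar",
        "synthseg_v0.2": "zenodo.org/record/8184230/files/trained_model.3d_fullres.Task203_synthseg.nnUNetTrainerV2.model_best.tar",
    },
}

# force_nnunet_model, when set, overrides the modality-based choice
key = config["force_nnunet_model"] or config["modality"]
print(config["nnunet_model"][key])
```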
80 changes: 58 additions & 22 deletions hippunfold/workflow/rules/nnunet.smk
@@ -1,8 +1,50 @@
 import re
 from appdirs import AppDirs
+from snakemake.remote.HTTP import RemoteProvider as HTTPRemoteProvider
+
+HTTP = HTTPRemoteProvider()

-def get_model_tar(wildcards):
+
+def get_nnunet_input(wildcards):
+    if config["modality"] == "T2w":
+        nii = (
+            bids(
+                root=work,
+                datatype="anat",
+                **config["subj_wildcards"],
+                suffix="T2w.nii.gz",
+                space="corobl",
+                desc="preproc",
+                hemi="{hemi}",
+            ),
+        )
+    elif config["modality"] == "T1w":
+        nii = (
+            bids(
+                root=work,
+                datatype="anat",
+                **config["subj_wildcards"],
+                suffix="T1w.nii.gz",
+                space="corobl",
+                desc="preproc",
+                hemi="{hemi}",
+            ),
+        )
+    elif config["modality"] == "hippb500":
+        nii = bids(
+            root=work,
+            datatype="dwi",
+            hemi="{hemi}",
+            space="corobl",
+            suffix="b500.nii.gz",
+            **config["subj_wildcards"],
+        )
+    else:
+        raise ValueError("modality not supported for nnunet!")
+    return nii
+
+
+def get_model_tar():

if "HIPPUNFOLD_CACHE_DIR" in os.environ.keys():
download_dir = os.environ["HIPPUNFOLD_CACHE_DIR"]
@@ -20,14 +62,7 @@ def get_model_tar(wildcards):
     if local_tar == None:
         print(f"ERROR: {model_name} does not exist in nnunet_model in the config file")

-    dl_path = os.path.abspath(os.path.join(download_dir, local_tar))
-    if os.path.exists(dl_path):
-        return dl_path
-    else:
-        print("ERROR:")
-        print(
-            f" Cannot find downloaded model at {dl_path}, run this first: hippunfold_download_models"
-        )
+    return os.path.abspath(os.path.join(download_dir, local_tar.split("/")[-1]))


def parse_task_from_tar(wildcards, input):
@@ -57,22 +92,23 @@ def parse_trainer_from_tar(wildcards, input):
     return trainer


+rule download_model:
+    input:
+        HTTP.remote(config["nnunet_model"][config["force_nnunet_model"]])
+        if config["force_nnunet_model"]
+        else HTTP.remote(config["nnunet_model"][config["modality"]]),
+    output:
+        model_tar=get_model_tar(),
+    shell:
+        "cp {input} {output}"
+
+
 rule run_inference:
-    """ This rule REQUIRES a GPU -- will need to modify nnUnet code to create an alternate for CPU-based inference
+    """ This rule uses either GPU or CPU.
+    It also runs in an isolated folder (shadow), with symlinks to inputs in that folder, copying over outputs once complete, so temp files are not retained"""
     input:
-        in_img=(
-            bids(
-                root=work,
-                datatype="anat",
-                **config["subj_wildcards"],
-                suffix="{modality}.nii.gz".format(modality=config["modality"]),
-                space="corobl",
-                desc="preproc",
-                hemi="{hemi}",
-            ),
-        ),
-        model_tar=get_model_tar,
+        in_img=get_nnunet_input,
+        model_tar=get_model_tar(),
     params:
         temp_img="tempimg/temp_0000.nii.gz",
         temp_lbl="templbl/temp.nii.gz",
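
Outside of Snakemake, the combined effect of `download_model` and `get_model_tar()` is roughly the following standalone sketch. The `fetch_model` helper is hypothetical, and prepending `https://` to the scheme-less config URLs is an assumption:

```
import os
import urllib.request
from pathlib import Path


def fetch_model(url: str, cache_dir: Path) -> Path:
    """Download a model tar into the cache unless it is already present."""
    dest = cache_dir / url.split("/")[-1]  # same basename logic as get_model_tar()
    if not dest.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        # config URLs are scheme-less; prepending https:// is an assumption
        urllib.request.urlretrieve("https://" + url, dest)
    return dest


cache = Path(
    os.environ.get("HIPPUNFOLD_CACHE_DIR", Path.home() / ".cache" / "hippunfold")
)
print(
    fetch_model(
        "zenodo.org/record/4508747/files/"
        "trained_model.3d_fullres.Task101_hcp1200_T1w.nnUNetTrainerV2.model_best.tar",
        cache,
    )
)
```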
1 change: 0 additions & 1 deletion pyproject.toml
@@ -39,7 +39,6 @@ snakefmt = ">=0.5.0"

 [tool.poetry.scripts]
 hippunfold = "hippunfold.run:main"
-hippunfold_download_models = "hippunfold.download_models:main"

 [build-system]
 requires = ["poetry-core>=1.0.0"]
