A MONAI bundle usually includes the stored weights of a model, TorchScript model, JSON files which include configs and metadata about the model, information for constructing training, inference, and post-processing transform sequences, plain-text description, legal information, and other data the model creator wishes to include.
For more information about MONAI bundle, please read the description: https://docs.monai.io/en/latest/bundle_intro.html.
This step-by-step tutorial helps you get started developing a bundle package, which contains a config file to construct the training pipeline and a `metadata.json` file to define the metadata information.
It mainly contains the following sections:
- Define a training config in `JSON` or `YAML` format.
- Execute training based on bundle scripts and configs.
- Execute other scripts for bundle functionalities.
- Hybrid programming with config and Python code.
You can find usage examples of the key MONAI bundle features and syntax in this tutorial, such as:
- Instantiate a Python object from a dictionary config with `_target_` indicating the class or function name or module path.
- Execute a Python expression from a string config with the `$` syntax.
- Refer to another Python object with the `@` syntax.
- Macro text replacement with the `%` syntax to simplify the config content.
- Leverage the `_disabled_` syntax to tune or debug different components.
- Override config content at runtime.
- Hybrid programming with config and Python code.
Download the `spleen_ct_segmentation` bundle for this example.
python -m monai.bundle download --name spleen_ct_segmentation --bundle_dir "./"
The dataset for this example comes from http://medicaldecathlon.com/.
Specify a directory with the `MONAI_DATA_DIRECTORY` environment variable to save the downloaded dataset and outputs. If the variable is not set, a temporary directory is used.
import os
import tempfile

from monai.apps import download_and_extract

resource = "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task09_Spleen.tar"
md5 = "410d4a301da4e5b2f6f86ec3ddba524e"

# use MONAI_DATA_DIRECTORY if it is set, otherwise fall back to a temporary directory
directory = os.environ.get("MONAI_DATA_DIRECTORY")
if directory is not None:
    os.makedirs(directory, exist_ok=True)
root_dir = tempfile.mkdtemp() if directory is None else directory

compressed_file = os.path.join(root_dir, "Task09_Spleen.tar")
data_dir = os.path.join(root_dir, "Task09_Spleen")
# download and extract only if the dataset is not already present
if not os.path.exists(data_dir):
    download_and_extract(resource, compressed_file, root_dir, md5)
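The `md5` value passed above lets the downloaded archive be checked against a known checksum. As a rough illustration of what such a check involves (a standard-library sketch, not MONAI's own code; the helper name `md5_of_file` is hypothetical), a file digest can be computed like this:

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstrate on a small temporary file instead of the real archive.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.bin")
    with open(sample, "wb") as f:
        f.write(b"hello")
    checksum = md5_of_file(sample)

print(checksum)
```

Comparing such a digest with the expected string is enough to detect a truncated or corrupted download.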
Now let's define the config file for a regular training task. MONAI bundle supports both `JSON` and `YAML` formats; here we use `JSON` as the example. After downloading the bundle, the train config file `spleen_ct_segmentation/configs/train.json` is available for reference.
According to the predefined syntax of MONAI bundle, `$` indicates an expression to evaluate and `@` refers to another object in the config content. For more details about the bundle config syntax, please check: https://docs.monai.io/en/latest/config_syntax.html.
Please note that a MONAI bundle doesn't require any hard-coded logic in the config, so users can define the config content in any structure.
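To make the `$` and `@` rules more concrete, here is a deliberately simplified toy resolver for a flat dictionary. It is only an illustration of the idea and not how MONAI implements config parsing; the function name `resolve` and the config keys are made up for this sketch.

```python
import re

# Toy config: "$" marks a Python expression, "@" refers to another entry.
config = {
    "spatial_size": 96,
    "num_samples": 4,
    "crop_voxels": "$@num_samples * @spatial_size ** 3",
    "note": "plain strings are left untouched",
}


def resolve(config, key):
    """Resolve one entry of a flat toy config (no nesting, no cycle checks)."""
    value = config[key]
    if not (isinstance(value, str) and value.startswith("$")):
        return value  # not an expression, return as-is
    names = {}

    def substitute(match):
        # Replace "@name" with a local variable holding the resolved value.
        ref = match.group(1)
        names[f"__ref_{ref}"] = resolve(config, ref)
        return f"__ref_{ref}"

    expression = re.sub(r"@(\w+)", substitute, value[1:])
    return eval(expression, {}, names)


crop_voxels = resolve(config, "crop_voxels")
print(crop_voxels)
```

The real parser additionally handles nested ids, instantiation and lazy evaluation, which this sketch leaves out on purpose.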
For the first step, import `os` and `glob` to use in the Python expressions (which start with `$`), then define the input / output environments. (`cudnn.benchmark` is enabled for better performance in the `initialize` section shown later.)
The `dataset_dir` in the config is the directory of the downloaded dataset. Please check the `root_dir` and update this accordingly when you are writing your config.
Note that the `imports` are only used to execute the Python expressions; `monai`, `numpy`, `np`, and `torch` are already imported internally, as they are minimum dependencies of MONAI.
{
"imports": [
"$import glob",
"$import os",
"$import ignite"
],
"device": "$torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')",
"ckpt_path": "/workspace/data/models/model.pt",
"dataset_dir": "/workspace/data/Task09_Spleen",
"images": "$list(sorted(glob.glob(@dataset_dir + '/imagesTr/*.nii.gz')))",
"labels": "$list(sorted(glob.glob(@dataset_dir + '/labelsTr/*.nii.gz')))"
}
Define the `UNet` of MONAI as the training network, and use the `Adam` optimizer of PyTorch and the `DiceCELoss` of MONAI.
An instantiable config component uses the `_target_` keyword to define the class / function name or module path; the other keys are arguments for the component.
Note that for all MONAI classes and functions, we can use the name in `_target_` directly; for any other package, please provide the full module path in `_target_`.
"network_def": {
"_target_": "UNet",
"spatial_dims": 3,
"in_channels": 1,
"out_channels": 2,
"channels": [16, 32, 64, 128, 256],
"strides": [2, 2, 2, 2],
"num_res_units": 2,
"norm": "batch"
}
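The `_target_` convention can be pictured as "import the named callable, then call it with the remaining keys as keyword arguments". The sketch below shows that idea with the standard library only; it is not MONAI's implementation, the helper `instantiate` is hypothetical, and it handles only full module paths (not bare MONAI class names).

```python
import importlib


def instantiate(component):
    """Build an object from a dict whose "_target_" is a full module path."""
    kwargs = {k: v for k, v in component.items() if k != "_target_"}
    module_name, _, attr_name = component["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_name), attr_name)
    return target(**kwargs)


# A standard-library class stands in for a network or optimizer here.
half = instantiate(
    {"_target_": "fractions.Fraction", "numerator": 1, "denominator": 2}
)
print(type(half).__name__, float(half))
```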
Move the network to the expected device, which was defined earlier by `"device": "$torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')"`.
"network": "$@network_def.to(@device)"
Define the optimizer and loss function. For MONAI classes, we can use the class name directly; other classes must provide the module path (like `Adam`).
"loss": {
"_target_": "DiceCELoss",
"to_onehot_y": true,
"softmax": true,
"squared_pred": true,
"batch": true
},
"optimizer": {
"_target_": "torch.optim.Adam",
"params": "$@network.parameters()",
"lr": 1e-4
}
Define the `transforms`, `dataset`, and `dataloader` to generate training data for the network.
To keep the config structure clear, we split the `train` and `validate` related components into 2 sections:
"train": {...},
"validate": {...}
The composed transforms are for preprocessing.
"train": {
"preprocessing": {
"_target_": "Compose",
"transforms": [
{
"_target_": "LoadImaged",
"keys": ["image", "label"]
},
{
"_target_": "EnsureChannelFirstd",
"keys": ["image", "label"]
},
{
"_target_": "Orientationd",
"keys": ["image", "label"],
"axcodes": "RAS"
},
{
"_target_": "Spacingd",
"keys": ["image", "label"],
"pixdim": [1.5, 1.5, 2.0],
"mode": ["bilinear", "nearest"]
},
{
"_target_": "ScaleIntensityRanged",
"keys": "image",
"a_min": -57,
"a_max": 164,
"b_min": 0,
"b_max": 1,
"clip": true
},
{
"_target_": "RandCropByPosNegLabeld",
"keys": ["image", "label"],
"label_key": "label",
"spatial_size": [96, 96, 96],
"pos": 1,
"neg": 1,
"num_samples": 4,
"image_key": "image",
"image_threshold": 0
},
{
"_target_": "EnsureTyped",
"keys": ["image", "label"]
}
]
}
}
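As a worked example of one step in this pipeline, the intensity scaling configured for `ScaleIntensityRanged` maps the range `[a_min, a_max]` linearly onto `[b_min, b_max]` and, with `clip` enabled, clamps values outside it. The scalar function below is a plain-Python sketch of that arithmetic (the real transform operates on whole arrays; the function name is made up here).

```python
def scale_intensity_range(x, a_min, a_max, b_min, b_max, clip=False):
    """Linearly map x from [a_min, a_max] to [b_min, b_max], optionally clipping."""
    y = (x - a_min) / (a_max - a_min)
    y = y * (b_max - b_min) + b_min
    if clip:
        y = min(max(y, b_min), b_max)
    return y


# Values from the config above: a_min=-57, a_max=164, b_min=0, b_max=1, clip=true.
samples = [-200, -57, 53.5, 164, 500]
scaled = [scale_intensity_range(v, -57, 164, 0, 1, clip=True) for v in samples]
print(scaled)
```

Intensities below -57 or above 164 end up at the ends of the output range, so the network sees inputs confined to `[0, 1]`.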
The train and validation image file names are organized into a list of dictionaries.
Here we pass the `dataset` instance as an argument of the `dataloader` with the `@` syntax. Please note that `"#"` in a reference id is interpreted as a special character that goes one level further into the nested config structure. For example: `"dataset": "@train#dataset"`.
"dataset": {
"_target_": "CacheDataset",
"data": "$[{'image': i, 'label': l} for i, l in zip(@images[:-9], @labels[:-9])]",
"transform": "@train#preprocessing",
"cache_rate": 1.0,
"num_workers": 4
},
"dataloader": {
"_target_": "DataLoader",
"dataset": "@train#dataset",
"batch_size": 2,
"shuffle": false,
"num_workers": 4
}
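To see what the `data` expression of the dataset produces, here is a minimal sketch with hypothetical file names standing in for the real `glob` results. The `[-9:]` slice used for the validation list is an assumption about how the held-out files might be selected; the excerpt above only shows the `[:-9]` training slice.

```python
# Hypothetical, already sorted file lists (the real ones come from glob).
images = [f"imagesTr/spleen_{i:02d}.nii.gz" for i in range(12)]
labels = [f"labelsTr/spleen_{i:02d}.nii.gz" for i in range(12)]

# Same expression as in the config: everything except the last 9 pairs.
train_files = [
    {"image": i, "label": l} for i, l in zip(images[:-9], labels[:-9])
]
# Assumed counterpart for validation: the last 9 pairs.
val_files = [
    {"image": i, "label": l} for i, l in zip(images[-9:], labels[-9:])
]

print(len(train_files), len(val_files))
print(train_files[0])
```

Because both lists are sorted the same way, each dictionary pairs an image with its matching label.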
Here we use `SimpleInferer` to execute the `forward()` computation of the network and add post-processing methods like `activation`, `argmax`, `one-hot`, etc. Event handlers log the results to stdout and TensorBoard.
"inferer": {
"_target_": "SimpleInferer"
},
"postprocessing": {
"_target_": "Compose",
"transforms": [
{
"_target_": "Activationsd",
"keys": "pred",
"softmax": true
},
{
"_target_": "AsDiscreted",
"keys": ["pred", "label"],
"argmax": [true, false],
"to_onehot": 2
}
]
},
"handlers": [
{
"_target_": "StatsHandler",
"tag_name": "train_loss",
"output_transform": "$monai.handlers.from_engine(['loss'], first=True)"
},
{
"_target_": "TensorBoardStatsHandler",
"log_dir": "eval",
"tag_name": "train_loss",
"output_transform": "$monai.handlers.from_engine(['loss'], first=True)"
}
]
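The post-processing chain above (softmax activation, then `argmax` on the prediction, then one-hot encoding of both prediction and label) can be traced for a single voxel in plain Python. This is only an illustrative sketch with made-up logits; the actual transforms work on full tensors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def one_hot(index, num_classes):
    """One-hot list with a 1 at the given class index."""
    return [1 if c == index else 0 for c in range(num_classes)]


logits = [0.3, 2.1]  # hypothetical outputs for (background, spleen)
probs = softmax(logits)            # "softmax": true
pred = one_hot(argmax(probs), 2)   # "argmax": true, then "to_onehot": 2
label = one_hot(1, 2)              # label: no argmax, only "to_onehot": 2

print(pred, label)
```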
Here we define the `Accuracy` metric to compute on the training data, to help check whether convergence is as expected and to avoid over-fitting. Note that this is not a validation step during training.
"key_metric": {
"train_accuracy": {
"_target_": "ignite.metrics.Accuracy",
"output_transform": "$monai.handlers.from_engine(['pred', 'label'])"
}
}
Here we use the MONAI engine `SupervisedTrainer` to execute a regular training.
If users have customized logic, they can put it in the `iteration_update` arg, or implement their own `trainer` in Python code and set `_target_` to that class directly.
"trainer": {
"_target_": "SupervisedTrainer",
"max_epochs": 100,
"device": "@device",
"train_data_loader": "@train#dataloader",
"network": "@network",
"loss_function": "@loss",
"optimizer": "@optimizer",
"inferer": "@train#inferer",
"postprocessing": "@train#postprocessing",
"key_train_metric": "@train#key_metric",
"train_handlers": "@train#handlers",
"amp": true
}
Usually we need to run validation every N epochs during training to verify the model and save the best model.
Here we don't define the `validate` section step by step because it is similar to the `train` section; please refer to the full training config of the spleen bundle example.
We only show an example of `macro text replacement`, which simplifies the config content and avoids duplicated text. Please note that it is just a token text replacement of the config content; it does not refer to the instantiated Python objects.
"validate": {
"preprocessing": {
"_target_": "Compose",
"transforms": [
"%train#preprocessing#transforms#0",
"%train#preprocessing#transforms#1",
"%train#preprocessing#transforms#2",
"%train#preprocessing#transforms#3",
"%train#preprocessing#transforms#4",
"%train#preprocessing#transforms#6"
]
}
}
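The difference between a macro and a reference is that a macro copies config text before anything is instantiated. The toy expander below illustrates that behaviour on a shortened config; it is a sketch under simplifying assumptions (same-file macros only, hypothetical helper names), not MONAI's parser.

```python
import copy

config = {
    "train": {"preprocessing": {"transforms": [
        {"_target_": "LoadImaged", "keys": ["image", "label"]},
        {"_target_": "RandCropByPosNegLabeld", "keys": ["image", "label"]},
        {"_target_": "EnsureTyped", "keys": ["image", "label"]},
    ]}},
    "validate": {"preprocessing": {"transforms": [
        "%train#preprocessing#transforms#0",
        "%train#preprocessing#transforms#2",
    ]}},
}


def get_by_id(cfg, id_):
    """Walk a nested config using a '#'-separated id."""
    node = cfg
    for part in id_.split("#"):
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def expand_macros(node, root):
    """Return a copy of node with every '%id' string replaced by the config at id."""
    if isinstance(node, dict):
        return {k: expand_macros(v, root) for k, v in node.items()}
    if isinstance(node, list):
        return [expand_macros(v, root) for v in node]
    if isinstance(node, str) and node.startswith("%"):
        return copy.deepcopy(get_by_id(root, node[1:]))
    return node


expanded = expand_macros(config, config)
val_transforms = expanded["validate"]["preprocessing"]["transforms"]
print([t["_target_"] for t in val_transforms])
```

The validation list skips the random crop, and each expanded entry is an independent copy of the config text rather than a shared object.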
We can define a `metadata` file in the bundle, which contains the metadata information relating to the model, including the shape and format of the inputs and outputs, the meaning of the outputs, the type of model, and other information. The structure is a dictionary containing a defined set of keys plus additional user-specified keys.
After downloading the bundle, a typical metadata example is available for reference in `spleen_ct_segmentation/configs/metadata.json`.
There are several predefined scripts in the MONAI bundle module; here we leverage the `run` script.
We can define the following three sections:
- `"run"` determines the section of the expected config expression to run.
- `"initialize"` determines the section of the expected config expression to initialize before running.
- `"finalize"` determines the section of the expected config expression to finalize after running.
In this example, only `"initialize"` and `"run"` are used:
"initialize": [
"$monai.utils.set_determinism(seed=123)",
"$setattr(torch.backends.cudnn, 'benchmark', True)"
],
"run": [
"$@train#trainer.run()"
]
python -m monai.bundle run --config_file configs/train.json
To override some config items at runtime, users can specify the target `id` and `value` on the command line, or override the `id` with content from another config file. Here we set the device to `cuda:1` at runtime.
Please note that "#" and "$" may have special meaning in some `shell` and `CLI` tools, so you may need to escape or quote them on the command line, like: `"\$torch.device('cuda:1')"`. For more details: https://github.com/google/python-fire/blob/v0.4.0/fire/parser.py#L60.
python -m monai.bundle run --config_file configs/train.json --device "\$torch.device('cuda:1')"
Override content from another config file.
python -m monai.bundle run --config_file configs/train.json --network "%configs/inference.json#network"
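Conceptually, an override replaces the config content stored under an id before the config is parsed. The following sketch mimics that on a plain nested dictionary; the helper `set_by_id` and the shortened config are hypothetical and only illustrate the `#`-separated addressing.

```python
def set_by_id(config, id_, value):
    """Replace the entry addressed by a '#'-separated id in a nested config."""
    *parents, last = id_.split("#")
    node = config
    for part in parents:
        node = node[int(part)] if isinstance(node, list) else node[part]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value


config = {
    "device": "$torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')",
    "train": {"trainer": {"_target_": "SupervisedTrainer", "max_epochs": 100}},
}

# Top-level id, as in the device override above.
set_by_id(config, "device", "$torch.device('cuda:1')")
# Nested id, addressed with "#".
set_by_id(config, "train#trainer#max_epochs", 10)

print(config["device"])
print(config["train"]["trainer"]["max_epochs"])
```

Because only the raw config content changes, the overridden value is still evaluated later like any other `$` expression.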
Besides `run`, there are also many other scripts for bundle functionalities. All the scripts are available at: https://docs.monai.io/en/latest/bundle.html#scripts.
Here are some typical examples:
- Initialize a bundle directory based on the template and pretrained checkpoint weights.
python -m monai.bundle init_bundle --bundle_dir <target dir> --ckpt_file <checkpoint path>
- Export the model checkpoint to a `TorchScript` model at the given filepath, with metadata and config included as JSON files.
python -m monai.bundle ckpt_export network --filepath <export path> --ckpt_file <checkpoint path> --config_file <config path>
- Verify the format of the provided `metadata` file based on the predefined `schema`.
python -m monai.bundle verify_metadata --meta_file <meta path>
- Verify the input and output data shape and data type of the network defined in the metadata. It tests with fake Tensor data according to the required data shape in `metadata`.
python -m monai.bundle verify_net_in_out network --meta_file <metadata path> --config_file <config path>
The acceptable data shape in the metadata can use `"*"` for any size, or an expression with Python mathematical operators and one-character variables to represent dependence on an unknown quantity. For example, `"2**p"` represents a size that must be a power of 2, and `"2**p*n"` a size that must be a multiple of a power of 2: `"spatial_shape": [ "32 * n", "2 ** p * n", "*"]`.
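To illustrate how such shape expressions can be turned into a concrete test shape, the sketch below substitutes chosen values for `p` and `n` and a fixed size for `"*"`. It is a simplified stand-in written for this tutorial (the function name and the `any_size` parameter are made up), not the code the verification script runs.

```python
def concrete_shape(spec, p=1, n=1, any_size=1):
    """Evaluate each dimension expression of a shape spec to an integer."""
    shape = []
    for dim in spec:
        if isinstance(dim, int):
            shape.append(dim)  # fixed size, nothing to evaluate
        elif dim.strip() == "*":
            shape.append(any_size)  # any size is acceptable
        else:
            # Evaluate the arithmetic expression with only p and n defined.
            shape.append(int(eval(dim, {"__builtins__": {}}, {"p": p, "n": n})))
    return shape


spec = ["32 * n", "2 ** p * n", "*"]
shape = concrete_shape(spec, p=3, n=2, any_size=7)
print(shape)
```

With `p=3` and `n=2` the first dimension becomes a multiple of 32 and the second a multiple of a power of 2, which is the kind of input a fake test tensor needs.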
A MONAI bundle supports flexible customized logic. There are several ways to achieve this:
- If you define your own components (transform, loss, trainer, etc.) in a Python file, just use the module path in `_target_` within the config file.
- Parse the config in your own Python program and do lazy instantiation with customized logic.
Here we show an example to parse the config in python code and execute the training.
from monai.bundle import ConfigParser
parser = ConfigParser()
parser.read_config(f="configs/train.json")
parser.read_meta(f="configs/metadata.json")
`get` / `set` the configuration content. The `set` operation must happen before calling `parse()`.
# original input channels 1
print(parser["network_def"]["in_channels"])
# change input channels to 4
parser["network_def"]["in_channels"] = 4
print(parser["network_def"]["in_channels"])
Parse the config content and instantiate components.
# parse the structured config content
parser.parse()
# instantiate the network component and print the network structure
net = parser.get_parsed_content("network")
print(net)
# execute training
trainer = parser.get_parsed_content("train#trainer")
trainer.run()