Modifying the nnU-Net Configurations

nnU-Net provides unprecedented out-of-the-box segmentation performance for essentially any dataset we have evaluated it on. That said, there is always room for improvement. A fool-proof strategy for squeezing out the last bit of performance is to start with the default nnU-Net and then tune it manually for the concrete dataset at hand. This guide covers the changes to the nnU-Net configuration that you can make via the plans files. It does not cover code extensions of nnU-Net. For that, take a look here.

In nnU-Net v2, plans files are SO MUCH MORE powerful than they were in v1. There are a lot more knobs that you can turn without resorting to hacky solutions or even having to touch the nnU-Net code at all! And as an added bonus: plans files are now .json files and no longer require users to fiddle with pickle. Just open them in your text editor of choice!

If overwhelmed, look at our Examples!

plans.json structure

Plans have global and local settings. Global settings are applied to all configurations in that plans file while local settings are attached to a specific configuration.

Global settings

  • foreground_intensity_properties_by_modality: Intensity statistics of the foreground regions (all labels except background and ignore label), computed over all training cases. Used by CT normalization scheme.
  • image_reader_writer: Name of the image reader/writer class that should be used with this dataset. You might want to change this if, for example, you would like to run inference with files that have a different file format. The class that is named here must be located in nnunetv2.imageio!
  • label_manager: The name of the class that does label handling. Take a look at nnunetv2.utilities.label_handling.LabelManager to see what it does. If you decide to change it, place your version in nnunetv2.utilities.label_handling!
  • transpose_forward: nnU-Net transposes the input data so that the axes with the highest resolution (lowest spacing) come last. This is because the 2D U-Net operates on the trailing dimensions (more efficient slicing due to internal memory layout of arrays). Future work might move this setting to affect only individual configurations.
  • transpose_backward: the axis ordering that inverts "transpose_forward". It is what numpy.transpose gets as new axis ordering to undo the forward transposition.
  • [original_median_shape_after_transp]: just here for your information
  • [original_median_spacing_after_transp]: just here for your information
  • [plans_name]: do not change. Used internally
  • [experiment_planner_used]: just here as metadata so that we know what planner originally generated this file
  • [dataset_name]: do not change. This is the dataset these plans are intended for
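
The relationship between transpose_forward and transpose_backward can be illustrated with NumPy. This is a minimal sketch that assumes a made-up transpose_forward of [2, 0, 1] and a dummy volume. The inverse is computed with argsort, which is a general way to invert a permutation and not necessarily how nnU-Net derives it.

```python
import numpy as np

# Hypothetical axis ordering; real values come from your plans.json
transpose_forward = [2, 0, 1]

# argsort of a permutation yields its inverse permutation
transpose_backward = [int(i) for i in np.argsort(transpose_forward)]

image = np.zeros((10, 20, 30))  # dummy volume for illustration
transposed = image.transpose(transpose_forward)
restored = transposed.transpose(transpose_backward)

print(transpose_backward)  # [1, 2, 0]
print(transposed.shape)    # (30, 10, 20)
print(restored.shape)      # (10, 20, 30)
```

Applying transpose_backward after transpose_forward returns the array to its original axis order.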

Local settings

Plans also have a configurations key in which the actual configurations are stored. configurations is again a dictionary, where the keys are the configuration names and the values are the local settings for each configuration.
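
To make the split between global and local settings concrete, here is a small sketch that parses a heavily abbreviated plans file with Python's json module. All values are made up for illustration and a real plans file contains many more keys. For a real file you would call json.load on the opened file instead of json.loads on a string.

```python
import json

# Heavily abbreviated, hypothetical plans content for illustration only
plans_text = """
{
  "dataset_name": "Dataset004_Hippocampus",
  "plans_name": "nnUNetPlans",
  "transpose_forward": [0, 1, 2],
  "transpose_backward": [0, 1, 2],
  "configurations": {
    "2d": {"batch_size": 366, "patch_size": [56, 40]},
    "3d_fullres": {"batch_size": 9, "patch_size": [40, 56, 40]}
  }
}
"""
plans = json.loads(plans_text)

# Everything except "configurations" is a global setting
global_settings = {k: v for k, v in plans.items() if k != "configurations"}

# The keys of "configurations" are the configuration names
configuration_names = list(plans["configurations"])

print(sorted(global_settings))
print(configuration_names)
```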

To better understand the components describing the network topology in our plans files, please read section 6.2 in the supplementary information (page 13) of our paper!

Local settings:

  • spacing: the target spacing used in this configuration
  • patch_size: the patch size used for training this configuration
  • data_identifier: the preprocessed data for this configuration will be saved in nnUNet_preprocessed/DATASET_NAME/data_identifier. If you add a new configuration, remember to set a unique data_identifier in order to not create conflicts with other configurations (unless you plan to reuse the data from another configuration, for example as is done in the cascade)
  • batch_size: batch size used for training
  • batch_dice: whether to use batch dice (pretend all samples in the batch are one image, compute dice loss over that) or not (each sample in the batch is a separate image, compute dice loss for each sample and average over samples)
  • preprocessor_name: Name of the preprocessor class used for running preprocessing. Class must be located in nnunetv2.preprocessing.preprocessors
  • use_mask_for_norm: whether to use the nonzero mask for normalization or not (relevant for BraTS and the like, probably False for all other datasets). Interacts with ImageNormalization class
  • normalization_schemes: mapping of channel identifier to ImageNormalization class name. ImageNormalization classes must be located in nnunetv2.preprocessing.normalization. Also see here
  • resampling_fn_data: name of resampling function to be used for resizing image data. resampling function must be callable(data, current_spacing, new_spacing, **kwargs). It must be located in nnunetv2.preprocessing.resampling
  • resampling_fn_data_kwargs: kwargs for resampling_fn_data
  • resampling_fn_probabilities: name of resampling function to be used for resizing predicted class probabilities/logits. resampling function must be callable(data: Union[np.ndarray, torch.Tensor], current_spacing, new_spacing, **kwargs). It must be located in nnunetv2.preprocessing.resampling
  • resampling_fn_probabilities_kwargs: kwargs for resampling_fn_probabilities
  • resampling_fn_seg: name of resampling function to be used for resizing segmentation maps (integer: 0, 1, 2, 3, etc). resampling function must be callable(data, current_spacing, new_spacing, **kwargs). It must be located in nnunetv2.preprocessing.resampling
  • resampling_fn_seg_kwargs: kwargs for resampling_fn_seg
  • UNet_class_name: UNet class name, can be used to integrate custom dynamic architectures
  • UNet_base_num_features: The number of starting features for the UNet architecture. Default is 32. By default, the number of features is doubled with each downsampling
  • unet_max_num_features: Maximum number of features (default: capped at 320 for 3D and 512 for 2D). The purpose is to keep the number of parameters from growing too large.
  • conv_kernel_sizes: the convolutional kernel sizes used by nnU-Net in each stage of the encoder. The decoder mirrors the encoder and is therefore not explicitly listed here! The list has as many entries as n_conv_per_stage_encoder
  • n_conv_per_stage_encoder: number of convolutions used per stage (=at a feature map resolution in the encoder) in the encoder. Default is 2. The list has as many entries as the encoder has stages
  • n_conv_per_stage_decoder: number of convolutions used per stage in the decoder. Also see n_conv_per_stage_encoder
  • num_pool_per_axis: number of times each of the spatial axes is pooled in the network. Needed to know how to pad image sizes during inference (num_pool = 5 means input must be divisible by 2**5=32)
  • pool_op_kernel_sizes: the pooling kernel sizes (and at the same time strides) for each stage of the encoder
  • [median_image_size_in_voxels]: the median size of the images of the training set at the current target spacing. Do not modify this as this is not used. It is just here for your information.
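
The arithmetic behind UNet_base_num_features, unet_max_num_features and num_pool_per_axis can be sketched in a few lines. The helper names below are hypothetical and not part of nnU-Net. The sketch only restates the rules described above: features double per stage up to the cap, and each axis must be divisible by 2 raised to its number of pooling operations.

```python
def features_per_stage(base_num_features, max_num_features, num_stages):
    """Double the feature count at each stage, capped at max_num_features."""
    return [min(base_num_features * 2 ** i, max_num_features)
            for i in range(num_stages)]


def required_divisor(num_pool_per_axis):
    """Input size along each axis must be divisible by 2**num_pool."""
    return [2 ** n for n in num_pool_per_axis]


# Hypothetical 3D network with six stages and the default cap of 320
features = features_per_stage(32, 320, 6)
divisors = required_divisor([5, 5, 4])

print(features)  # [32, 64, 128, 256, 320, 320]
print(divisors)  # [32, 32, 16]
```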

Special local settings:

  • inherits_from: configurations can inherit from each other. This makes it easy to add new configurations that only differ in a few local settings from another. If using this, remember to set a new data_identifier (if needed)!
  • previous_stage: if this configuration is part of a cascade, we need to know what the previous stage (for example the low resolution configuration) was. This needs to be specified here.
  • next_stage: if this configuration is part of a cascade, we need to know what possible subsequent stages are! This is because we need to export predictions in the correct spacing when running the validation. next_stage can either be a string or a list of strings
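
One way to picture inherits_from is as a recursive dictionary merge: the parent is resolved first, then the child's own settings override it. The following is a minimal sketch under that assumption. resolve_configuration is a hypothetical helper and not necessarily how nnU-Net implements inheritance, and the configuration values are made up.

```python
from copy import deepcopy


def resolve_configuration(configurations, name):
    """Return the full settings of `name` with inherited values filled in."""
    config = configurations[name]
    if "inherits_from" not in config:
        return deepcopy(config)
    # Resolve the parent first, then let the child's settings override it
    resolved = resolve_configuration(configurations, config["inherits_from"])
    resolved.update(deepcopy(config))
    resolved.pop("inherits_from")
    return resolved


# Hypothetical, abbreviated configurations
configurations = {
    "3d_fullres": {
        "batch_size": 2,
        "patch_size": [40, 56, 40],
        "data_identifier": "nnUNetPlans_3d_fullres",
    },
    "3d_fullres_bs40": {"inherits_from": "3d_fullres", "batch_size": 40},
}

resolved = resolve_configuration(configurations, "3d_fullres_bs40")
print(resolved)
```

Because the child does not set its own data_identifier, the resolved configuration points at the parent's preprocessed data, which is why a new data_identifier is only needed when the preprocessing changes.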

Examples

Increasing the batch size for large datasets

If your dataset is large, training can benefit from a larger batch_size. To do this, simply create a new configuration in the configurations dict:

"configurations": {
  "3d_fullres_bs40": {
    "inherits_from": "3d_fullres",
    "batch_size": 40
  }
}

No need to change the data_identifier. 3d_fullres_bs40 will just use the preprocessed data from 3d_fullres. No need to rerun nnUNetv2_preprocess because we can use already existing data (if available) from 3d_fullres.
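
If you prefer to script such edits rather than use a text editor, the same change can be made with Python's json module. This is a sketch on an abbreviated, made-up plans dictionary. In practice you would load the real plans file, add the entry and write the result back.

```python
import json

# Abbreviated, hypothetical plans content for illustration only
plans = {
    "configurations": {
        "3d_fullres": {
            "batch_size": 2,
            "data_identifier": "nnUNetPlans_3d_fullres",
        }
    }
}

# Add the new configuration; it only lists what differs from its parent
plans["configurations"]["3d_fullres_bs40"] = {
    "inherits_from": "3d_fullres",
    "batch_size": 40,
}

# Serialize back to JSON text (write this to the plans file in practice)
updated_text = json.dumps(plans, indent=2)
print(updated_text)
```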

Using custom preprocessors

If you would like to use a different preprocessor class then this can be specified as follows:

"configurations": {
  "3d_fullres_my_preprocessor": {
    "inherits_from": "3d_fullres",
    "preprocessor_name": "MY_PREPROCESSOR",
    "data_identifier": "3d_fullres_my_preprocessor"
  }
}

You need to run preprocessing for this new configuration: nnUNetv2_preprocess -d DATASET_ID -c 3d_fullres_my_preprocessor because it changes the preprocessing. Remember to set a unique data_identifier whenever you make modifications to the preprocessed data!

Change target spacing

"configurations": {
  "3d_fullres_my_spacing": {
    "inherits_from": "3d_fullres",
    "spacing": [X, Y, Z],
    "data_identifier": "3d_fullres_my_spacing"
  }
}

You need to run preprocessing for this new configuration: nnUNetv2_preprocess -d DATASET_ID -c 3d_fullres_my_spacing because it changes the preprocessing. Remember to set a unique data_identifier whenever you make modifications to the preprocessed data!
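
When picking a new target spacing it helps to estimate how large the images will be afterwards, since this guides the choice of patch size. The sketch below uses the usual resampling arithmetic (size scales with current spacing divided by new spacing). shape_after_resampling is a hypothetical helper, and the numbers are borrowed from the Hippocampus example in this guide.

```python
import numpy as np


def shape_after_resampling(shape, current_spacing, new_spacing):
    """Estimate the image size in voxels after resampling to new_spacing."""
    shape = np.asarray(shape, dtype=float)
    factor = (np.asarray(current_spacing, dtype=float)
              / np.asarray(new_spacing, dtype=float))
    return [int(s) for s in np.round(shape * factor)]


# Median image size at spacing 1.0, resampled to spacing 2.0
new_shape = shape_after_resampling([36, 50, 35], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0])
print(new_shape)  # [18, 25, 18]
```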

Adding a cascade to a dataset where it does not exist

Hippocampus is small. It doesn't have a cascade. It also doesn't really make sense to add a cascade here but hey for the sake of demonstration we can do that. We change the following things here:

  • spacing: The lowres stage should operate at a lower resolution
  • we modify the median_image_size_in_voxels entry as a guide for what original image sizes we deal with
  • we set some patch size that is inspired by median_image_size_in_voxels
  • we need to remember that the patch size must be divisible by 2**num_pool in each axis!
  • network parameters such as kernel sizes, pooling operations are changed accordingly
  • we need to specify the name of the next stage
  • we need to add the highres stage

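This is what this would look like (comparisons with 3d_fullres are given as reference; the # comments are for explanation only and must be removed from a real plans file, because JSON does not support comments):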

"configurations": {
  "3d_lowres": {
    "inherits_from": "3d_fullres",
    "data_identifier": "3d_lowres",
    "spacing": [2.0, 2.0, 2.0], # from [1.0, 1.0, 1.0] in 3d_fullres
    "median_image_size_in_voxels": [18, 25, 18], # from [36, 50, 35]
    "patch_size": [20, 28, 20], # from [40, 56, 40]
    "n_conv_per_stage_encoder": [2, 2, 2], # one less entry than 3d_fullres ([2, 2, 2, 2])
    "n_conv_per_stage_decoder": [2, 2], # one less entry than 3d_fullres
    "num_pool_per_axis": [2, 2, 2], # one less pooling than 3d_fullres in each dimension (3d_fullres: [3, 3, 3])
    "pool_op_kernel_sizes": [[1, 1, 1], [2, 2, 2], [2, 2, 2]], # one less [2, 2, 2]
    "conv_kernel_sizes": [[3, 3, 3], [3, 3, 3], [3, 3, 3]], # one less [3, 3, 3]
    "next_stage": "3d_cascade_fullres" # name of the next stage in the cascade
  },
  "3d_cascade_fullres": { # does not need a data_identifier because we can use the data of 3d_fullres
    "inherits_from": "3d_fullres",
    "previous_stage": "3d_lowres" # name of the previous stage
  }
}
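
Hand-edited topology settings are easy to get out of sync, so a quick consistency check can save a failed training run. The sketch below is not part of nnU-Net and check_configuration is a hypothetical helper. It assumes that the decoder list has one entry fewer than the encoder list (as in the example above) and that all pooling uses kernel sizes of 1 or 2, so the kernel sizes along an axis multiply to 2**num_pool.

```python
from math import prod


def check_configuration(config):
    """Return a list of problems found in a fully specified configuration."""
    problems = []
    num_stages = len(config["n_conv_per_stage_encoder"])
    for key in ("conv_kernel_sizes", "pool_op_kernel_sizes"):
        if len(config[key]) != num_stages:
            problems.append(f"{key} should have {num_stages} entries")
    if len(config["n_conv_per_stage_decoder"]) != num_stages - 1:
        problems.append("n_conv_per_stage_decoder should have one entry fewer than the encoder")
    for axis, num_pool in enumerate(config["num_pool_per_axis"]):
        divisor = 2 ** num_pool
        # Assumption: pooling kernels are 1 or 2, so they multiply to 2**num_pool
        total_stride = prod(k[axis] for k in config["pool_op_kernel_sizes"])
        if total_stride != divisor:
            problems.append(f"axis {axis}: pooling kernels multiply to {total_stride}, expected {divisor}")
        if config["patch_size"][axis] % divisor != 0:
            problems.append(f"axis {axis}: patch size {config['patch_size'][axis]} is not divisible by {divisor}")
    return problems


# The 3d_lowres settings from the example above
lowres = {
    "patch_size": [20, 28, 20],
    "n_conv_per_stage_encoder": [2, 2, 2],
    "n_conv_per_stage_decoder": [2, 2],
    "num_pool_per_axis": [2, 2, 2],
    "pool_op_kernel_sizes": [[1, 1, 1], [2, 2, 2], [2, 2, 2]],
    "conv_kernel_sizes": [[3, 3, 3], [3, 3, 3], [3, 3, 3]],
}
problems = check_configuration(lowres)

# A deliberately broken variant: 30 is not divisible by 2**2
broken_problems = check_configuration(dict(lowres, patch_size=[20, 30, 20]))

print(problems)
print(broken_problems)
```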

To better understand the components describing the network topology in our plans files, please read section 6.2 in the supplementary information (page 13) of our paper!