The evaluation procedure is executed in `ValLoop` and `TestLoop`, so users can evaluate model performance during training or with the test script by adding simple settings to the configuration file. `ValLoop` and `TestLoop` are properties of `Runner`; they are built the first time they are accessed. To build `ValLoop` successfully, `val_dataloader` and `val_evaluator` must be set when building `Runner`, since `dataloader` and `evaluator` are required parameters, and the same goes for `TestLoop`. For more information about the Runner's design, please refer to the documentation of MMEngine.
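As a quick illustration of this lazy building, the following sketch loads an MMSegmentation config with MMEngine's `Runner`; the config path is only an example, and accessing `runner.val_loop` triggers the construction of the `ValLoop`, which fails if `val_dataloader`, `val_evaluator` or `val_cfg` is missing:

```python
# A minimal sketch of the lazy loop building described above.
# The config path below is illustrative; use any MMSegmentation config
# that defines val_dataloader, val_evaluator and val_cfg.
from mmengine.config import Config
from mmengine.runner import Runner

cfg = Config.fromfile('configs/pspnet/pspnet_r50-d8_4xb4-80k_ade20k-512x512.py')
cfg.work_dir = './work_dirs/demo'

runner = Runner.from_cfg(cfg)
val_loop = runner.val_loop  # the ValLoop is built here, on first access
```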
In MMSegmentation, we write the dataloader and metric settings in the dataset config files and the configuration of the evaluation loop in the `schedule_x` config files by default.

For example, in the ADE20K config file `configs/_base_/dataset/ade20k.py`, `val_dataloader` is configured on lines 37 to 48, and on line 51 we select `IoUMetric` as the evaluator and set `mIoU` as the metric:
```python
val_dataloader = dict(
    batch_size=1,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(
            img_path='images/validation',
            seg_map_path='annotations/validation'),
        pipeline=test_pipeline))

val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU'])
```
To evaluate the model during training, we add, for example, the evaluation configuration to the file `configs/schedules/schedule_40k.py` on lines 15 to 16:
```python
train_cfg = dict(type='IterBasedTrainLoop', max_iters=40000, val_interval=4000)
val_cfg = dict(type='ValLoop')
```
With the above two settings, MMSegmentation evaluates the mIoU metric of the model once every 4000 iterations during the training of 40K iterations.
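If a different evaluation frequency is needed, a user config that inherits the schedule can override only `val_interval`. The snippet below is a hypothetical example; the `_base_` path depends on where your config file lives:

```python
# Hypothetical user config: keep the 40k schedule but validate every 2000
# iterations instead of 4000. Adjust the _base_ path to your config location.
_base_ = ['../schedules/schedule_40k.py']

train_cfg = dict(type='IterBasedTrainLoop', max_iters=40000, val_interval=2000)
```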
If we would like to test the model after training, we need to add the `test_dataloader`, `test_evaluator` and `test_cfg` configs to the config file.
```python
test_dataloader = dict(
    batch_size=1,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(
            img_path='images/validation',
            seg_map_path='annotations/validation'),
        pipeline=test_pipeline))

test_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU'])
test_cfg = dict(type='TestLoop')
```
In MMSegmentation, the settings of `test_dataloader` and `test_evaluator` are the same as the `ValLoop`'s dataloader and evaluator by default; we can modify these settings to meet our needs.
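For instance, if the test annotations live in a separate split, only the differing fields need to be overridden, since the config system merges nested dicts. The directory names below are illustrative:

```python
# Hypothetical override of the inherited test_dataloader: only data_prefix
# changes; everything else (batch size, sampler, pipeline) is merged from
# the base config. Directory names are examples.
test_dataloader = dict(
    dataset=dict(
        data_prefix=dict(
            img_path='images/test',
            seg_map_path='annotations/test')))
```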
MMSegmentation implements `IoUMetric` and `CityscapesMetric` for evaluating the performance of models, based on the `BaseMetric` provided by MMEngine. Please refer to the MMEngine documentation for more details about the unified evaluation interface.
Here we briefly describe the arguments and the two main methods of `IoUMetric`.

The constructor of `IoUMetric` has some additional parameters besides the base `collect_device` and `prefix`.

The arguments of the constructor:
- ignore_index (int) - Index that will be ignored in evaluation. Default: 255.
- iou_metrics (list[str] | str) - Metrics to be calculated; the options include 'mIoU', 'mDice' and 'mFscore'.
- nan_to_num (int, optional) - If specified, NaN values will be replaced by the numbers defined by the user. Default: None.
- beta (int) - Determines the weight of recall in the combined score. Default: 1.
- collect_device (str) - Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Defaults to 'cpu'.
- prefix (str, optional) - The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If the prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
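As a concrete (but purely illustrative) example, a user config could exercise several of these arguments at once:

```python
# Hypothetical evaluator config using the arguments listed above: compute
# mIoU and mFscore, ignore label 255, report NaN values as -1, and prefix
# the metric names with 'ade20k' (e.g. 'ade20k/mIoU').
val_evaluator = dict(
    type='IoUMetric',
    iou_metrics=['mIoU', 'mFscore'],
    ignore_index=255,
    nan_to_num=-1,
    beta=1,
    prefix='ade20k')
```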
`IoUMetric` implements the IoU metric calculation; its two core methods are `process` and `compute_metrics`.

The `process` method processes one batch of data and data_samples.
Parameters:
- data_batch (Any) - A batch of data from the dataloader.
- data_samples (Sequence[dict]) - A batch of outputs from the model.
Returns:
This method does not return anything; the processed results are stored in `self.results`, which will be used to compute the metrics when all batches have been processed.
The `compute_metrics` method computes the metrics from the processed results.

Parameters:
- results (list) - The processed results of each batch.

Returns:
- Dict[str, float] - The computed metrics. The keys are the names of the metrics and the values are the corresponding results. The keys mainly include aAcc, mIoU, mAcc, mDice, mFscore, mPrecision, mRecall.
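To make the `process` / `compute_metrics` contract concrete, here is a minimal sketch of a toy custom metric that is not part of MMSegmentation; the field access in `process` mirrors how `IoUMetric` reads predictions and ground truth from each data sample:

```python
# A toy metric sketch: overall pixel accuracy, following the same
# process / compute_metrics contract as IoUMetric. Not part of MMSegmentation.
from mmengine.evaluator import BaseMetric
from mmseg.registry import METRICS


@METRICS.register_module()
class PixelAccMetric(BaseMetric):

    def __init__(self, ignore_index=255, collect_device='cpu', prefix=None):
        super().__init__(collect_device=collect_device, prefix=prefix)
        self.ignore_index = ignore_index

    def process(self, data_batch, data_samples):
        # Store per-image (correct, valid) pixel counts in self.results;
        # they are gathered across ranks and reduced in compute_metrics.
        for data_sample in data_samples:
            pred = data_sample['pred_sem_seg']['data'].squeeze()
            label = data_sample['gt_sem_seg']['data'].squeeze().to(pred)
            mask = label != self.ignore_index
            correct = (pred[mask] == label[mask]).sum().item()
            self.results.append((correct, int(mask.sum())))

    def compute_metrics(self, results):
        correct = sum(r[0] for r in results)
        valid = sum(r[1] for r in results)
        return dict(aAcc=correct / max(valid, 1))
```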
`CityscapesMetric` uses the official CityscapesScripts provided by Cityscapes to evaluate model performance.

Before using it, please install the `cityscapesscripts` package first:
```shell
pip install cityscapesscripts
```
Since `IoUMetric` is used as the default evaluator in MMSegmentation, using `CityscapesMetric` requires customizing the config file. In your customized config file, you should overwrite the default evaluator as follows:
```python
val_evaluator = dict(type='CityscapesMetric', output_dir='tmp')
test_evaluator = val_evaluator
```
The arguments of the constructor:
- output_dir (str) - The directory for output predictions.
- ignore_index (int) - Index that will be ignored in evaluation. Default: 255.
- format_only (bool) - Only format the results for submission without performing evaluation. It is useful when you want to format the results to a specific format and submit them to the test server. Defaults to False.
- keep_results (bool) - Whether to keep the results. When format_only is True, keep_results must be True. Defaults to False.
- collect_device (str) - Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Defaults to 'cpu'.
- prefix (str, optional) - The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
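For example, to generate files for submission to the Cityscapes test server rather than evaluating locally, a test-time evaluator could be configured as follows; the output directory name is illustrative:

```python
# Hypothetical test evaluator for test-server submission: predictions are
# formatted and kept on disk, and no local evaluation is performed.
# keep_results must be True whenever format_only is True.
test_evaluator = dict(
    type='CityscapesMetric',
    output_dir='work_dirs/cityscapes_submission',
    format_only=True,
    keep_results=True)
```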
The `process` method draws the masks on the images and saves the painted images to `work_dir`.
Parameters:
- data_batch (dict) - A batch of data from the dataloader.
- data_samples (Sequence[dict]) - A batch of outputs from the model.
Returns:
This method does not return anything; the paths of the annotations are stored in `self.results`, which will be used to compute the metrics when all batches have been processed.
The `compute_metrics` method calls the `cityscapesscripts.evaluation.evalPixelLevelSemanticLabeling` tool to calculate the metrics.
Parameters:
- results (list) - Testing results of the dataset.
Returns:
- dict[str: float] - Cityscapes evaluation results.