This is a U-Net model designed to perform semantic segmentation. The model was trained from scratch on the CamVid dataset using the PyTorch* framework. Training used median frequency balancing for class weighting. For details about the original floating-point model, see the paper U-Net: Convolutional Networks for Biomedical Image Segmentation.
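Median frequency balancing gives rare classes larger loss weights. The sketch below uses the commonly cited formulation (weight(c) = median_freq / freq(c), where freq(c) is the pixel count of class c divided by the total pixel count of the images in which c appears). It is an illustration under assumptions, not the training code: the function name `median_frequency_weights` and the `ignore_index` option (for example, to exclude Unlabeled) are hypothetical, and the exact variant used to train this model is not stated.

```python
import numpy as np

def median_frequency_weights(label_maps, num_classes, ignore_index=None):
    """Hypothetical helper: class weights via median frequency balancing.

    label_maps: iterable of 2-D integer arrays of class indices, one per image.
    """
    class_pixels = np.zeros(num_classes, dtype=np.float64)  # pixels of class c
    image_pixels = np.zeros(num_classes, dtype=np.float64)  # pixels of images containing c
    for labels in label_maps:
        counts = np.bincount(labels.ravel(), minlength=num_classes)[:num_classes]
        class_pixels += counts
        image_pixels[counts > 0] += labels.size
    valid = image_pixels > 0  # classes that appear at least once
    if ignore_index is not None:
        valid[ignore_index] = False  # assumed option: leave e.g. Unlabeled unweighted
    freq = np.zeros(num_classes)
    freq[valid] = class_pixels[valid] / image_pixels[valid]
    weights = np.zeros(num_classes)
    weights[valid] = np.median(freq[valid]) / freq[valid]
    return weights
```

For CamVid the label maps would come from the training split with `num_classes=12`; the resulting vector would typically be passed as the per-class weight of a cross-entropy loss.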
The model input is a blob that consists of a single image of shape 1, 3, 368, 480 in BGR order. The pixel values are integers in the [0, 255] range.
The output of unet-camvid-onnx-0001 is a map of per-pixel probabilities of each input pixel belonging to one of the 12 classes of the CamVid dataset:
- Sky
- Building
- Pole
- Road
- Pavement
- Tree
- SignSymbol
- Fence
- Vehicle
- Pedestrian
- Bike
- Unlabeled
Metric | Value |
---|---|
GFlops | 260.1 |
MParams | 31.03 |
Source framework | PyTorch* |
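As a rough worked example, the MParams figure translates into weight storage as parameters × bytes per parameter. This assumes dense storage at the stated precision and counts weights only (no activations, runtime buffers, or file-format overhead); the helper name `weight_bytes` is hypothetical.

```python
def weight_bytes(mparams: float, bytes_per_param: int) -> float:
    """Bytes needed to store `mparams` million parameters densely."""
    return mparams * 1e6 * bytes_per_param

# 31.03 MParams from the table, at two common floating-point precisions.
for label, width in [("FP32", 4), ("FP16", 2)]:
    print(f"{label}: {weight_bytes(31.03, width) / 1e6:.2f} MB")
```

This prints about 124.12 MB for FP32 and 62.06 MB for FP16.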
The quality metrics were calculated on the CamVid validation dataset. The Unlabeled class was ignored during metric calculation.
Metric | Value |
---|---|
mIoU | 71.95% |
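mIoU is the IoU averaged over the evaluated classes, where for each class

$$
\mathrm{IoU} = \frac{TP}{TP + FN + FP}
$$

and:

- TP - number of true positive pixels for a given class
- FN - number of false negative pixels for a given class
- FP - number of false positive pixels for a given class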
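The IoU definition above can be computed from a confusion matrix: TP is the diagonal, FN the rest of each ground-truth row, and FP the rest of each predicted column. The sketch below accumulates one matrix over all images and skips pixels whose ground truth is the ignored class. It is a sketch under assumptions: the helper names are hypothetical, Unlabeled is assumed to be index 11 (last in the class list), and the exact evaluation script behind the reported 71.95% is not described here.

```python
import numpy as np

def update_confusion(cm, gt, pred, ignore_index=None):
    """Accumulate one image into cm (rows: ground truth, columns: prediction)."""
    num_classes = cm.shape[0]
    gt = gt.ravel().astype(np.int64)
    pred = pred.ravel().astype(np.int64)
    if ignore_index is not None:
        keep = gt != ignore_index  # drop pixels labeled with the ignored class
        gt, pred = gt[keep], pred[keep]
    cm += np.bincount(gt * num_classes + pred,
                      minlength=num_classes * num_classes).reshape(num_classes, num_classes)
    return cm

def mean_iou(cm, ignore_index=None):
    """Return (mIoU, {class index: IoU}) from an accumulated confusion matrix."""
    tp = np.diag(cm).astype(np.float64)
    fn = cm.sum(axis=1) - tp  # class-c pixels predicted as another class
    fp = cm.sum(axis=0) - tp  # pixels predicted as c that belong to another class
    denom = tp + fn + fp
    classes = [c for c in range(cm.shape[0]) if c != ignore_index and denom[c] > 0]
    iou = tp[classes] / denom[classes]
    return float(iou.mean()), dict(zip(classes, iou.tolist()))
```

For this model the matrix would be `np.zeros((12, 12), dtype=np.int64)` with `ignore_index=11`. Accumulating counts over the whole validation set before dividing is a common choice; it differs from averaging per-image IoU values.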
Image, shape - 1, 3, 368, 480, format is B, C, H, W, where:

- B - batch size
- C - channel
- H - height
- W - width

Channel order is BGR.
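A minimal preprocessing sketch for the input layout described above, assuming the source image is an H x W x 3 uint8 array already in BGR order (as OpenCV's `cv2.imread` returns). The pixel values are kept in [0, 255] with no mean or scale normalization, since none is mentioned; casting to float32 is an assumption about what the inference runtime expects. The nearest-neighbour resize only keeps the example dependency-free; an interpolating resize such as `cv2.resize` is the more usual choice. The function name `to_input_blob` is hypothetical.

```python
import numpy as np

INPUT_H, INPUT_W = 368, 480

def to_input_blob(image_bgr):
    """Convert an H x W x 3 uint8 BGR image into a 1 x 3 x 368 x 480 blob."""
    h, w = image_bgr.shape[:2]
    # Nearest-neighbour resize: pick one source row/column per output row/column.
    rows = np.arange(INPUT_H) * h // INPUT_H
    cols = np.arange(INPUT_W) * w // INPUT_W
    resized = image_bgr[rows[:, None], cols[None, :]]  # (368, 480, 3), still BGR
    chw = resized.transpose(2, 0, 1)                   # HWC -> CHW
    return chw[None].astype(np.float32)                # add batch axis -> B, C, H, W
```

If an image arrives in RGB order (for example from PIL), reversing the last axis with `image[..., ::-1]` before calling the function puts it in BGR order.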
Semantic segmentation class probabilities map, shape - 1, 12, 368, 480, output data format is B, C, H, W, where:

- B - batch size
- C - predicted probability, in the [0, 1] range, of the input pixel belonging to class C
- H - vertical (row) coordinate of the input pixel
- W - horizontal (column) coordinate of the input pixel
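To turn the probability map into a label image, one common approach is to take the argmax over the C axis for every (H, W) position. The sketch below assumes that output channel i corresponds to the i-th class in the list given earlier (Sky = 0, ..., Unlabeled = 11); that ordering, and the names `CAMVID_CLASSES`, `decode_output` and `class_fractions`, are assumptions for illustration.

```python
import numpy as np

# Assumed channel order: the class list above, top to bottom.
CAMVID_CLASSES = ["Sky", "Building", "Pole", "Road", "Pavement", "Tree",
                  "SignSymbol", "Fence", "Vehicle", "Pedestrian", "Bike", "Unlabeled"]

def decode_output(probs):
    """Map a 1 x 12 x H x W probability map to an H x W class-index map."""
    if probs.ndim != 4 or probs.shape[1] != len(CAMVID_CLASSES):
        raise ValueError(f"unexpected output shape {probs.shape}")
    class_map = probs[0].argmax(axis=0)  # most probable class per pixel
    confidence = probs[0].max(axis=0)    # probability of that class, in [0, 1]
    return class_map, confidence

def class_fractions(class_map):
    """Share of pixels assigned to each class name."""
    counts = np.bincount(class_map.ravel(), minlength=len(CAMVID_CLASSES))
    return {name: counts[i] / class_map.size for i, name in enumerate(CAMVID_CLASSES)}
```

Because the output has the same H and W as the input blob, the class map lines up pixel for pixel with the resized image, not with the original frame; mapping back to the original resolution needs a nearest-neighbour resize of `class_map`.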
The model can be used in the following demos provided by the Open Model Zoo to show its capabilities:
[*] Other names and brands may be claimed as the property of others.