EncNet

Context Encoding for Semantic Segmentation

Introduction

Abstract

Recent work has made significant progress in improving spatial resolution for pixelwise labeling with Fully Convolutional Network (FCN) framework by employing Dilated/Atrous convolution, utilizing multi-scale features and refining boundaries. In this paper, we explore the impact of global contextual information in semantic segmentation by introducing the Context Encoding Module, which captures the semantic context of scenes and selectively highlights class-dependent featuremaps. The proposed Context Encoding Module significantly improves semantic segmentation results with only marginal extra computation cost over FCN. Our approach has achieved new state-of-the-art results 51.7% mIoU on PASCAL-Context, 85.9% mIoU on PASCAL VOC 2012. Our single model achieves a final score of 0.5567 on ADE20K test set, which surpass the winning entry of COCO-Place Challenge in 2017. In addition, we also explore how the Context Encoding Module can improve the feature representation of relatively shallow networks for the image classification on CIFAR-10 dataset. Our 14 layer network has achieved an error rate of 3.45%, which is comparable with state-of-the-art approaches with over 10 times more layers. The source code for the complete system are publicly available.

Results and models

Cityscapes

Method	Backbone	Crop Size	Lr schd	Mem (GB)	Inf time (fps)	Device	mIoU	mIoU(ms+flip)	config	download
EncNet	R-50-D8	512x1024	40000	8.6	4.58	V100	75.67	77.08	config	model \| log
EncNet	R-101-D8	512x1024	40000	12.1	2.66	V100	75.81	77.21	config	model \| log
EncNet	R-50-D8	769x769	40000	9.8	1.82	V100	76.24	77.85	config	model \| log
EncNet	R-101-D8	769x769	40000	13.7	1.26	V100	74.25	76.25	config	model \| log
EncNet	R-50-D8	512x1024	80000	-	-	V100	77.94	79.13	config	model \| log
EncNet	R-101-D8	512x1024	80000	-	-	V100	78.55	79.47	config	model \| log
EncNet	R-50-D8	769x769	80000	-	-	V100	77.44	78.72	config	model \| log
EncNet	R-101-D8	769x769	80000	-	-	V100	76.10	76.97	config	model \| log

ADE20K

Method	Backbone	Crop Size	Lr schd	Mem (GB)	Inf time (fps)	Device	mIoU	mIoU(ms+flip)	config	download
EncNet	R-50-D8	512x512	80000	10.1	22.81	V100	39.53	41.17	config	model \| log
EncNet	R-101-D8	512x512	80000	13.6	14.87	V100	42.11	43.61	config	model \| log
EncNet	R-50-D8	512x512	160000	-	-	V100	40.10	41.71	config	model \| log
EncNet	R-101-D8	512x512	160000	-	-	V100	42.61	44.01	config	model \| log

Citation

@InProceedings{Zhang_2018_CVPR,
author = {Zhang, Hang and Dana, Kristin and Shi, Jianping and Zhang, Zhongyue and Wang, Xiaogang and Tyagi, Ambrish and Agrawal, Amit},
title = {Context Encoding for Semantic Segmentation},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

EncNet

Introduction

Abstract

Results and models

Cityscapes

ADE20K

Citation

Files

README.md

Latest commit

History

README.md

File metadata and controls

EncNet

Introduction

Abstract

Results and models

Cityscapes

ADE20K

Citation