Both Style and Distortion Matter: Dual-Path Unsupervised Domain Adaptation for Panoramic Semantic Segmentation CVPR 2023, [PDF]
In this paper, we study a new problem by dividing the domain gaps between panoramic and pinhole images into two types: the inherent gap and the format gap. Accordingly, we propose DPPASS, the first dual-projection UDA framework, which takes ERP and tangent images as the inputs to its two paths in order to reduce these gaps.
Scene understanding has sparked active research on semantic segmentation of panoramic images. However, performance is hampered by the distortion of the equirectangular projection (ERP) and by a lack of pixel-wise annotations. For this reason, some works treat ERP and pinhole images equally and transfer knowledge from pinhole to ERP images via unsupervised domain adaptation (UDA). However, they fail to handle the domain gaps caused by: 1) the inherent differences between camera sensors and captured scenes; 2) the distinct image formats (e.g., ERP and pinhole images). In this paper, we propose a novel yet flexible dual-path UDA framework, DPPASS, which takes ERP and tangent projection (TP) images as inputs. To reduce the domain gaps, we propose cross-projection and intra-projection training. The cross-projection training consists of tangent-wise feature contrastive training and prediction consistency training. The former treats features from the same projection location as positive examples and features from different locations as negative examples, which makes the models aware of distortion; the latter enforces consistency between the predictions of the ERP and TP models. Moreover, adversarial intra-projection training is proposed to reduce the inherent gap between the features of the pinhole images and those of the ERP and TP images, respectively. Importantly, the TP path can be freely removed after training, so it adds no inference cost. Extensive experiments on the DensePASS and WildPASS datasets show that DPPASS achieves a +1.06% mIoU improvement over state-of-the-art approaches.
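The tangent-wise feature contrastive training described above can be illustrated with a generic InfoNCE-style loss. This is a minimal sketch, not the loss used in the paper: the function name `tangent_contrastive_loss`, the temperature value, and the use of one feature vector per tangent location are all assumptions made for illustration. Features at the same index are taken to come from the same projection location (positives); every other pairing is a negative.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def tangent_contrastive_loss(erp_feats, tp_feats, temperature=0.1):
    """Hypothetical InfoNCE-style loss over per-location features.

    erp_feats[i] and tp_feats[i] are assumed to describe the same tangent
    location, so they form the positive pair; tp_feats[j] with j != i are
    the negatives for anchor i.
    """
    erp = [normalize(f) for f in erp_feats]
    tp = [normalize(f) for f in tp_feats]
    total = 0.0
    for i, anchor in enumerate(erp):
        # Cosine similarities between this ERP feature and every TP feature.
        logits = [dot(anchor, t) / temperature for t in tp]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching location as the target class.
        total += log_denom - logits[i]
    return total / len(erp)


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0]]
    print("aligned :", tangent_contrastive_loss(feats, feats))
    print("swapped :", tangent_contrastive_loss(feats, feats[::-1]))
```

The loss is small when features from matching locations agree and large when they are mismatched, which is the behaviour the contrastive objective is meant to encourage.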
[3/14/23] Created this repository.
Environments:
conda env create -f DPPASS.yml
The Cityscapes dataset is available at Cityscapes
SynPASS dataset contains 9080 panoramic images (1024x2048) and 22 categories.
The scenes include cloudy, foggy, rainy, sunny, and day-/night-time conditions.
The SynPASS dataset is available at Trans4PASS
The DensePASS dataset is available at Trans4PASS
Data Path:
datasets/
|--- cityscapes
|   |___ gtfine
|   |___ leftImg8bit
|--- SynPASS
|   |--- img
|   |   |___ cloud
|   |   |___ fog
|   |   |___ rain
|   |   |___ sun
|   |--- semantic
|   |   |___ cloud
|   |   |___ fog
|   |   |___ rain
|   |   |___ sun
|--- DensePASS
|   |___ gtfine
|   |___ leftImg8bit
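Before training, it can help to confirm that the folders match the layout above. The following is a small sketch, not part of the repository: the helper name `missing_dirs` and the `EXPECTED_DIRS` list are hypothetical, and the folder names are copied exactly as they are spelled in the tree.

```python
from pathlib import Path

# Sub-folders copied from the data path tree (names are case-sensitive on Linux).
WEATHER = ("cloud", "fog", "rain", "sun")
EXPECTED_DIRS = [
    "cityscapes/gtfine",
    "cityscapes/leftImg8bit",
    *(f"SynPASS/img/{w}" for w in WEATHER),
    *(f"SynPASS/semantic/{w}" for w in WEATHER),
    "DensePASS/gtfine",
    "DensePASS/leftImg8bit",
]


def missing_dirs(root):
    """Return the expected sub-folders that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs("datasets")
    if missing:
        print("Missing folders:")
        for d in missing:
            print("  datasets/" + d)
    else:
        print("Dataset layout looks complete.")
```

Run it from the directory that contains `datasets/`; an empty result means every folder in the tree was found.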
We appreciate the previous open-source works.
Bibtex:
@inproceedings{zheng2023both,
title={Both Style and Distortion Matter: Dual-Path Unsupervised Domain Adaptation for Panoramic Semantic Segmentation},
author={Zheng, Xu and Zhu, Jinjing and Liu, Yexin and Cao, Zidong and Fu, Chong and Wang, Lin},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={1285--1295},
year={2023}
}