In this repository, we present the datasets and evaluation toolkits for ViP-DeepLab. ViP-DeepLab is a unified model attempting to tackle the long-standing and challenging inverse projection problem in vision, which we model as restoring point clouds from perspective image sequences while providing each point with instance-level semantic interpretations. Solving this problem requires vision models to predict the spatial location, semantic class, and temporally consistent instance label for each 3D point. ViP-DeepLab approaches it by jointly performing monocular depth estimation and video panoptic segmentation. We name this joint task Depth-aware Video Panoptic Segmentation (DVPS), and propose a new evaluation metric along with two derived datasets for it. This repository includes the datasets SemKITTI-DVPS and Cityscapes-DVPS along with their evaluation toolkits.
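The evaluation metric, Depth-aware Video Panoptic Quality (DVPQ), couples depth accuracy with panoptic quality. As an illustration of the depth-aware part only, below is a minimal sketch of one step: predicted pixels whose absolute relative depth error exceeds a threshold lam are reassigned to void before panoptic quality is computed. The names VOID_LABEL and the shown function are hypothetical and not part of the released toolkits.

import numpy as np

def mask_panoptic_by_depth(pred_panoptic, pred_depth, gt_depth, lam):
    """Set predictions with absolute relative depth error > lam to void.

    pred_panoptic: (H, W) predicted panoptic labels.
    pred_depth, gt_depth: (H, W) depth maps; gt_depth == 0 marks pixels
    without a ground-truth depth measurement.
    VOID_LABEL is a hypothetical placeholder for the toolkit's void id.
    """
    VOID_LABEL = 0
    valid = gt_depth > 0
    abs_rel = np.zeros_like(gt_depth, dtype=np.float32)
    abs_rel[valid] = np.abs(pred_depth[valid] - gt_depth[valid]) / gt_depth[valid]
    out = pred_panoptic.copy()
    out[valid & (abs_rel > lam)] = VOID_LABEL
    return out

In the paper, panoptic quality is then computed over windows of several consecutive frames concatenated together, so temporally inconsistent instance labels are penalized as mismatches.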
SemKITTI-DVPS is derived from the SemanticKITTI dataset, which is built on the odometry dataset of the KITTI Vision Benchmark and provides perspective images together with panoptic-labeled 3D point clouds. To convert it for DVPS, we project the 3D point clouds onto the image plane and name the derived dataset SemKITTI-DVPS.
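Below is a minimal sketch of such a projection, assuming the KITTI odometry calibration format: the 3x4 velodyne-to-camera matrix Tr extended to 4x4 with a [0, 0, 0, 1] row, and the 3x4 camera matrix P2. The actual dataset generation pipeline may differ, e.g., in how occluded or out-of-view points are filtered.

import numpy as np

def project_velodyne_to_image(points, Tr_velo_to_cam, P2, height, width):
    """Pinhole projection of LiDAR points (N, 3) onto the image plane.

    Tr_velo_to_cam: (4, 4) extrinsics, extended from the 3x4 KITTI entry.
    P2: (3, 4) camera projection matrix from the KITTI calibration file.
    Returns pixel coordinates (u, v) and depths z for points that land
    inside the image.
    """
    n = points.shape[0]
    pts = np.hstack([points, np.ones((n, 1))])   # homogeneous (N, 4)
    cam = (Tr_velo_to_cam @ pts.T).T             # camera coordinates (N, 4)
    cam = cam[cam[:, 2] > 0]                     # drop points behind the camera
    uvw = (P2 @ cam.T).T                         # projective coordinates (M, 3)
    uv = uvw[:, :2] / uvw[:, 2:3]                # perspective divide
    z = uvw[:, 2]
    inside = ((uv[:, 0] >= 0) & (uv[:, 0] < width) &
              (uv[:, 1] >= 0) & (uv[:, 1] < height))
    return uv[inside], z[inside]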
SemKITTI-DVPS is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike license. The dataset and the evaluation toolkit are in the folder semkitti-dvps.
Cityscapes-DVPS is derived from Cityscapes-VPS by adding re-computed depth maps from the Cityscapes dataset.
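For context, the sketch below shows the standard way to decode the disparity PNGs shipped with Cityscapes and convert them to metric depth, using the stereo rig parameters from the official camera files. This is purely an illustration; the re-computed depth maps in Cityscapes-DVPS may come from a different pipeline.

import numpy as np
from PIL import Image

# Standard Cityscapes stereo parameters (baseline in meters, focal in pixels).
BASELINE_M = 0.209313
FOCAL_PX = 2262.52

def disparity_png_to_depth(path):
    """Decode a Cityscapes disparity PNG and convert it to metric depth."""
    raw = np.asarray(Image.open(path), dtype=np.float32)
    # Official encoding: pixel value p > 0 means disparity (p - 1) / 256.
    disparity = np.where(raw > 0, (raw - 1.0) / 256.0, 0.0)
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = BASELINE_M * FOCAL_PX / disparity[valid]
    return depth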
Cityscapes-DVPS is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike license. The dataset and the evaluation toolkit are in the folder cityscapes-dvps.
If you use the datasets in your research, please cite our paper:
@article{vip_deeplab,
  title={ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation},
  author={Siyuan Qiao and Yukun Zhu and Hartwig Adam and Alan Yuille and Liang-Chieh Chen},
  journal={arXiv preprint arXiv:2012.05258},
  year={2020}
}