This repository contains the official PyTorch implementation of the following CVPR 2023 paper:
Title: CleanerS: Semantic Scene Completion with Cleaner Self
Authors: Fengyun Wang, Dong Zhang, Hanwang Zhang, Jinhui Tang, Qianru Sun
Affiliation: NJUST, HKUST, NTU, SMU
Semantic Scene Completion (SSC) transforms an image of single-view depth and/or RGB 2D pixels into 3D voxels, each of whose semantic labels are predicted. SSC is a well-known ill-posed problem as the prediction model has to "imagine" what is behind the visible surface, which is usually represented by Truncated Signed Distance Function (TSDF). Due to the sensory imperfection of the depth camera, most existing methods based on the noisy TSDF estimated from depth values suffer from 1) incomplete volumetric predictions and 2) confused semantic labels. To this end, we use the ground-truth 3D voxels to generate a perfect visible surface, called TSDF-CAD, and then train a "cleaner" SSC model. As the model is noise-free, it is expected to focus more on the "imagination" of unseen voxels. Then, we propose to distill the intermediate "cleaner" knowledge into another model with noisy TSDF input. In particular, we use the 3D occupancy feature and the semantic relations of the "cleaner self" to supervise the counterparts of the "noisy self" to respectively address the above two incorrect predictions. Experimental results validate that our method improves the noisy counterparts with 3.1% IoU and 2.2% mIoU for measuring scene completion and SSC, and also achieves new state-of-the-art accuracy on the popular NYU dataset.
CleanerS mainly consists of two networks: a teacher network and a student network. The two networks share the same architecture but have different weights. The distillation pipeline includes a feature-based cleaner surface distillation (i.e., KD-T) and two logit-based cleaner semantic distillations (i.e., KD-SC and KD-SA).
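The difference between feature-based and logit-based distillation can be illustrated with a generic teacher-student sketch. The snippet below is plain Python and is not the repository's implementation: `feature_distill_loss` (mean squared error between features) and `logit_distill_loss` (temperature-softened KL divergence) are hypothetical names for common textbook choices, and the toy values are made up. The actual KD-T, KD-SC and KD-SA losses are defined in the paper and the code.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def feature_distill_loss(student_feat, teacher_feat):
    """Feature-based distillation (assumed form): MSE between feature vectors."""
    assert len(student_feat) == len(teacher_feat)
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)

def logit_distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Logit-based distillation (assumed form): KL(teacher || student)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy example: one voxel with a 4-dim feature and 3 class logits.
teacher_feat, student_feat = [1.0, 0.0, 2.0, 1.0], [0.5, 0.0, 2.0, 2.0]
teacher_logits, student_logits = [4.0, 1.0, 0.0], [2.0, 2.0, 0.0]

feat_loss = feature_distill_loss(student_feat, teacher_feat)
logit_loss = logit_distill_loss(student_logits, teacher_logits)
print(feat_loss, logit_loss)
```

In both cases the teacher's outputs act as fixed targets and only the student would be updated; a real training loop would add these terms, each with its own weight, to the usual supervised loss.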
Segformer-B2 | Model Zoo | Visual Results |
---|---|---|
Teacher Model | Google Drive / Baidu Netdisk with code:3gew | Google Drive / Baidu Netdisk with code:p9nl |
Student Model | Google Drive / Baidu Netdisk with code:6eja | Google Drive / Baidu Netdisk with code:lktg |
- PyTorch 1.10.1
- cudatoolkit 11.1
- mmcv 1.5.0
- mmsegmentation 0.27.0
conda create -n CleanerS python=3.7 -y
conda activate CleanerS
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install mmcv-full==1.5.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.10/index.html
pip install mmsegmentation==0.27.0
conda install scikit-learn
pip install pyyaml timm tqdm EasyConfig multimethod easydict termcolor shortuuid imageio
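After installation, a quick check that the pinned versions were picked up can save debugging time later. The helper below is a minimal sketch, not part of the repository: `installed_version` and `check_environment` are hypothetical names, and it relies only on each package exposing a `__version__` attribute, so it also works on the Python 3.7 environment created above.

```python
import importlib

# Import names and versions taken from the requirement list above.
# "mmseg" is the import name of the mmsegmentation package.
REQUIRED = {"torch": "1.10.1", "mmcv": "1.5.0", "mmseg": "0.27.0"}

def installed_version(module_name):
    """Return the module's __version__ string, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)

def check_environment(required=REQUIRED):
    """Map each module name to (found_version, matches_required)."""
    report = {}
    for name, wanted in required.items():
        found = installed_version(name)
        # Some builds report e.g. "1.10.1+cu111"; compare the part before "+".
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (found, matches)
    return report
```

Calling `check_environment()` inside the activated `CleanerS` environment returns one `(found_version, matches)` pair per package; a `None` version means the package could not be imported.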
We follow the 3D-Sketch project for dataset preparation.
After preparation, the `your_SSC_Dataset` folder should look like:
-- your_SSC_Dataset
| NYU
|-- TSDF
|-- Mapping
| |-- trainset
| |-- |-- RGB
| |-- |-- depth
| |-- |-- GT
| |-- testset
| |-- |-- RGB
| |-- |-- depth
| |-- |-- GT
| NYUCAD
|-- TSDF
| |-- trainset
| |-- |-- depth
| |-- testset
| |-- |-- depth
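A small script can confirm that the folders exist before training starts. This is a sketch under assumptions rather than a repository utility: `missing_dirs` is a hypothetical helper, and `EXPECTED_DIRS` lists only the entries whose position in the tree above is unambiguous. Append the `trainset`/`testset` sub-folders (`RGB`, `depth`, `GT`) according to how they are nested in your local copy.

```python
import tempfile
from pathlib import Path

# Only unambiguous entries from the tree above; extend this list as needed.
EXPECTED_DIRS = ["NYU/TSDF", "NYU/Mapping", "NYUCAD/TSDF"]

def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected sub-folders that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]

# Demo on a throwaway directory that contains only NYU/TSDF.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "NYU" / "TSDF").mkdir(parents=True)
    demo_missing = missing_dirs(tmp)

print(demo_missing)
```

For a real check, call `missing_dirs("your_SSC_Dataset")` with the path to your dataset root; an empty list means every listed folder was found.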
- on Segformer-B2
  1. Download the pretrained Segformer-B2, `mit_b2.pth`;
  2. (optional) Download the teacher model and put it into `./teacher/Teacher_ckpt.pth`;
  3. Run `run.sh` to train CleanerS (if you skip step 2, both the teacher and student models will be trained).
- on ResNet50
  1. Download the pretrained ResNet50.
- Download our weights and put them in the `./checkpoint` folder.
- Run `python test_NYU.py --pretrained_path ./checkpoint/CleanerS_ckpt.pth`. The visualized results will be saved in the `./visual_pred/CleanerS` folder.
- (optional) Run `python test_NYU.py --pretrained_path ./checkpoint/Teacher_ckpt.pth` to get the results of the teacher model.
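The abstract reports results as IoU for scene completion and mIoU for SSC. The snippet below is a minimal sketch of how these two metrics are commonly computed on flattened voxel labels. It is not the repository's evaluation code: the function names are hypothetical, label `0` is assumed to mean "empty", and benchmark-specific details such as evaluation masks are ignored.

```python
def completion_iou(pred, gt, empty_label=0):
    """Binary occupancy IoU: a voxel is occupied if its label is not empty."""
    pairs = list(zip(pred, gt))
    inter = sum(1 for p, g in pairs if p != empty_label and g != empty_label)
    union = sum(1 for p, g in pairs if p != empty_label or g != empty_label)
    return inter / union if union else float("nan")

def semantic_miou(pred, gt, num_classes, empty_label=0):
    """Mean per-class IoU over non-empty classes present in pred or gt."""
    pairs = list(zip(pred, gt))
    ious = []
    for c in range(num_classes):
        if c == empty_label:
            continue
        inter = sum(1 for p, g in pairs if p == c and g == c)
        union = sum(1 for p, g in pairs if p == c or g == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")

# Toy scene with six voxels and two semantic classes (1 and 2).
gt = [0, 1, 1, 2, 2, 0]
pred = [0, 1, 2, 2, 2, 1]

sc_iou = completion_iou(pred, gt)                  # 4 / 5 = 0.8
ssc_miou = semantic_miou(pred, gt, num_classes=3)  # (1/3 + 2/3) / 2 = 0.5
print(sc_iou, ssc_miou)
```

The completion score only asks whether a voxel is occupied, while the semantic score also requires the class to match, which is why the two numbers differ on the same prediction.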
If this work is helpful for your research, please consider citing:
@inproceedings{wang2023semantic,
title={Semantic scene completion with cleaner self},
author={Wang, Fengyun and Zhang, Dong and Zhang, Hanwang and Tang, Jinhui and Sun, Qianru},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={867--877},
year={2023}
}
- switchable 2DNet for both Segformer-B2 and ResNet50
This code is based on 3D-Sketch.