CLIP2Scene leverages CLIP knowledge to pre-train a 3D point cloud segmentation network via semantic and spatial-temporal consistency regularization. It achieves impressive annotation-free 3D semantic segmentation and significantly outperforms other self-supervised methods when fine-tuned on annotated data.
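The semantic consistency idea can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: the function name `semantic_consistency_loss`, the shapes, and the pseudo-label source are assumptions. The gist is that features of 3D points are pulled toward the CLIP text embedding of the class assigned to their paired image pixel.

```python
import numpy as np

def semantic_consistency_loss(point_feats, pixel_labels, text_embeds):
    """Cosine-distance sketch of semantic consistency regularization.

    point_feats:  (n, d) features from the 3D network
    pixel_labels: (n,)   per-point pseudo-labels from CLIP's image branch
    text_embeds:  (c, d) CLIP text embeddings, one per class
    """
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    # Cosine similarity between each point feature and the text embedding
    # of the class its paired pixel was assigned to.
    sim = (p * t[pixel_labels]).sum(axis=1)
    return float((1.0 - sim).mean())  # 0 when every point matches its class

rng = np.random.default_rng(0)
loss = semantic_consistency_loss(
    rng.standard_normal((100, 512)),  # toy point features
    rng.integers(0, 3, size=100),     # toy pseudo-labels over 3 classes
    rng.standard_normal((3, 512)),    # toy class text embeddings
)
```

The actual pre-training objective also includes the spatial-temporal consistency term over superpixel/superpoint pairs, which this sketch omits.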
Step 1. Install PyTorch and torchvision following the official instructions:
conda install pytorch==1.10.0 torchvision==0.11.0 cudatoolkit=11.3 -c pytorch -c conda-forge
Step 2. Install Torchsparse and MinkowskiEngine.
# MinkowskiEngine
conda install openblas-devel -c anaconda
git clone https://github.com/NVIDIA/MinkowskiEngine.git
cd MinkowskiEngine
pip install ninja
python setup.py install --blas_include_dirs=${CONDA_PREFIX}/include --blas=openblas
# Torchsparse
# refer to https://github.com/PJLab-ADG/PCSeg/blob/master/docs/INSTALL.md
# Make a directory named `torchsparse_dir`
cd package/
mkdir torchsparse_dir/
# Unzip the `.zip` files in `package/`
unzip sparsehash.zip
unzip torchsparse.zip
mv sparsehash-master/ sparsehash/
cd sparsehash/
./configure --prefix=/${ROOT}/package/torchsparse_dir/sphash/
make
make install
# Compile `torchsparse`
cd ..
pip install ./torchsparse
Step 3. Install CLIP, MaskCLIP, PyTorch Lightning, and the nuScenes devkit.
# Install CLIP (https://github.com/openai/CLIP)
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
# Install MaskCLIP (https://github.com/chongzhou96/MaskCLIP)
pip install -U openmim
mim install mmcv-full==1.4.0
git clone https://github.com/chongzhou96/MaskCLIP.git
cd MaskCLIP
pip install -v -e .
# Install PyTorch Lightning
pip install pytorch_lightning==1.4.0
# Install the nuScenes devkit
pip install torchmetrics==0.4.0
pip install nuscenes-devkit==1.1.9
# Note: manually add the following classmethod to the class "LidarPointCloud"
# in "miniconda3/envs/{your environment name}/lib/python{your python version}/site-packages/nuscenes/utils/data_classes.py"
class LidarPointCloud(PointCloud):
    @classmethod
    def from_points(cls, points) -> 'LidarPointCloud':
        return cls(points.T)
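The patched classmethod simply wraps an (n, 4) point array into the devkit's channel-first (4, n) layout. A minimal numpy-only stand-in (mocking the devkit's `PointCloud` base class, which is not imported here) shows the effect:

```python
import numpy as np

# Stand-in for nuscenes.utils.data_classes.PointCloud, which stores
# points channel-first with shape (4, n): x, y, z, intensity as rows.
class PointCloud:
    def __init__(self, points: np.ndarray):
        assert points.shape[0] == 4
        self.points = points

class LidarPointCloud(PointCloud):
    @classmethod
    def from_points(cls, points) -> 'LidarPointCloud':
        # An (n, 4) point array is transposed into the devkit layout.
        return cls(points.T)

pts = np.random.default_rng(0).random((100, 4)).astype(np.float32)
pc = LidarPointCloud.from_points(pts)  # pc.points has shape (4, 100)
```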
In this paper, we conduct experiments on ScanNet, nuScenes, and SemanticKITTI.
Step 1. Download the ScanNet, nuScenes, and SemanticKITTI datasets.
# Pre-process the ScanNet dataset
python utils/preprocess_scannet.py
# Generate nuScenes sweep information following https://github.com/open-mmlab/OpenPCDet/blob/master/docs/GETTING_STARTED.md
# and save it as "nuscenes_infos_dict_10sweeps_train.pkl"
python -m pcdet.datasets.nuscenes.nuscenes_dataset --func create_nuscenes_infos \
--cfg_file tools/cfgs/dataset_configs/nuscenes_dataset.yaml \
    --version v1.0-trainval
Step 2. Download and convert the CLIP models:
python utils/convert_clip_weights.py --model ViT16 --backbone
python utils/convert_clip_weights.py --model ViT16
# This produces ViT16_clip_backbone.pth and ViT16_clip_weights.pth
Step 3. Prepare CLIP's text embeddings for the ScanNet and nuScenes classes:
python utils/prompt_engineering.py --model ViT16 --class-set nuscenes
python utils/prompt_engineering.py --model ViT16 --class-set scannet
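The prompt-engineering step presumably follows CLIP's standard prompt-ensembling recipe: each class name is slotted into several templates, every prompt is encoded, and the normalized embeddings are averaged per class. The sketch below is an assumption about that recipe, with the text encoder mocked by deterministic random vectors (the real script would call CLIP's ViT-B/16 text tower) and illustrative template/class lists:

```python
import zlib
import numpy as np

templates = ["a photo of a {}.", "a point cloud of a {}.", "there is a {} in the scene."]
classes = ["car", "pedestrian", "vegetation"]  # illustrative subset of a class set

def encode_text(prompt: str, dim: int = 512) -> np.ndarray:
    # Mock encoder: deterministic pseudo-embedding keyed on the prompt text.
    return np.random.default_rng(zlib.crc32(prompt.encode())).standard_normal(dim)

per_class = []
for name in classes:
    feats = np.stack([encode_text(t.format(name)) for t in templates])
    feats /= np.linalg.norm(feats, axis=1, keepdims=True)  # unit-normalize each prompt
    mean = feats.mean(axis=0)
    per_class.append(mean / np.linalg.norm(mean))          # renormalize the ensemble
text_emb = np.stack(per_class)  # (num_classes, dim): one embedding per class
```

Averaging normalized embeddings over many templates smooths out template-specific wording, which is why prompt ensembling typically beats a single prompt per class.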
Pre-training on ScanNet.
python pretrain.py --cfg_file config/clip2scene_scannet_pretrain.yaml
# The pre-trained model will be saved in /output/clip2scene/scannet/{date}/model.pt
Pre-training on nuScenes.
python pretrain.py --cfg_file config/clip2scene_nuscenes_pretrain.yaml
# The pre-trained model will be saved in /output/clip2scene/nuscenes/{date}/model.pt
Annotation-free segmentation on ScanNet.
python downstream.py --cfg_file config/clip2scene_scannet_label_free.yaml --pretraining_path output/clip2scene/scannet/{date}/model.pt
Annotation-free segmentation on nuScenes.
python downstream.py --cfg_file config/clip2scene_nuscenes_label_free.yaml --pretraining_path output/clip2scene/nuscenes/{date}/model.pt
Fine-tuning on ScanNet.
python downstream.py --cfg_file config/clip2scene_scannet_finetune.yaml --pretraining_path output/clip2scene/scannet/{date}/model.pt
# The fine-tuned model will be saved in /output/downstream/scannet/{date}/model.pt
Fine-tuning on nuScenes.
python downstream.py --cfg_file config/clip2scene_nuscenes_finetune.yaml --pretraining_path output/clip2scene/nuscenes/{date}/model.pt
# The fine-tuned model will be saved in /output/downstream/nuscenes/{date}/model.pt
Fine-tuning on SemanticKITTI (using the nuScenes pre-trained model).
python downstream.py --cfg_file config/clip2scene_kitti_finetune.yaml --pretraining_path output/clip2scene/nuscenes/{date}/model.pt
# The fine-tuned model will be saved in /output/downstream/kitti/{date}/model.pt
If you use CLIP2Scene in your work, please cite:
@inproceedings{chen2023clip2scene,
  title={CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP},
  author={Chen, Runnan and Liu, Youquan and Kong, Lingdong and Zhu, Xinge and Ma, Yuexin and Li, Yikang and Hou, Yuenan and Qiao, Yu and Wang, Wenping},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={7020--7030},
  year={2023}
}
Acknowledgement.
Part of the codebase has been adapted from SLidR, MaskCLIP, PCSeg, and OpenPCDet.
For questions about our paper or code, please contact Runnan Chen.