Masked Surfel Prediction for Self-Supervised Point Cloud Learning (arXiv)
Masked auto-encoding is a popular and effective self-supervised learning approach for point clouds. However, most existing methods reconstruct only the masked points and overlook the local geometry, which is also important for understanding point cloud data. In this work, we make the first attempt, to the best of our knowledge, to incorporate local geometry information explicitly into masked auto-encoding, and propose a novel Masked Surfel Prediction (MaskSurf) method. Specifically, given an input point cloud masked at a high ratio, we learn a transformer-based encoder-decoder network to estimate the underlying masked surfels by simultaneously predicting the surfel positions (i.e., points) and per-surfel orientations (i.e., normals). The predictions of points and normals are supervised by the Chamfer Distance and a newly introduced Position-Indexed Normal Distance in a set-to-set manner. MaskSurf is validated on six downstream tasks under three fine-tuning strategies. In particular, it outperforms its closest competitor, Point-MAE, by 1.2% on the real-world ScanObjectNN dataset under the OBJ-BG setting, demonstrating the advantage of masked surfel prediction over masked point cloud reconstruction.
Fig. 1: The overall framework of MaskSurf.
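For intuition, the sketch below shows one way the set-to-set supervision described above could be implemented: a symmetric L2 Chamfer Distance on the predicted points, plus a normal penalty evaluated on position-indexed nearest-neighbour matches. The function name and the exact form of the normal term are assumptions here; the paper defines the precise Position-Indexed Normal Distance.

```python
import torch

def masked_surfel_loss(pred_xyz, gt_xyz, pred_nrm, gt_nrm):
    """Hypothetical sketch of surfel supervision (names and the exact
    normal penalty are assumptions, not the paper's definition).

    pred_xyz, gt_xyz: (B, N, 3) predicted / ground-truth masked points.
    pred_nrm, gt_nrm: (B, N, 3) unit normals of the corresponding surfels.
    """
    d2 = torch.cdist(pred_xyz, gt_xyz) ** 2      # (B, N, N) squared distances
    # Symmetric L2 Chamfer Distance on the point positions.
    chamfer = d2.min(dim=2).values.mean() + d2.min(dim=1).values.mean()

    # Position-indexed matching: pair each predicted point with its nearest
    # ground-truth point, then compare the normals of the matched pair.
    idx = d2.argmin(dim=2)                       # (B, N)
    gt_nrm_matched = torch.gather(gt_nrm, 1, idx.unsqueeze(-1).expand(-1, -1, 3))
    # Treat normals as unoriented: penalise 1 - |cos| between matched normals.
    normal_dist = (1.0 - (pred_nrm * gt_nrm_matched).sum(-1).abs()).mean()
    return chamfer + normal_dist
```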
Requirements: PyTorch >= 1.7.0; Python >= 3.7; CUDA >= 9.0; GCC >= 4.9; torchvision.
pip install -r requirements.txt
# Chamfer Distance & emd
cd ./extensions/chamfer_dist
python setup.py install --user
cd ./extensions/emd
python setup.py install --user
# PointNet++
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
# GPU kNN
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl
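After installing the extensions, a quick import-and-run check such as the following (a minimal sanity check, not part of this repo) confirms that the CUDA ops work:

```python
# Minimal sanity check (not part of this repo): verify the compiled CUDA ops.
# Requires a CUDA-capable GPU.
import torch
from knn_cuda import KNN
from pointnet2_ops import pointnet2_utils

pts = torch.rand(2, 1024, 3).cuda()

# Farthest point sampling from pointnet2_ops: pick 128 seed indices per cloud.
idx = pointnet2_utils.furthest_point_sample(pts, 128)   # (2, 128)

# k-nearest neighbours from KNN_CUDA (transpose_mode expects (B, N, dim)).
knn = KNN(k=16, transpose_mode=True)
dist, nn_idx = knn(pts, pts)                            # both (2, 1024, 16)

print(idx.shape, dist.shape, nn_idx.shape)
```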
We use ShapeNet, ScanObjectNN, ModelNet40, ShapeNetPart and S3DIS in this work. See DATASET.md for details.
The results of the following pretrained models may differ slightly from those reported in the paper due to randomness between runs; the paper reports standard deviations to illustrate this fluctuation.
Task | Dataset | Config | Acc. | Download |
---|---|---|---|---|
Pre-training | ShapeNet | pretrain_MaskSurf.yaml | N.A. | here |
Classification | ScanObjectNN | finetune_scan_hardest_transferring_features.yaml | 85.67% | here |
Classification | ScanObjectNN | finetune_scan_objbg_transferring_features.yaml | 91.05% | here |
Classification | ScanObjectNN | finetune_scan_objonly_transferring_features.yaml | 89.32% | here |
Classification | ModelNet40 | finetune_modelnet_transferring_features.yaml | 93.56% | here |
Classification | ShapeNet | finetune_shapenet_non_linear_classification.yaml | 91.10% | here |
Part segmentation | ShapeNetPart | segmentation | 86.12% mIoU | here |
Semantic segmentation | S3DIS | semantic_segmentation | 88.3% OA | here |
Task | Dataset | Config | 5-way 10-shot Acc. (%) | 5-way 20-shot Acc. (%) | 10-way 10-shot Acc. (%) | 10-way 20-shot Acc. (%) |
---|---|---|---|---|---|---|
Few-shot learning | ScanObjectNN | fewshot_scanobjectnn_transferring_features.yaml | 65.3 ± 4.9 | 77.4 ± 5.2 | 53.8 ± 5.3 | 63.2 ± 2.7 |
We provide all pre-training and fine-tuning scripts in run.sh.
Additionally, we provide a simple tool to collect the mean and standard deviation of results over multiple runs, for example: python parse_test_res.py ./experiments/{experiment_setting}/cfgs/ --multi-exp
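The aggregation this tool performs amounts to a mean and sample standard deviation over the per-run accuracies, roughly as in this illustrative snippet (the actual script extracts the accuracies from the experiment logs itself):

```python
# Illustrative aggregation only; the real parse_test_res.py parses the
# per-run accuracies out of the experiment logs.
import statistics

def summarize(accuracies):
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies) if len(accuracies) > 1 else 0.0
    return mean, std

mean, std = summarize([85.7, 85.4, 85.9])  # e.g. final accuracies of three runs
print(f"{mean:.2f} ± {std:.2f}")
```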
To pre-train MaskSurf on the ShapeNet training set, run the following command. To try different models, masking ratios, etc., first create a new config file and pass its path to --config.
CUDA_VISIBLE_DEVICES=<GPU> python main.py --config cfgs/pretrain_MaskSurf.yaml --exp_name <output_file_name>
To fine-tune on ScanObjectNN, run:
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --config cfgs/finetune_scan_hardest_{protocol}.yaml \
--finetune_model --exp_name <output_file_name> --ckpts <path/to/pre-trained/model>
To fine-tune on ModelNet40, run:
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --config cfgs/finetune_modelnet_{protocol}.yaml \
--finetune_model --exp_name <output_file_name> --ckpts <path/to/pre-trained/model>
To evaluate on ModelNet40 with voting, run:
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --test --config cfgs/finetune_modelnet_{protocol}.yaml \
--exp_name <output_file_name> --ckpts <path/to/best/fine-tuned/model>
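Voting averages the classifier's predictions over several augmented copies of each test cloud. The snippet below sketches the idea; the augmentation type, ranges, and number of votes used by the repository may differ:

```python
# Minimal sketch of test-time voting: average logits over randomly scaled
# copies of each test cloud. The augmentation and vote count are assumptions.
import torch

@torch.no_grad()
def vote_predict(model, pts, num_votes=10, scale_range=(0.8, 1.25)):
    """pts: (B, N, 3) test point clouds; returns logits averaged over votes."""
    logits = 0.0
    for _ in range(num_votes):
        scale = torch.empty(1, device=pts.device).uniform_(*scale_range)
        logits = logits + model(pts * scale)
    return logits / num_votes
```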
For few-shot learning on ModelNet40 or ScanObjectNN, run:
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --config cfgs/fewshot_{dataset}_{protocol}.yaml --finetune_model \
--ckpts <path/to/pre-trained/model> --exp_name <output_file_name> --way <5 or 10> --shot <10 or 20> --fold <0-9>
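Few-shot results are averaged over 10 folds, so each setting requires 10 runs. A small launcher like the following (a convenience sketch, not part of the repo; adjust the config and checkpoint paths) runs them all:

```python
# Convenience sketch (not part of the repo): run all 10 folds of one
# few-shot setting via the command shown above.
import subprocess

way, shot = 5, 10
for fold in range(10):
    subprocess.run([
        "python", "main.py",
        "--config", "cfgs/fewshot_scanobjectnn_transferring_features.yaml",
        "--finetune_model",
        "--ckpts", "path/to/pre-trained/model",   # replace with your checkpoint
        "--exp_name", f"fewshot_{way}w{shot}s_fold{fold}",
        "--way", str(way), "--shot", str(shot),
        "--fold", str(fold),
    ], check=True)
```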
For domain generalization, first fine-tune on the source dataset, then test the best fine-tuned model on ScanNet:
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --config cfgs/dg_{source}_{protocol}.yaml --finetune_model --exp_name <output_file_name> --ckpts <path/to/best/fine-tuned/model>
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --config cfgs/dg_{source}2scannet_{protocol}.yaml --test --finetune_model --exp_name <output_file_name> --ckpts ./experiments/dg_{source}_{protocol}.yaml/cfgs/<path/to/best/fine-tuned/model>
For part segmentation on ShapeNetPart, run:
cd segmentation
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --ckpts <path/to/pre-trained/model> --root path/to/data --learning_rate 0.0002 --epoch 300
For semantic segmentation on S3DIS, fine-tune with:
cd segmentation
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --optimizer_part all --ckpts <path/to/pre-trained/model> --root path/to/data --learning_rate 0.0002
Then test (and optionally visualize) the best fine-tuned model with:
CUDA_VISIBLE_DEVICES=<GPUs> python main_test.py --root path/to/data --visual --ckpts <path/to/best/fine-tuned/model>
Please refer to vis_masksurf.py for the visualization of surfels.
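Independently of vis_masksurf.py, a quick way to eyeball surfels is to render each point together with its normal, e.g. with this generic matplotlib sketch (not the repository's renderer):

```python
# Generic matplotlib sketch (not the repository's renderer): draw points
# with their normals as short arrows, a crude stand-in for surfel discs.
import matplotlib.pyplot as plt
import numpy as np

def show_surfels(xyz, normals, arrow_len=0.1):
    """xyz, normals: (N, 3) arrays; normals are assumed unit length."""
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(xyz[:, 0], xyz[:, 1], xyz[:, 2], s=2)
    ax.quiver(xyz[:, 0], xyz[:, 1], xyz[:, 2],
              normals[:, 0], normals[:, 1], normals[:, 2],
              length=arrow_len, normalize=True)
    plt.show()

# Example on a random unit sphere, where the normal at a point equals the point.
pts = np.random.randn(256, 3)
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
show_surfels(pts, pts)
```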
Our code is built upon Point-MAE. If you find our work useful, please consider citing:
@article{zhang2022masked,
title={Masked Surfel Prediction for Self-Supervised Point Cloud Learning},
author={Zhang, Yabin and Lin, Jiehong and He, Chenhang and Chen, Yongwei and Jia, Kui and Zhang, Lei},
journal={arXiv preprint arXiv:2207.03111},
year={2022}
}