This repo supplements our 3D Vision with Transformers Survey
Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang
This repo lists all the Transformer-based 3D computer vision papers presented in our survey, and we aim to update it frequently with the latest relevant papers.
- Object Classification
- 3D Object Detection
- 3D Segmentation
- 3D Point Cloud Completion
- 3D Pose Estimation
- Other Tasks
Object Classification
Group-in-Group Relation-Based Transformer for 3D Point Cloud Learning [RS 2022][PDF]
Masked Autoencoders for Point Cloud Self-supervised Learning [ECCV 2022][PDF][Code]
3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification [T-ITS 2022][PDF]
LFT-Net: Local Feature Transformer Network for Point Clouds Analysis [T-ITS 2022][PDF]
Sewer defect detection from 3D point clouds using a transformer-based deep learning model [Automation in Construction 2022][PDF]
3d medical point transformer: Introducing convolution to attention networks for medical point cloud analysis [arXiv 2021][PDF][Code]
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling [CVPR 2022][PDF][Code]
CpT: Convolutional Point Transformer for 3D Point Cloud Processing [ACCVW 2022][PDF]
PatchFormer: An Efficient Point Transformer With Patch Attention [CVPR 2022][PDF]
PVT: Point-Voxel Transformer for Point Cloud Learning [arXiv 2021][PDF][Code]
Adaptive Wavelet Transformer Network for 3D Shape Representation Learning [ICLR 2022][PDF]
Point cloud learning with transformer [arXiv 2021][PDF]
3crossnet: Cross-level cross-scale cross-attention network for point cloud representation [RA-L 2022][PDF]
Dual Transformer for Point Cloud Analysis [IEEE Trans Multimedia][PDF]
Centroid transformers: Learning to abstract with attention [arXiv 2021][PDF]
PCT: Point cloud transformer [CVM 2021][PDF][Code]
Point Transformer [ICCV 2021][PDF][Code]
Point Transformer [IEEE Access 2021][PDF][Code]
Modeling point clouds with self-attention and gumbel subset sampling [CVPR 2019][PDF]
Attentional shapecontextnet for point cloud recognition [CVPR 2018][PDF][Code]
3D Object Detection
Bridged Transformer for Vision and Point Cloud 3D Object Detection [CVPR 2022][PDF]
Multimodal Token Fusion for Vision Transformers [CVPR 2022][PDF][Code]
CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection [CVPR 2022][PDF]
Focused Decoding Enables 3D Anatomical Detection by Transformers [arXiv 2022][PDF][Code]
MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection [arXiv 2022][PDF][Code]
TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [CVPR 2022][PDF][Code]
Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds [CVPR 2022][PDF][Code]
VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention [CVPR 2022][PDF][Code]
Point Density-Aware Voxels for LiDAR 3D Object Detection [CVPR 2022][PDF][Code]
PETR: Position Embedding Transformation for Multi-View 3D Object Detection [ECCV 2022][PDF][Code]
ARM3D: Attention-based relation module for indoor 3D object detection [Comput. Vis.][PDF][Code]
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer [CVPR 2022][PDF][Code]
Attention-based Proposals Refinement for 3D Object Detection [IV 2022][PDF][Code]
Embracing Single Stride 3D Object Detector with Sparse Transformer [CVPR 2022][PDF][Code]
Fast Point Transformer [CVPR 2022][PDF][Code]
BoxeR: Box-Attention for 2D and 3D Transformers [CVPR 2022][PDF][Code]
DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries [CoRL 2021][PDF][Code]
An End-to-End Transformer Model for 3D Object Detection [ICCV 2021][PDF][Code]
Voxel Transformer for 3D Object Detection [ICCV 2021][PDF][Code]
Improving 3D Object Detection with Channel-wise Transformer [ICCV 2021][PDF][Code]
M3DETR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers [WACV 2022][PDF][Code]
Group-Free 3D Object Detection via Transformers [ICCV 2021][PDF][Code]
SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection [ICCVW 2021][PDF][Code]
3D object detection with pointformer [CVPR 2021][PDF][Code]
Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving [IEEE Trans. Circuits Syst.][PDF]
MLCVNet: Multi-Level Context VoteNet for 3D Object Detection [CVPR 2020][PDF][Code]
LiDAR-based online 3d video object detection with graph-based message passing and spatiotemporal transformer attention [CVPR 2020][PDF][Code]
SCANet: Spatial-channel attention network for 3d object detection [ICASSP 2019][PDF][Code]
3D Segmentation
For part segmentation, see Object Classification.
Stratified Transformer for 3D Point Cloud Segmentation [CVPR 2022][PDF][Code]
Multimodal Token Fusion for Vision Transformers [CVPR 2022][PDF][Code]
Sparse Cross-scale Attention Network for Efficient LiDAR Panoptic Segmentation [AAAI 2022][PDF]
Fast Point Transformer [CVPR 2022][PDF][Code]
Segment-Fusion: Hierarchical Context Fusion for Robust 3D Semantic Segmentation [CVPR 2022][PDF]
Point Spatio-Temporal Transformer Networks for Point Cloud Video Modeling [TPAMI 2022][PDF]
Spatial-Temporal Transformer for 3D Point Cloud Sequences [WACV 2022][PDF]
Point 4D transformer networks for spatio-temporal modeling in point cloud videos [CVPR 2021][PDF][Code]
Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images [MICCAI BrainLes 2022][PDF][Code]
D-Former: A U-shaped Dilated Transformer for 3D Medical Image Segmentation [Neural Comput Appl 2022][PDF]
A Robust Volumetric Transformer for Accurate 3D Tumor Segmentation [MICCAI 2022][PDF][Code]
T-AutoML: Automated Machine Learning for Lesion Segmentation using Transformers in 3D Medical Imaging [ICCV 2021][PDF]
After-unet: Axial fusion transformer unet for medical image segmentation [WACV 2022][PDF]
Bitr-unet: a cnn-transformer combined network for mri brain tumor segmentation [MICCAI BrainLes 2022][PDF]
nnformer: Interleaved transformer for volumetric segmentation [arXiv 2021][PDF][Code]
UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation [MICCAI 2021][PDF][Code]
Medical image segmentation using squeeze-and-expansion transformers [IJCAI 2021][PDF][Code]
Unetr: Transformers for 3d medical image segmentation [WACV 2022][PDF][Code]
Transbts: Multimodal brain tumor segmentation using transformer [MICCAI 2021][PDF][Code]
Spectr: Spectral transformer for hyperspectral pathology image segmentation [arXiv 2021][PDF][Code]
Cotr: Efficiently bridging cnn and transformer for 3d medical image segmentation [MICCAI 2021][PDF][Code]
Convolution-free medical image segmentation using transformers [MICCAI 2021][PDF]
Transfuse: Fusing transformers and cnns for medical image segmentation [MICCAI 2021][PDF][Code]
3D Point Cloud Completion
Learning Local Displacements for Point Cloud Completion [CVPR 2022][PDF][Code]
AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation [CVPR 2022][PDF][Code]
PointAttN: You Only Need Attention for Point Cloud Completion [arXiv 2022][PDF][Code]
Point cloud completion on structured feature map with feedback network [CVM 2022][PDF]
ShapeFormer: Transformer-based Shape Completion via Sparse Representation [CVPR 2022][PDF][Code]
A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion [ICLR 2022][PDF][Code]
MFM-Net: Unpaired Shape Completion Network with Multi-stage Feature Matching [arXiv 2021][PDF]
PCTMA-Net: Point Cloud Transformer with Morphing Atlas-based Point Generation Network for Dense Point Cloud Completion [IROS 2021][PDF][Code]
PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers [ICCV 2021][PDF][Code]
SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer [ICCV 2021][PDF][Code]
3D Pose Estimation
Permutation-Invariant Relational Network for Multi-person 3D Pose Estimation [arXiv 2022][PDF]
Zero-Shot Category-Level Object Pose Estimation [ECCV 2022][PDF][Code]
Efficient Virtual View Selection for 3D Hand Pose Estimation [AAAI 2022][PDF][Code]
Learning-based Point Cloud Registration for 6D Object Pose Estimation in the Real World [ECCV 2022][PDF][Code]
CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation [arXiv 2022][PDF][Code]
RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers [ECCV 2022][PDF]
P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation [ECCV 2022][PDF][Code]
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video [CVPR 2022][PDF][Code]
6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-based Instance Representation Learning [TIP 2022][PDF]
Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation [CVPR 2022][PDF][Code]
Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation [IEEE Trans. Multimed. 2022][PDF][Code]
3D Human Pose Estimation with Spatial and Temporal Transformers [ICCV 2021][PDF][Code]
End-to-End Human Pose and Mesh Reconstruction with Transformers [CVPR 2021][PDF][Code]
PI-Net: Pose Interacting Network for Multi-Person Monocular 3D Pose Estimation [WACV 2021][PDF][Code]
HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation [ACM MM 2020][PDF]
Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation [ECCV 2020][PDF]
Epipolar Transformer for Multi-view Human Pose Estimation [CVPRW 2020][PDF][Code]
Other Tasks
Pttr: Relational 3d point cloud object tracking with transformer [CVPR 2022][PDF][Code]
3d object tracking with transformer [BMVC 2021][PDF]
Hr-stan: High-resolution spatio-temporal attention network for 3d human motion prediction [CVPRW 2022][PDF]
Gimo: Gaze-informed human motion prediction in context [ECCV 2022][PDF][Code]
Pose transformers (potr): Human motion prediction with non-autoregressive transformer [ICCVW 2021][PDF][Code]
Learning progressive joint propagation for human motion prediction [ECCV 2020][PDF]
History repeats itself: Human motion prediction via motion attention [ECCV 2020][PDF][Code]
A spatio-temporal transformer for 3d human motion prediction [3DV 2021][PDF][Code]
Vpfusion: Joint 3d volume and pixel-aligned feature fusion for single and multi-view 3d reconstruction [arXiv 2022][PDF]
Thundr: Transformer-based 3d human reconstruction with markers [ICCV 2021][PDF]
Multi-view 3d reconstruction with transformer [ICCV 2021][PDF]
Regtr: End-to-end point cloud correspondences with transformers [CVPR 2022][PDF][Code]
LegoFormer: Transformers for Block-by-Block Multi-view 3D Reconstruction [CVPR 2021][PDF][Code]
Robust point cloud registration framework based on deep graph matching [CVPR 2021][PDF][Code]
Deep closest point: Learning representations for point cloud registration [ICCV 2019][PDF][Code]
If you find the listing or the survey useful for your work, please cite our paper:
@misc{lahoud20223d,
title={3D Vision with Transformers: A Survey},
author={Lahoud, Jean and Cao, Jiale and Khan, Fahad Shahbaz and Cholakkal, Hisham and Anwer, Rao Muhammad and Khan, Salman and Yang, Ming-Hsuan},
year={2022},
eprint={2208.04309},
archivePrefix={arXiv},
primaryClass={cs.CV}
}