From b481efcc3bd38991d2f71d3b4e5ab2c83355071b Mon Sep 17 00:00:00 2001 From: Sun Jiahao <72679458+sunjiahao1999@users.noreply.github.com> Date: Wed, 29 Mar 2023 16:37:32 +0800 Subject: [PATCH] [Docs] Add docs and README for MinkUnet (#2358) * add readme * rename * fix miou typo * add link * fix backbone name * add torchsparse link * revise link --- README.md | 55 +++++++++++++++++---------------- README_zh-CN.md | 55 +++++++++++++++++---------------- configs/minkunet/README.md | 43 ++++++++++++++++++++++++++ configs/minkunet/metafile.yml | 57 +++++++++++++++++++++++++++++++++++ model-index.yml | 1 + 5 files changed, 159 insertions(+), 52 deletions(-) create mode 100644 configs/minkunet/README.md create mode 100644 configs/minkunet/metafile.yml diff --git a/README.md b/README.md index 4817af6b07..a963bc0209 100644 --- a/README.md +++ b/README.md @@ -134,6 +134,7 @@ Results and models are available in the [model zoo](docs/en/model_zoo.md).
  • DGCNN (TOG'2019)
  • DLA (CVPR'2018)
  • MinkResNet (CVPR'2019)
  • +
  • MinkUNet (CVPR'2019)
  • Cylinder3D (CVPR'2021)
  • @@ -221,6 +222,7 @@ Results and models are available in the [model zoo](docs/en/model_zoo.md).
  • Outdoor
  • Indoor
  • @@ -237,32 +239,33 @@ Results and models are available in the [model zoo](docs/en/model_zoo.md). -| | ResNet | PointNet++ | SECOND | DGCNN | RegNetX | DLA | MinkResNet | Cylinder3D | -| :-----------: | :----: | :--------: | :----: | :---: | :-----: | :-: | :--------: | :--------: | -| SECOND | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | -| PointPillars | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | -| FreeAnchor | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | -| VoteNet | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| H3DNet | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| 3DSSD | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| Part-A2 | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | -| MVXNet | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | -| CenterPoint | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | -| SSN | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | -| ImVoteNet | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| FCOS3D | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| PointNet++ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| Group-Free-3D | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| ImVoxelNet | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| PAConv | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| DGCNN | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | -| SMOKE | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | -| PGD | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| MonoFlex | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | -| SA-SSD | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | -| FCAF3D | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | -| PV-RCNN | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | -| Cylinder3D | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | +| | ResNet | PointNet++ | SECOND | DGCNN | RegNetX | DLA | MinkResNet | Cylinder3D | MinkUNet | +| :-----------: | :----: | :--------: | :----: | :---: | :-----: | :-: | :--------: | :--------: | :------: | +| SECOND | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| PointPillars | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | +| FreeAnchor | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | +| VoteNet | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| H3DNet | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| 3DSSD | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| Part-A2 | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| MVXNet | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| CenterPoint | ✗ | 
✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| SSN | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | +| ImVoteNet | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| FCOS3D | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| PointNet++ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| Group-Free-3D | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| ImVoxelNet | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| PAConv | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| DGCNN | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | +| SMOKE | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | +| PGD | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| MonoFlex | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | +| SA-SSD | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| FCAF3D | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | +| PV-RCNN | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| Cylinder3D | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | +| MinkUNet | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | **Note:** All the about **300+ models, methods of 40+ papers** in 2D detection supported by [MMDetection](https://github.com/open-mmlab/mmdetection/blob/3.x/docs/en/model_zoo.md) can be trained or used in this codebase. diff --git a/README_zh-CN.md b/README_zh-CN.md index b67ba209ce..5fb40418d2 100644 --- a/README_zh-CN.md +++ b/README_zh-CN.md @@ -131,6 +131,7 @@ MMDetection3D 是一个基于 PyTorch 的目标检测开源工具箱,下一代
  • DGCNN (TOG'2019)
  • DLA (CVPR'2018)
  • MinkResNet (CVPR'2019)
  • +
  • MinkUNet (CVPR'2019)
  • Cylinder3D (CVPR'2021)
  • @@ -217,6 +218,7 @@ MMDetection3D 是一个基于 PyTorch 的目标检测开源工具箱,下一代
  • 室外
  • 室内
  • @@ -233,32 +235,33 @@ MMDetection3D 是一个基于 PyTorch 的目标检测开源工具箱,下一代 -| | ResNet | PointNet++ | SECOND | DGCNN | RegNetX | DLA | MinkResNet | Cylinder3D | -| :-----------: | :----: | :--------: | :----: | :---: | :-----: | :-: | :--------: | :--------: | -| SECOND | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | -| PointPillars | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | -| FreeAnchor | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | -| VoteNet | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| H3DNet | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| 3DSSD | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| Part-A2 | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | -| MVXNet | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | -| CenterPoint | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | -| SSN | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | -| ImVoteNet | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| FCOS3D | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| PointNet++ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| Group-Free-3D | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| ImVoxelNet | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| PAConv | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| DGCNN | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | -| SMOKE | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | -| PGD | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | -| MonoFlex | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | -| SA-SSD | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | -| FCAF3D | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | -| PV-RCNN | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | -| Cylinder3D | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | +| | ResNet | PointNet++ | SECOND | DGCNN | RegNetX | DLA | MinkResNet | Cylinder3D | MinkUNet | +| :-----------: | :----: | :--------: | :----: | :---: | :-----: | :-: | :--------: | :--------: | :------: | +| SECOND | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| PointPillars | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | +| FreeAnchor | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | +| VoteNet | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| H3DNet | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| 3DSSD | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| Part-A2 | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| MVXNet | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| CenterPoint | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | 
+| SSN | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | +| ImVoteNet | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| FCOS3D | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| PointNet++ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| Group-Free-3D | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| ImVoxelNet | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| PAConv | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| DGCNN | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | +| SMOKE | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | +| PGD | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| MonoFlex | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | +| SA-SSD | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| FCAF3D | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | +| PV-RCNN | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| Cylinder3D | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | +| MinkUNet | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | **注意:**[MMDetection](https://github.com/open-mmlab/mmdetection/blob/3.x/docs/zh_cn/model_zoo.md) 支持的基于 2D 检测的 **300+ 个模型,40+ 的论文算法**在 MMDetection3D 中都可以被训练或使用。 diff --git a/configs/minkunet/README.md b/configs/minkunet/README.md new file mode 100644 index 0000000000..011fc0484c --- /dev/null +++ b/configs/minkunet/README.md @@ -0,0 +1,43 @@ +# 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks + +> [4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks](https://arxiv.org/abs/1904.08755) + + + +## Abstract + +In many robotics and VR/AR applications, 3D-videos are readily-available sources of input (a continuous sequence of depth images, or LIDAR scans). However, those 3D-videos are processed frame-by-frame either through 2D convnets or 3D perception algorithms. In this work, we propose 4-dimensional convolutional neural networks for spatio-temporal perception that can directly process such 3D-videos using high-dimensional convolutions. For this, we adopt sparse tensors and propose the generalized sparse convolution that encompasses all discrete convolutions. 
To implement the generalized sparse convolution, we create an open-source auto-differentiation library for sparse tensors that provides extensive functions for high-dimensional convolutional neural networks. We create 4D spatio-temporal convolutional neural networks using the library and validate them on various 3D semantic segmentation benchmarks and proposed 4D datasets for 3D-video perception. To overcome challenges in the 4D space, we propose the hybrid kernel, a special case of the generalized sparse convolution, and the trilateral-stationary conditional random field that enforces spatio-temporal consistency in the 7D space-time-chroma space. Experimentally, we show that convolutional neural networks with only generalized 3D sparse convolutions can outperform 2D or 2D-3D hybrid methods by a large margin. Also, we show that on 3D-videos, 4D spatio-temporal convolutional neural networks are robust to noise, outperform 3D convolutional neural networks and are faster than the 3D counterpart in some cases. + +
    + +
    + +## Introduction + +We implement MinkUNet with [TorchSparse](https://github.com/mit-han-lab/torchsparse) backend and provide the result and checkpoints on SemanticKITTI datasets. + +## Results and models + +### SemanticKITTI + +| Method | Lr schd | Mem (GB) | mIoU | Download | +| :----------: | :-----: | :------: | :--: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| MinkUNet-W16 | 15e | 3.4 | 60.3 | [model](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w16_8xb2-15e_semantickitti/minkunet_w16_8xb2-15e_semantickitti_20230309_160737-0d8ec25b.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w16_8xb2-15e_semantickitti/minkunet_w16_8xb2-15e_semantickitti_20230309_160737.log) | +| MinkUNet-W20 | 15e | 3.7 | 61.6 | [model](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w20_8xb2-15e_semantickitti/minkunet_w20_8xb2-15e_semantickitti_20230309_160718-c3b92e6e.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w20_8xb2-15e_semantickitti/minkunet_w20_8xb2-15e_semantickitti_20230309_160718.log) | +| MinkUNet-W32 | 15e | 4.9 | 63.1 | [model](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w32_8xb2-15e_semantickitti/minkunet_w32_8xb2-15e_semantickitti_20230309_160710-7fa0a6f1.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w32_8xb2-15e_semantickitti/minkunet_w32_8xb2-15e_semantickitti_20230309_160710.log) | + +**Note:** We follow the implementation in SPVNAS original [repo](https://github.com/mit-han-lab/spvnas) and W16\\W20\\W32 indicates different number of 
channels. + +**Note:** With the TorchSparse backend, model performance is unstable and may fluctuate by about 1.5 mIoU across different random seeds. + +## Citation + +```latex +@inproceedings{choy20194d, + title={4d spatio-temporal convnets: Minkowski convolutional neural networks}, + author={Choy, Christopher and Gwak, JunYoung and Savarese, Silvio}, + booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition}, + pages={3075--3084}, + year={2019} +} +``` diff --git a/configs/minkunet/metafile.yml b/configs/minkunet/metafile.yml new file mode 100644 index 0000000000..394ff8f9eb --- /dev/null +++ b/configs/minkunet/metafile.yml @@ -0,0 +1,57 @@ +Collections: + - Name: MinkUNet + Metadata: + Training Techniques: + - AdamW + Architecture: + - MinkUNet + Paper: + URL: https://arxiv.org/abs/1904.08755 + Title: '4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks' + README: configs/minkunet/README.md + Code: + URL: https://github.com/open-mmlab/mmdetection3d/blob/1.1/mmdet3d/models/segmentors/minkunet.py#L13 + Version: v1.1.0rc4 + +Models: + - Name: minkunet_w16_8xb2-15e_semantickitti + In Collection: MinkUNet + Config: configs/minkunet/minkunet_w16_8xb2-15e_semantickitti.py + Metadata: + Training Data: SemanticKITTI + Training Memory (GB): 3.4 + Training Resources: 8x A100 GPUs + Results: + - Task: 3D Semantic Segmentation + Dataset: SemanticKITTI + Metrics: + mIoU: 60.3 + Weights: https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w16_8xb2-15e_semantickitti/minkunet_w16_8xb2-15e_semantickitti_20230309_160737-0d8ec25b.pth + + - Name: minkunet_w20_8xb2-15e_semantickitti + In Collection: MinkUNet + Config: configs/minkunet/minkunet_w20_8xb2-15e_semantickitti.py + Metadata: + Training Data: SemanticKITTI + Training Memory (GB): 3.7 + Training Resources: 8x A100 GPUs + Results: + - Task: 3D Semantic Segmentation + Dataset: SemanticKITTI + Metrics: + mIoU: 61.6 + Weights: 
https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w20_8xb2-15e_semantickitti/minkunet_w20_8xb2-15e_semantickitti_20230309_160718-c3b92e6e.pth + + - Name: minkunet_w32_8xb2-15e_semantickitti + In Collection: MinkUNet + Config: configs/minkunet/minkunet_w32_8xb2-15e_semantickitti.py + Metadata: + Training Data: SemanticKITTI + Training Memory (GB): 4.9 + Training Resources: 8x A100 GPUs + Results: + - Task: 3D Semantic Segmentation + Dataset: SemanticKITTI + Metrics: + mIoU: 63.1 + Weights: https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w32_8xb2-15e_semantickitti/minkunet_w32_8xb2-15e_semantickitti_20230309_160710-7fa0a6f1.pth diff --git a/model-index.yml b/model-index.yml index 8fbe759543..0f62426321 100644 --- a/model-index.yml +++ b/model-index.yml @@ -23,6 +23,7 @@ Import: - configs/smoke/metafile.yml - configs/ssn/metafile.yml - configs/votenet/metafile.yml + - configs/minkunet/metafile.yml - configs/cylinder3d/metafile.yml - configs/pv_rcnn/metafile.yml - configs/fcaf3d/metafile.yml
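Since the commit message mentions several link and name fixes, a quick consistency check over the three `metafile.yml` entries may be useful when reviewing this patch. The sketch below copies the `Name`, `Config`, and `Weights` values from the metafile above into plain Python dictionaries. The naming rule it checks (config path and weights URL both derived from the model name) is only an assumption inferred from these three entries, not a documented convention, and `MODELS`, `BASE_URL`, and `check_entry` are hypothetical names introduced for illustration.

```python
# Values copied from configs/minkunet/metafile.yml in this patch.
BASE_URL = "https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet"

MODELS = [
    {
        "Name": "minkunet_w16_8xb2-15e_semantickitti",
        "Config": "configs/minkunet/minkunet_w16_8xb2-15e_semantickitti.py",
        "Weights": "https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w16_8xb2-15e_semantickitti/minkunet_w16_8xb2-15e_semantickitti_20230309_160737-0d8ec25b.pth",
    },
    {
        "Name": "minkunet_w20_8xb2-15e_semantickitti",
        "Config": "configs/minkunet/minkunet_w20_8xb2-15e_semantickitti.py",
        "Weights": "https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w20_8xb2-15e_semantickitti/minkunet_w20_8xb2-15e_semantickitti_20230309_160718-c3b92e6e.pth",
    },
    {
        "Name": "minkunet_w32_8xb2-15e_semantickitti",
        "Config": "configs/minkunet/minkunet_w32_8xb2-15e_semantickitti.py",
        "Weights": "https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w32_8xb2-15e_semantickitti/minkunet_w32_8xb2-15e_semantickitti_20230309_160710-7fa0a6f1.pth",
    },
]


def check_entry(entry):
    """Return a list of naming problems for one metafile entry.

    Assumed (not documented) convention: the config file and the
    weights URL are both derived from the model name.
    """
    name = entry["Name"]
    problems = []

    expected_config = f"configs/minkunet/{name}.py"
    if entry["Config"] != expected_config:
        problems.append(f"Config should be {expected_config}")

    expected_prefix = f"{BASE_URL}/{name}/{name}_"
    weights = entry["Weights"]
    if not (weights.startswith(expected_prefix) and weights.endswith(".pth")):
        problems.append(f"Weights should start with {expected_prefix} and end with .pth")

    return problems


# Map each model name to the problems found for it (empty list means consistent).
problems = {entry["Name"]: check_entry(entry) for entry in MODELS}
for name, found in problems.items():
    print(name, "OK" if not found else found)
```

An empty problem list for an entry only means its three fields agree with each other under the assumed rule; it does not confirm that the linked files exist.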