Initial submission of Swin3D (#6)

* Initial commit
* update README
* init repos
* add KNN
* remove point ops
* update readme
* fix fp16
* fix bug of knn
* update config
* add license
* add comment
* update readme and license
* update readme
* Create codeql.yml
* update readme
* update codeql
* update codeql
* remove cpp codeql
* format code
* update model
* update readme

---------

Co-authored-by: Yukichiii <45515584+Yukichiii@users.noreply.github.com>
Co-authored-by: Yuqi Yang <v-yuqyan@microsoft.com>
Co-authored-by: Yuqi Yang <yangyq18@mails.tsinghua.edu.cn>
Co-authored-by: Yuxiao Guo <yuxgu@microsoft.com>
microsoft · Jun 25, 2023 · 4184679
1 parent 022d5ed
commit 4184679
Showing 4 changed files with 615 additions and 249 deletions.
diff --git a/README.md b/README.md
@@ -13,7 +13,7 @@
 Initial commits:
 
 1. Pretrained models on Structured3D are provided.
-2. The supported code and models for Semantic Segmentation on ScanNet and S3DIS are provided.
+2. The supported code for Semantic Segmentation on ScanNet and S3DIS are provided.
 
 ## Introduction
 
@@ -37,12 +37,12 @@ We pretrained our Swin3D on Structured3D, please refer to this [link](https://gi
 
 The models pretrained on Structured3D with different cRSE are provided here.
 
-| | Pretrain | #params | cRSE | mIoU(val) | Model | Log |
-| :------- | :----------: | :------ | :----------- | :-------: | :-------: | :-----: |
-| Swin3D-S | Structured3D | 23.57M | XYZ,RGB | 77.69 | [model]() | [log]() |
-| Swin3D-S | Structured3D | 23.57M | XYZ,RGB,NORM | 79.15 | [model]() | [log]() |
-| Swin3D-L | Structured3D | 60.75M | XYZ,RGB | 79.79 | [model]() | [log]() |
-| Swin3D-L | Structured3D | 60.75M | XYZ,RGB,NORM | 81.04 | [model]() | [log]() |
+| | Pretrain | #params | cRSE | mIoU(val) |  Model  |  Log  |
+| :------- | :----------: | :------ | :----------- | :-------: | :-----------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------: |
+| Swin3D-S | Structured3D | 23.57M | XYZ,RGB | 77.69 | [model](https://drive.google.com/file/d/1oezNkN3_HZvyxGxjtOpSaQUbGl3YYF90/view?usp=sharing) | [log](https://drive.google.com/file/d/1TuwZqpKm8OYj8BeMhDUhLcGqzXhgJcpC/view?usp=sharing) |
+| Swin3D-S | Structured3D | 23.57M | XYZ,RGB,NORM | 79.15 | [model](https://drive.google.com/file/d/1FMmAgHwS__NtFldH-lFTsraKj0my62t4/view?usp=sharing) | [log](https://drive.google.com/file/d/1-0kz81X0j2Zp-mntN1GwQlsm5sLIy3JX/view?usp=sharing) |
+| Swin3D-L | Structured3D | 60.75M | XYZ,RGB | 79.79 | [model](https://drive.google.com/file/d/1ior8uAQRiVd2mwfYapcaF_e_R80y7DQm/view?usp=sharing) | [log](https://drive.google.com/file/d/1YYd8SOaAIqz16T7XOL54aGPC4sSoMXsW/view?usp=sharing) |
+| Swin3D-L | Structured3D | 60.75M | XYZ,RGB,NORM | 81.04 | [model](https://drive.google.com/file/d/1ySNrP39H6m-euK-2La60-MNOp0e3Pe_4/view?usp=sharing) | [log](https://drive.google.com/file/d/1nXQCw5G2swrSksBnpGBveNSHwAqy8hAZ/view?usp=sharing) |
 
 ## Quick Start
 
@@ -61,44 +61,44 @@ Build models and load our pretrained weight, Then you can finetune your model in
   num_layers=num_layers, stem_transformer=stem_transformer, \
   upsample=upsample, first_down_stride=down_stride, \
   knn_down=knn_down, in_channels=in_channels, \
-  cRSE='XYZ_RGB_NORM', fp16_mode=2)
+  cRSE='XYZ_RGB_NORM', fp16_mode=1)
  model.load_pretrained_model(ckpt_path)
 
 ## Results and models
 
-To reproduce our results on downstream tasks, please follow the code in this [repo](https://github.com/Yukichiii/Swin3D_Task). The results and models are provided here.
+To reproduce our results on downstream tasks, please follow the code in this [repo](https://github.com/Yukichiii/Swin3D_Task). The results are provided here.
 
 ### ScanNet Segmentation
 
-| | Pretrained | mIoU(Val) | mIoU(Test) | Model | Log  |
-| :------- | :--------: | :-------: | :--------: | :-------: | :-----: |
-| Swin3D-S | &cross; | 75.2 | -  | [model]() | [log]() |
-| Swin3D-S | &check; |  75.7 | -  | [model]() | [log]() |
-| Swin3D-L | &check; |  77.5 | 77.9  | [model]() | [log]() |
+| | Pretrained | mIoU(Val)  | mIoU(Test) |
+| :------- | :--------: | :--------: | :--------: |
+| Swin3D-S | &cross; |  75.2 | - |
+| Swin3D-S | &check; | 75.6(76.8) | - |
+| Swin3D-L | &check; | 76.2(77.5) | 77.9 |
 
 ### S3DIS Segmentation
 
-| | Pretrained | Area 5 mIoU | 6-fold mIoU | Model | Log |
-| :------- | :--------: | :---------: | :---------: | :-------: | :-----: |
-| Swin3D-S | &cross; | 72.5 | 76.9 | [model]() | [log]() |
-| Swin3D-S | &check; | 73.0 | 78.2 | [model]() | [log]() |
-| Swin3D-L | &check; | 74.5 | 79.8 | [model]() | [log]() |
+| | Pretrained | Area 5 mIoU | 6-fold mIoU |
+| :------- | :--------: | :---------: | :---------: |
+| Swin3D-S | &cross; | 72.5 | 76.9 |
+| Swin3D-S | &check; | 73.0 | 78.2 |
+| Swin3D-L | &check; | 74.5 | 79.8 |
 
 ### ScanNet 3D Detection
 
-| | Pretrained | mAP@0.25 | mAP@0.50 | Model | Log |
-| :----------------- | :--------: | :------: | :------: | :---: | :---: |
-| Swin3D-S+FCAF3D | &check; | 74.2 | 59.5 | model | log |
-| Swin3D-L+FCAF3D | &check; | 74.2 | 58.6 | model | log |
-| Swin3D-S+CAGroup3D | &check; | 76.4 | 62.7 | model | log |
-| Swin3D-L+CAGroup3D | &check; | 76.4 | 63.2 | model | log |
+| | Pretrained | mAP@0.25 | mAP@0.50 |
+| :----------------- | :--------: | :------: | :------: |
+| Swin3D-S+FCAF3D | &check; | 74.2 | 59.5 |
+| Swin3D-L+FCAF3D | &check; | 74.2 | 58.6 |
+| Swin3D-S+CAGroup3D | &check; | 76.4 | 62.7 |
+| Swin3D-L+CAGroup3D | &check; | 76.4 | 63.2 |
 
 ### S3DIS 3D Detection
 
-| | Pretrained | mAP@0.25 | mAP@0.50 | Model | Log |
-| :-------------- | :--------: | :------: | :------: | :---: | :---: |
-| Swin3D-S+FCAF3D | &check; | 69.9 | 50.2 | model | log |
-| Swin3D-L+FCAF3D | &check; | 72.1 | 54.0 | model | log |
+| | Pretrained | mAP@0.25 | mAP@0.50 |
+| :-------------- | :--------: | :------: | :------: |
+| Swin3D-S+FCAF3D | &check; | 69.9 | 50.2 |
+| Swin3D-L+FCAF3D | &check; | 72.1 | 54.0 |
 
 ## Citation
 

diff --git a/Swin3D/modules/mink_layers.py b/Swin3D/modules/mink_layers.py
@@ -6,13 +6,28 @@
 import torch.nn as nn
 import torch.nn.functional as F
 import MinkowskiEngine as ME
-import numpy as np 
+import numpy as np
+
 
 def assign_feats(sp, x):
-    return ME.SparseTensor(features=x.float(), coordinate_map_key=sp.coordinate_map_key, coordinate_manager=sp.coordinate_manager)
+    return ME.SparseTensor(
+        features=x.float(),
+        coordinate_map_key=sp.coordinate_map_key,
+        coordinate_manager=sp.coordinate_manager,
+    )
+
 
 class MinkConvBN(nn.Module):
-    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, dilation=1, bias=False, dimension=3):
+    def __init__(
+        self,
+        in_channels,
+        out_channels,
+        kernel_size=3,
+        stride=1,
+        dilation=1,
+        bias=False,
+        dimension=3,
+    ):
         super().__init__()
         self.conv_layers = nn.Sequential(
             ME.MinkowskiConvolution(
@@ -22,16 +37,27 @@ def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, dilation=
                 stride=stride,
                 dilation=dilation,
                 bias=bias,
-                dimension=dimension),
-            ME.MinkowskiBatchNorm(out_channels)
+                dimension=dimension,
+            ),
+            ME.MinkowskiBatchNorm(out_channels),
         )
 
     def forward(self, x):
         x = self.conv_layers(x)
         return x
 
+
 class MinkConvBNRelu(nn.Module):
-    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, dilation=1, bias=False, dimension=3):
+    def __init__(
+        self,
+        in_channels,
+        out_channels,
+        kernel_size=3,
+        stride=1,
+        dilation=1,
+        bias=False,
+        dimension=3,
+    ):
         super().__init__()
         self.conv_layers = nn.Sequential(
             ME.MinkowskiConvolution(
@@ -41,9 +67,10 @@ def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, dilation=
                 stride=stride,
                 dilation=dilation,
                 bias=bias,
-                dimension=dimension),
+                dimension=dimension,
+            ),
             ME.MinkowskiBatchNorm(out_channels),
-            ME.MinkowskiReLU(inplace=True)
+            ME.MinkowskiReLU(inplace=True),
         )
 
     def forward(self, x):
@@ -52,8 +79,18 @@ def forward(self, x):
  x = assign_feats(x, x.F.float())
         return x
 
+
 class MinkDeConvBNRelu(nn.Module):
-    def __init__(self, in_channels, out_channels, kernel_size, stride, dilation=1, bias=False, dimension=3):
+    def __init__(
+        self,
+        in_channels,
+        out_channels,
+        kernel_size,
+        stride,
+        dilation=1,
+        bias=False,
+        dimension=3,
+    ):
         super().__init__()
         self.conv_layers = nn.Sequential(
             ME.MinkowskiConvolutionTranspose(
@@ -63,54 +100,58 @@ def __init__(self, in_channels, out_channels, kernel_size, stride, dilation=1, b
                 stride=stride,
                 dilation=dilation,
                 bias=bias,
-                dimension=dimension),
+                dimension=dimension,
+            ),
             ME.MinkowskiBatchNorm(out_channels),
-            ME.MinkowskiReLU()
+            ME.MinkowskiReLU(),
         )
 
     def forward(self, x):
         x = self.conv_layers(x)
         return x
 
-class MinkResBlock(nn.Module):
-    def __init__(self, in_channels, out_channels, stride=1, dilation=1):
-        super(MinkResBlock, self).__init__()
 
-        self.conv1 = ME.MinkowskiConvolution(
-            in_channels=in_channels,
-            out_channels=out_channels,
-            kernel_size=3,
-            stride=stride,
-            dilation=dilation,
-            bias=False,
-            dimension=3)
-        self.norm1 = ME.MinkowskiBatchNorm(out_channels)
-        self.conv2 = ME.MinkowskiConvolution(
-            in_channels=out_channels,
-            out_channels=out_channels,
-            kernel_size=3,
-            stride=1,
-            dilation=dilation,
-            bias=False,
-            dimension=3)
+class MinkResBlock(nn.Module):
+    def __init__(self, in_channels, out_channels, stride=1, dilation=1):
+        super(MinkResBlock, self).__init__()
+
+        self.conv1 = ME.MinkowskiConvolution(
+            in_channels=in_channels,
+            out_channels=out_channels,
+            kernel_size=3,
+            stride=stride,
+            dilation=dilation,
+            bias=False,
+            dimension=3,
+        )
+        self.norm1 = ME.MinkowskiBatchNorm(out_channels)
+        self.conv2 = ME.MinkowskiConvolution(
+            in_channels=out_channels,
+            out_channels=out_channels,
+            kernel_size=3,
+            stride=1,
+            dilation=dilation,
+            bias=False,
+            dimension=3,
+        )
 
-        self.norm2 = ME.MinkowskiBatchNorm(out_channels)
-        self.relu = ME.MinkowskiReLU(inplace=True)
+        self.norm2 = ME.MinkowskiBatchNorm(out_channels)
+        self.relu = ME.MinkowskiReLU(inplace=True)
 
-    def forward(self, x):
-        residual = x
+    def forward(self, x):
+        residual = x
 
-        out = self.conv1(x)
-        out = self.norm1(out)
-        out = self.relu(out)
+        out = self.conv1(x)
+        out = self.norm1(out)
+        out = self.relu(out)
 
-        out = self.conv2(out)
-        out = self.norm2(out)
+        out = self.conv2(out)
+        out = self.norm2(out)
 
-        out += residual
-        out = self.relu(out)
+        out += residual
+        out = self.relu(out)
 
-        return out
+        return out
 
 
 class SparseTensorLinear(nn.Module):
@@ -134,22 +175,33 @@ class MinkResBlock_v2(nn.Module):
     def __init__(self, in_channels, out_channels):
         super().__init__()
         d_2 = out_channels // 4
-        self.conv1 = torch.nn.Sequential(SparseTensorLinear(in_channels, d_2, bias=False), ME.MinkowskiBatchNorm(d_2), ME.MinkowskiReLU())
-        self.unary_2 = torch.nn.Sequential(SparseTensorLinear(d_2, out_channels, bias=False), ME.MinkowskiBatchNorm(out_channels), ME.MinkowskiReLU())
+        self.conv1 = torch.nn.Sequential(
+            SparseTensorLinear(in_channels, d_2, bias=False),
+            ME.MinkowskiBatchNorm(d_2),
+            ME.MinkowskiReLU(),
+        )
+        self.unary_2 = torch.nn.Sequential(
+            SparseTensorLinear(d_2, out_channels, bias=False),
+            ME.MinkowskiBatchNorm(out_channels),
+            ME.MinkowskiReLU(),
+        )
         self.spconv = ME.MinkowskiConvolution(
-            in_channels=d_2,
-            out_channels=d_2,
-            kernel_size=5,
-            stride=1,
-            dilation=1,
-            bias=False,
-            dimension=3)
+            in_channels=d_2,
+            out_channels=d_2,
+            kernel_size=5,
+            stride=1,
+            dilation=1,
+            bias=False,
+            dimension=3,
+        )
         if in_channels != out_channels:
             self.shortcut_op = torch.nn.Sequential(
-                SparseTensorLinear(in_channels, out_channels, bias=False), ME.MinkowskiBatchNorm(out_channels)
+                SparseTensorLinear(in_channels, out_channels, bias=False),
+                ME.MinkowskiBatchNorm(out_channels),
             )
         else:
             self.shortcut_op = nn.Identity()
+
     def forward(self, x):
         # feats: [N, C]
         # xyz: [N, 3]
@@ -162,28 +214,32 @@ def forward(self, x):
         shortcut = self.shortcut_op(shortcut)
         x += shortcut
         return x
- 
+
 
 class MinkResBlock_BottleNeck(nn.Module):
-    def __init__(self, in_channels, out_channels):
-        super(MinkResBlock_BottleNeck, self).__init__()
-        bottle_neck = out_channels // 4
-        self.conv1x1a = MinkConvBNRelu(in_channels, bottle_neck, kernel_size=1, stride=1)
-        self.conv3x3 = MinkConvBNRelu(bottle_neck, bottle_neck, kernel_size=3, stride=1)
-        self.conv1x1b = MinkConvBN(bottle_neck, out_channels, kernel_size=1, stride=1)
-        if in_channels != out_channels:
-            self.conv1x1c = MinkConvBN(in_channels, out_channels, kernel_size=1, stride=1)
-        else:
-            self.conv1x1c = None
-        self.relu = ME.MinkowskiReLU(inplace=True)
-
-    def forward(self, x):
-        residual = x
-        out = self.conv1x1a(x)
-        out = self.conv3x3(out)
-        out = self.conv1x1b(out)
-        if self.conv1x1c is not None:
-            residual = self.conv1x1c(residual)
-        out = self.relu(out+residual)
-
-        return out
+    def __init__(self, in_channels, out_channels):
+        super(MinkResBlock_BottleNeck, self).__init__()
+        bottle_neck = out_channels // 4
+        self.conv1x1a = MinkConvBNRelu(
+            in_channels, bottle_neck, kernel_size=1, stride=1
+        )
+        self.conv3x3 = MinkConvBNRelu(bottle_neck, bottle_neck, kernel_size=3, stride=1)
+        self.conv1x1b = MinkConvBN(bottle_neck, out_channels, kernel_size=1, stride=1)
+        if in_channels != out_channels:
+            self.conv1x1c = MinkConvBN(
+                in_channels, out_channels, kernel_size=1, stride=1
+            )
+        else:
+            self.conv1x1c = None
+        self.relu = ME.MinkowskiReLU(inplace=True)
+
+    def forward(self, x):
+        residual = x
+        out = self.conv1x1a(x)
+        out = self.conv3x3(out)
+        out = self.conv1x1b(out)
+        if self.conv1x1c is not None:
+            residual = self.conv1x1c(residual)
+        out = self.relu(out + residual)
+
+        return out