Commit
Initial submission of Swin3D (#3)
Co-authored-by: Yuqi Yang <v-yuqyan@microsoft.com>
yuxiaoguo and Yuqi Yang authored Apr 27, 2023
1 parent 989c503 commit 022d5ed
Showing 38 changed files with 4,804 additions and 24 deletions.
76 changes: 76 additions & 0 deletions .github/workflows/codeql.yml
# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
#
name: "CodeQL"

on:
  push:
    branches: [ "main", "yuqi_swin3d" ]
  pull_request:
    # The branches below must be a subset of the branches above
    branches: [ "main" ]
  schedule:
    - cron: '24 17 * * 3'

jobs:
  analyze:
    name: Analyze
    runs-on: ${{ (matrix.language == 'swift' && 'macos-latest') || 'ubuntu-latest' }}
    permissions:
      actions: read
      contents: read
      security-events: write

    strategy:
      fail-fast: false
      matrix:
        language: [ "python" ]
        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ]
        # Use only 'java' to analyze code written in Java, Kotlin or both
        # Use only 'javascript' to analyze code written in JavaScript, TypeScript or both
        # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      # Initializes the CodeQL tools for scanning.
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v2
        with:
          languages: ${{ matrix.language }}
          # If you wish to specify custom queries, you can do so here or in a config file.
          # By default, queries listed here will override any specified in a config file.
          # Prefix the list here with "+" to use these queries and those in the config file.

          # For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
          # queries: security-extended,security-and-quality

      # Autobuild attempts to build any compiled languages (C/C++, C#, Go, or Java).
      # If this step fails, remove it and run the build manually (see below).
      - name: Autobuild
        uses: github/codeql-action/autobuild@v2

      # ℹ️ Command-line programs to run using the OS shell.
      # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun

      # If the Autobuild step above fails, remove it, uncomment the following lines,
      # and modify them (or add more) to build your code.

      # - run: |
      #     echo "Run, Build Application using script"
      #     ./location_of_script_within_repo/buildscript.sh

      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v2
        with:
          category: "/language:${{matrix.language}}"
9 changes: 9 additions & 0 deletions .gitignore
*.npz
.vscode
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]

# Pyre type checker
.pyre/

# local vscode file
.vscode

# data file
*.npy
*.npz
131 changes: 107 additions & 24 deletions README.md
# Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/semantic-segmentation-on-scannet)](https://paperswithcode.com/sota/semantic-segmentation-on-scannet?p=swin3d-a-pretrained-transformer-backbone-for)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/semantic-segmentation-on-s3dis-area5)](https://paperswithcode.com/sota/semantic-segmentation-on-s3dis-area5?p=swin3d-a-pretrained-transformer-backbone-for)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/semantic-segmentation-on-s3dis)](https://paperswithcode.com/sota/semantic-segmentation-on-s3dis?p=swin3d-a-pretrained-transformer-backbone-for)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/3d-object-detection-on-scannetv2)](https://paperswithcode.com/sota/3d-object-detection-on-scannetv2?p=swin3d-a-pretrained-transformer-backbone-for)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/3d-object-detection-on-s3dis)](https://paperswithcode.com/sota/3d-object-detection-on-s3dis?p=swin3d-a-pretrained-transformer-backbone-for)

## Updates

***27/04/2023***

Initial commits:

1. Pretrained models on Structured3D are provided.
2. The supported code and models for Semantic Segmentation on ScanNet and S3DIS are provided.

## Introduction

We present a pretrained 3D backbone, named Swin3D, that for the first time outperforms all state-of-the-art methods on downstream 3D indoor scene understanding tasks. The backbone network is based on a 3D Swin transformer and is carefully designed to conduct self-attention on sparse voxels efficiently, with linear memory complexity, and to capture the irregularity of point signals via a generalized contextual relative positional embedding. Based on this backbone design, we pretrained a large Swin3D model on the synthetic Structured3D dataset, which is 10 times larger than ScanNet, and fine-tuned the pretrained model on various downstream real-world indoor scene understanding tasks.

![teaser](figures/swin3D.png)
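The linear-memory window attention described in the introduction can be sketched in a few lines. The snippet below is an illustrative toy, not the released implementation: the `window_partition` helper and its signature are assumptions. It shows the core idea of grouping occupied voxels by window so that attention is computed only within each group, keeping cost proportional to occupancy rather than to the dense grid.

```python
import numpy as np

def window_partition(coords, window_size):
    """Group sparse voxel coordinates into non-overlapping 3D windows.

    coords: (N, 3) integer coordinates of occupied voxels.
    Returns a dict mapping window index -> row indices of its voxels.
    Attention restricted to each group costs O(sum_w n_w^2), which stays
    linear in N when per-window occupancy is bounded, instead of O(N^2)
    for global attention over all occupied voxels.
    """
    windows = coords // window_size   # integer window index per voxel
    groups = {}
    for i, w in enumerate(map(tuple, windows)):
        groups.setdefault(w, []).append(i)
    return groups

coords = np.array([[0, 0, 0], [1, 2, 3], [5, 5, 5], [6, 4, 7]])
groups = window_partition(coords, window_size=4)
# voxels 0,1 fall in window (0, 0, 0); voxels 2,3 in window (1, 1, 1)
```

In the actual backbone the windows are additionally shifted between consecutive layers (as in 2D Swin) so that information can propagate across window boundaries.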
## Overview

- [Data Preparation](#data-preparation)
- [Pretrained Models](#pretrained-models)
- [Quick Start](#quick-start)
- [Results and models](#results-and-models)
- [Citation](#citation)

## Data Preparation

We pretrained Swin3D on Structured3D; please refer to this [link](https://github.com/yuxiaoguo/Uni3DScenes) to prepare the data.

## Pretrained Models

The models pretrained on Structured3D with different cRSE settings are provided below.

| | Pretrain | #params | cRSE | mIoU(val) | Model | Log |
| :------- | :----------: | :------ | :----------- | :-------: | :-------: | :-----: |
| Swin3D-S | Structured3D | 23.57M | XYZ,RGB | 77.69 | [model]() | [log]() |
| Swin3D-S | Structured3D | 23.57M | XYZ,RGB,NORM | 79.15 | [model]() | [log]() |
| Swin3D-L | Structured3D | 60.75M | XYZ,RGB | 79.79 | [model]() | [log]() |
| Swin3D-L | Structured3D | 60.75M | XYZ,RGB,NORM | 81.04 | [model]() | [log]() |

## Quick Start

Install the package with:

    pip install -r requirements.txt
    python setup.py install

Build the model and load our pretrained weights; you can then fine-tune it on various downstream tasks:

    import torch
    from Swin3D.models import Swin3DUNet

    model = Swin3DUNet(
        depths, channels, num_heads,
        window_sizes, quant_size, up_k=up_k,
        drop_path_rate=drop_path_rate, num_classes=num_classes,
        num_layers=num_layers, stem_transformer=stem_transformer,
        upsample=upsample, first_down_stride=down_stride,
        knn_down=knn_down, in_channels=in_channels,
        cRSE='XYZ_RGB_NORM', fp16_mode=2)
    model.load_pretrained_model(ckpt_path)
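The internals of `load_pretrained_model` are not shown in this README. When adapting a pretrained backbone to a new task, a common step is to drop checkpoint entries that belong to task-specific heads before loading, so the backbone weights are reused while the head is reinitialized. The helper below is a hypothetical sketch of that pattern (its name and signature are assumptions, not the released code):

```python
def filter_backbone_weights(state_dict, model_keys, skip_prefixes=("classifier",)):
    """Keep checkpoint entries that exist in the target model and do not
    belong to task-specific heads (hypothetical helper, for illustration)."""
    return {
        name: tensor
        for name, tensor in state_dict.items()
        if name in model_keys and not name.startswith(skip_prefixes)
    }

# Toy checkpoint: one backbone tensor and one segmentation-head tensor.
ckpt = {"backbone.layer1.weight": "w1", "classifier.weight": "w2"}
model_keys = {"backbone.layer1.weight", "classifier.weight", "classifier.bias"}
kept = filter_backbone_weights(ckpt, model_keys)
# only the backbone entry survives; the head is trained from scratch
```

With PyTorch, the filtered dictionary would typically be passed to `model.load_state_dict(kept, strict=False)` so missing head weights do not raise an error.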

## Results and models

To reproduce our results on downstream tasks, please follow the code in this [repo](https://github.com/Yukichiii/Swin3D_Task). The results and models are provided here.

### ScanNet Segmentation

| | Pretrained | mIoU(Val) | mIoU(Test) | Model | Log |
| :------- | :--------: | :-------: | :--------: | :-------: | :-----: |
| Swin3D-S | &cross; | 75.2 | - | [model]() | [log]() |
| Swin3D-S | &check; | 75.7 | - | [model]() | [log]() |
| Swin3D-L | &check; | 77.5 | 77.9 | [model]() | [log]() |

### S3DIS Segmentation

| | Pretrained | Area 5 mIoU | 6-fold mIoU | Model | Log |
| :------- | :--------: | :---------: | :---------: | :-------: | :-----: |
| Swin3D-S | &cross; | 72.5 | 76.9 | [model]() | [log]() |
| Swin3D-S | &check; | 73.0 | 78.2 | [model]() | [log]() |
| Swin3D-L | &check; | 74.5 | 79.8 | [model]() | [log]() |

### ScanNet 3D Detection

| | Pretrained | mAP@0.25 | mAP@0.50 | Model | Log |
| :----------------- | :--------: | :------: | :------: | :---: | :---: |
| Swin3D-S+FCAF3D | &check; | 74.2 | 59.5 | model | log |
| Swin3D-L+FCAF3D | &check; | 74.2 | 58.6 | model | log |
| Swin3D-S+CAGroup3D | &check; | 76.4 | 62.7 | model | log |
| Swin3D-L+CAGroup3D | &check; | 76.4 | 63.2 | model | log |

### S3DIS 3D Detection

| | Pretrained | mAP@0.25 | mAP@0.50 | Model | Log |
| :-------------- | :--------: | :------: | :------: | :---: | :---: |
| Swin3D-S+FCAF3D | &check; | 69.9 | 50.2 | model | log |
| Swin3D-L+FCAF3D | &check; | 72.1 | 54.0 | model | log |

## Citation

If you find Swin3D useful for your research, please cite our work:

```
@misc{yang2023swin3d,
title={Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding},
author={Yu-Qi Yang and Yu-Xiao Guo and Jian-Yu Xiong and Yang Liu and Hao Pan and Peng-Shuai Wang and Xin Tong and Baining Guo},
year={2023},
eprint={2304.06906},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
15 changes: 15 additions & 0 deletions Swin3D/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
"""
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
from . import sparse_dl
from . import modules
from . import models

__version__ = '1.0.0'

__all__ = [
'sparse_dl',
'modules',
'models',
]