Commit
Initial submission of Swin3D (#3)
Co-authored-by: Yuqi Yang <v-yuqyan@microsoft.com>
yuxiaoguo and Yuqi Yang authored Apr 27, 2023
1 parent 989c503 commit 022d5ed
Showing 38 changed files with 4,804 additions and 24 deletions.
76 changes: 76 additions & 0 deletions .github/workflows/codeql.yml
# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
#
name: "CodeQL"

on:
  push:
    branches: [ "main", "yuqi_swin3d" ]
  pull_request:
    # The branches below must be a subset of the branches above
    branches: [ "main" ]
  schedule:
    - cron: '24 17 * * 3'

jobs:
  analyze:
    name: Analyze
    runs-on: ${{ (matrix.language == 'swift' && 'macos-latest') || 'ubuntu-latest' }}
    permissions:
      actions: read
      contents: read
      security-events: write

    strategy:
      fail-fast: false
      matrix:
        language: [ "python" ]
        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ]
        # Use only 'java' to analyze code written in Java, Kotlin or both
        # Use only 'javascript' to analyze code written in JavaScript, TypeScript or both
        # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      # Initializes the CodeQL tools for scanning.
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v2
        with:
          languages: ${{ matrix.language }}
          # If you wish to specify custom queries, you can do so here or in a config file.
          # By default, queries listed here will override any specified in a config file.
          # Prefix the list here with "+" to use these queries and those in the config file.

          # For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
          # queries: security-extended,security-and-quality

      # Autobuild attempts to build any compiled languages (C/C++, C#, Go, or Java).
      # If this step fails, remove it and run the build manually (see below).
      - name: Autobuild
        uses: github/codeql-action/autobuild@v2

      # ℹ️ Command-line programs to run using the OS shell.
      # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun

      # If the Autobuild step above fails, remove it, uncomment the following lines,
      # and modify them (or add more) to build your code.

      # - run: |
      #     echo "Run, Build Application using script"
      #     ./location_of_script_within_repo/buildscript.sh

      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v2
        with:
          category: "/language:${{matrix.language}}"
9 changes: 9 additions & 0 deletions .gitignore
*.npz
.vscode
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]

# Pyre type checker
.pyre/

# local vscode file
.vscode

# data file
*.npy
*.npz
131 changes: 107 additions & 24 deletions README.md
# Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/semantic-segmentation-on-scannet)](https://paperswithcode.com/sota/semantic-segmentation-on-scannet?p=swin3d-a-pretrained-transformer-backbone-for)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/semantic-segmentation-on-s3dis-area5)](https://paperswithcode.com/sota/semantic-segmentation-on-s3dis-area5?p=swin3d-a-pretrained-transformer-backbone-for)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/semantic-segmentation-on-s3dis)](https://paperswithcode.com/sota/semantic-segmentation-on-s3dis?p=swin3d-a-pretrained-transformer-backbone-for)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/3d-object-detection-on-scannetv2)](https://paperswithcode.com/sota/3d-object-detection-on-scannetv2?p=swin3d-a-pretrained-transformer-backbone-for)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/3d-object-detection-on-s3dis)](https://paperswithcode.com/sota/3d-object-detection-on-s3dis?p=swin3d-a-pretrained-transformer-backbone-for)

## Updates

***27/04/2023***

Initial commits:

1. Pretrained models on Structured3D are provided.
2. The supported code and models for Semantic Segmentation on ScanNet and S3DIS are provided.

## Introduction

We present a pretrained 3D backbone, named Swin3D, that for the first time outperforms all state-of-the-art methods on downstream 3D indoor scene understanding tasks. The backbone network is based on a 3D Swin transformer and is carefully designed to conduct self-attention on sparse voxels efficiently, with linear memory complexity, and to capture the irregularity of point signals via a generalized contextual relative positional embedding. Based on this backbone design, we pretrained a large Swin3D model on the synthetic Structured3D dataset, which is 10 times larger than ScanNet, and fine-tuned the pretrained model on various downstream real-world indoor scene understanding tasks.

![teaser](figures/swin3D.png)
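The linear-memory window attention described in the introduction can be sketched in a few lines. The snippet below is an illustrative toy, not the released implementation: the `window_partition` helper and its signature are assumptions. It shows the core idea of grouping occupied voxels by window so that attention is computed only within each group, keeping cost proportional to occupancy rather than to the dense grid.

```python
import numpy as np

def window_partition(coords, window_size):
    """Group sparse voxel coordinates into non-overlapping 3D windows.

    coords: (N, 3) integer coordinates of occupied voxels.
    Returns a dict mapping window index -> row indices of its voxels.
    Attention restricted to each group costs O(sum_w n_w^2), which stays
    linear in N when per-window occupancy is bounded, instead of O(N^2)
    for global attention over all occupied voxels.
    """
    windows = coords // window_size   # integer window index per voxel
    groups = {}
    for i, w in enumerate(map(tuple, windows)):
        groups.setdefault(w, []).append(i)
    return groups

coords = np.array([[0, 0, 0], [1, 2, 3], [5, 5, 5], [6, 4, 7]])
groups = window_partition(coords, window_size=4)
# voxels 0,1 fall in window (0, 0, 0); voxels 2,3 in window (1, 1, 1)
```

In the actual backbone the windows are additionally shifted between consecutive layers (as in 2D Swin) so that information can propagate across window boundaries.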
## Overview

- [Data Preparation](#data-preparation)
- [Pretrained Models](#pretrained-models)
- [Quick Start](#quick-start)
- [Results and models](#results-and-models)
- [Citation](#citation)

## Data Preparation

We pretrained Swin3D on Structured3D; please refer to this [link](https://github.com/yuxiaoguo/Uni3DScenes) to prepare the data.

## Pretrained Models

The models pretrained on Structured3D with different cRSE settings are provided below.

| | Pretrain | #params | cRSE | mIoU(val) | Model | Log |
| :------- | :----------: | :------ | :----------- | :-------: | :-------: | :-----: |
| Swin3D-S | Structured3D | 23.57M | XYZ,RGB | 77.69 | [model]() | [log]() |
| Swin3D-S | Structured3D | 23.57M | XYZ,RGB,NORM | 79.15 | [model]() | [log]() |
| Swin3D-L | Structured3D | 60.75M | XYZ,RGB | 79.79 | [model]() | [log]() |
| Swin3D-L | Structured3D | 60.75M | XYZ,RGB,NORM | 81.04 | [model]() | [log]() |

## Quick Start

Install the package with:

    pip install -r requirements.txt
    python setup.py install

Build the model and load our pretrained weights; you can then fine-tune it on various downstream tasks:

    import torch
    from Swin3D.models import Swin3DUNet

    model = Swin3DUNet(
        depths, channels, num_heads,
        window_sizes, quant_size, up_k=up_k,
        drop_path_rate=drop_path_rate, num_classes=num_classes,
        num_layers=num_layers, stem_transformer=stem_transformer,
        upsample=upsample, first_down_stride=down_stride,
        knn_down=knn_down, in_channels=in_channels,
        cRSE='XYZ_RGB_NORM', fp16_mode=2)
    model.load_pretrained_model(ckpt_path)
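The internals of `load_pretrained_model` are not shown in this README. When adapting a pretrained backbone to a new task, a common step is to drop checkpoint entries that belong to task-specific heads before loading, so the backbone weights are reused while the head is reinitialized. The helper below is a hypothetical sketch of that pattern (its name and signature are assumptions, not the released code):

```python
def filter_backbone_weights(state_dict, model_keys, skip_prefixes=("classifier",)):
    """Keep checkpoint entries that exist in the target model and do not
    belong to task-specific heads (hypothetical helper, for illustration)."""
    return {
        name: tensor
        for name, tensor in state_dict.items()
        if name in model_keys and not name.startswith(skip_prefixes)
    }

# Toy checkpoint: one backbone tensor and one segmentation-head tensor.
ckpt = {"backbone.layer1.weight": "w1", "classifier.weight": "w2"}
model_keys = {"backbone.layer1.weight", "classifier.weight", "classifier.bias"}
kept = filter_backbone_weights(ckpt, model_keys)
# only the backbone entry survives; the head is trained from scratch
```

With PyTorch, the filtered dictionary would typically be passed to `model.load_state_dict(kept, strict=False)` so missing head weights do not raise an error.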

## Results and models

To reproduce our results on downstream tasks, please follow the code in this [repo](https://github.com/Yukichiii/Swin3D_Task). The results and models are provided here.

### ScanNet Segmentation

| | Pretrained | mIoU(Val) | mIoU(Test) | Model | Log |
| :------- | :--------: | :-------: | :--------: | :-------: | :-----: |
| Swin3D-S | &cross; | 75.2 | - | [model]() | [log]() |
| Swin3D-S | &check; | 75.7 | - | [model]() | [log]() |
| Swin3D-L | &check; | 77.5 | 77.9 | [model]() | [log]() |

### S3DIS Segmentation

| | Pretrained | Area 5 mIoU | 6-fold mIoU | Model | Log |
| :------- | :--------: | :---------: | :---------: | :-------: | :-----: |
| Swin3D-S | &cross; | 72.5 | 76.9 | [model]() | [log]() |
| Swin3D-S | &check; | 73.0 | 78.2 | [model]() | [log]() |
| Swin3D-L | &check; | 74.5 | 79.8 | [model]() | [log]() |

### ScanNet 3D Detection

| | Pretrained | mAP@0.25 | mAP@0.50 | Model | Log |
| :----------------- | :--------: | :------: | :------: | :---: | :---: |
| Swin3D-S+FCAF3D | &check; | 74.2 | 59.5 | model | log |
| Swin3D-L+FCAF3D | &check; | 74.2 | 58.6 | model | log |
| Swin3D-S+CAGroup3D | &check; | 76.4 | 62.7 | model | log |
| Swin3D-L+CAGroup3D | &check; | 76.4 | 63.2 | model | log |

### S3DIS 3D Detection

| | Pretrained | mAP@0.25 | mAP@0.50 | Model | Log |
| :-------------- | :--------: | :------: | :------: | :---: | :---: |
| Swin3D-S+FCAF3D | &check; | 69.9 | 50.2 | model | log |
| Swin3D-L+FCAF3D | &check; | 72.1 | 54.0 | model | log |

## Citation

If you find Swin3D useful for your research, please cite our work:

```
@misc{yang2023swin3d,
title={Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding},
author={Yu-Qi Yang and Yu-Xiao Guo and Jian-Yu Xiong and Yang Liu and Hao Pan and Peng-Shuai Wang and Xin Tong and Baining Guo},
year={2023},
eprint={2304.06906},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
15 changes: 15 additions & 0 deletions Swin3D/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
"""
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
from . import sparse_dl
from . import modules
from . import models

__version__ = '1.0.0'

__all__ = [
'sparse_dl',
'modules',
'models',
]