Co-authored-by: Yuqi Yang <v-yuqyan@microsoft.com>
Showing 38 changed files with 4,804 additions and 24 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
# For most projects, this workflow file will not need changing; you simply need | ||
# to commit it to your repository. | ||
# | ||
# You may wish to alter this file to override the set of languages analyzed, | ||
# or to provide custom queries or build logic. | ||
# | ||
# ******** NOTE ******** | ||
# We have attempted to detect the languages in your repository. Please check | ||
# the `language` matrix defined below to confirm you have the correct set of | ||
# supported CodeQL languages. | ||
# | ||
name: "CodeQL" | ||
|
||
on: | ||
  push: | ||
    branches: [ "main", "yuqi_swin3d" ] | ||
  pull_request: | ||
    # The branches below must be a subset of the branches above | ||
    branches: [ "main" ] | ||
  schedule: | ||
    - cron: '24 17 * * 3' | ||
|
||
jobs: | ||
  analyze: | ||
    name: Analyze | ||
    runs-on: ${{ (matrix.language == 'swift' && 'macos-latest') || 'ubuntu-latest' }} | ||
    permissions: | ||
      actions: read | ||
      contents: read | ||
      security-events: write | ||
|
||
    strategy: | ||
      fail-fast: false | ||
      matrix: | ||
        language: [ "python" ] | ||
        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ] | ||
        # Use only 'java' to analyze code written in Java, Kotlin or both | ||
        # Use only 'javascript' to analyze code written in JavaScript, TypeScript or both | ||
        # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support | ||
|
||
    steps: | ||
    - name: Checkout repository | ||
      uses: actions/checkout@v3 | ||
|
||
    # Initializes the CodeQL tools for scanning. | ||
    - name: Initialize CodeQL | ||
      uses: github/codeql-action/init@v2 | ||
      with: | ||
        languages: ${{ matrix.language }} | ||
        # If you wish to specify custom queries, you can do so here or in a config file. | ||
        # By default, queries listed here will override any specified in a config file. | ||
        # Prefix the list here with "+" to use these queries and those in the config file. | ||
|
||
        # For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs | ||
        # queries: security-extended,security-and-quality | ||
|
||
|
||
    # Autobuild attempts to build any compiled languages (C/C++, C#, Go, or Java). | ||
    # If this step fails, then you should remove it and run the build manually (see below) | ||
    - name: Autobuild | ||
      uses: github/codeql-action/autobuild@v2 | ||
|
||
    # ℹ️ Command-line programs to run using the OS shell. | ||
    # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun | ||
|
||
    # If the Autobuild fails above, remove it and uncomment the following three lines. | ||
    # Modify them (or add more) to build your code; refer to the EXAMPLE below for guidance. | ||
|
||
    # - run: | | ||
    #     echo "Run, Build Application using script" | ||
    #     ./location_of_script_within_repo/buildscript.sh | ||
|
||
    - name: Perform CodeQL Analysis | ||
      uses: github/codeql-action/analyze@v2 | ||
      with: | ||
        category: "/language:${{matrix.language}}" |
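The `schedule` trigger in the workflow above uses the cron expression `'24 17 * * 3'`. As a reading aid, here is a minimal Python sketch that decodes a simple weekly expression of this shape. The helper names `parse_cron` and `describe_weekly` are hypothetical and not part of the commit; the sketch only handles plain numeric fields and relies on GitHub Actions evaluating schedules in UTC.

```python
# Field order of a five-field POSIX cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")
# In cron, day-of-week 0 is Sunday.
WEEKDAYS = ("Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday")


def parse_cron(expr):
    """Split a five-field cron expression into a dict of named fields."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expr!r}")
    return dict(zip(CRON_FIELDS, parts))


def describe_weekly(expr):
    """Describe a 'M H * * D' expression (hypothetical helper, numeric fields only)."""
    fields = parse_cron(expr)
    if fields["day_of_month"] != "*" or fields["month"] != "*":
        raise ValueError("only weekly schedules of the form 'M H * * D' are handled")
    minute = int(fields["minute"])
    hour = int(fields["hour"])
    day = WEEKDAYS[int(fields["day_of_week"]) % 7]
    return f"{day} at {hour:02d}:{minute:02d} UTC"


print(describe_weekly("24 17 * * 3"))
```

With this expression, the scheduled CodeQL scan runs once a week in addition to the push and pull-request triggers.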
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,33 +1,116 @@ | ||
# Project | ||
# Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding | ||
|
||
> This repo has been populated by an initial template to help get you started. Please | ||
> make sure to update the content to build a great experience for community-building. | ||
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/semantic-segmentation-on-scannet)](https://paperswithcode.com/sota/semantic-segmentation-on-scannet?p=swin3d-a-pretrained-transformer-backbone-for) | ||
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/semantic-segmentation-on-s3dis-area5)](https://paperswithcode.com/sota/semantic-segmentation-on-s3dis-area5?p=swin3d-a-pretrained-transformer-backbone-for) | ||
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/semantic-segmentation-on-s3dis)](https://paperswithcode.com/sota/semantic-segmentation-on-s3dis?p=swin3d-a-pretrained-transformer-backbone-for) | ||
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/3d-object-detection-on-scannetv2)](https://paperswithcode.com/sota/3d-object-detection-on-scannetv2?p=swin3d-a-pretrained-transformer-backbone-for) | ||
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/3d-object-detection-on-s3dis)](https://paperswithcode.com/sota/3d-object-detection-on-s3dis?p=swin3d-a-pretrained-transformer-backbone-for) | ||
|
||
As the maintainer of this project, please make a few updates: | ||
## Updates | ||
|
||
- Improving this README.MD file to provide a great experience | ||
- Updating SUPPORT.MD with content about this project's support experience | ||
- Understanding the security reporting process in SECURITY.MD | ||
- Remove this section from the README | ||
***27/04/2023*** | ||
|
||
## Contributing | ||
Initial commits: | ||
|
||
This project welcomes contributions and suggestions. Most contributions require you to agree to a | ||
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us | ||
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com. | ||
1. Pretrained models on Structured3D are provided. | ||
2. Code and models for semantic segmentation on ScanNet and S3DIS are provided. | ||
|
||
When you submit a pull request, a CLA bot will automatically determine whether you need to provide | ||
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions | ||
provided by the bot. You will only need to do this once across all repos using our CLA. | ||
## Introduction | ||
|
||
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). | ||
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or | ||
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments. | ||
We present Swin3D, a pretrained 3D backbone that, for the first time, outperforms all state-of-the-art methods on downstream 3D indoor scene understanding tasks. Our backbone network is based on a 3D Swin transformer and is carefully designed to conduct self-attention efficiently on sparse voxels with linear memory complexity and to capture the irregularity of point signals via generalized contextual relative positional embedding. Based on this backbone design, we pretrained a large Swin3D model on the synthetic Structured3D dataset, which is 10 times larger than the ScanNet dataset, and fine-tuned the pretrained model on various downstream real-world indoor scene understanding tasks. | ||
|
||
## Trademarks | ||
![teaser](figures/swin3D.png) | ||
|
||
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft | ||
trademarks or logos is subject to and must follow | ||
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general). | ||
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. | ||
Any use of third-party trademarks or logos are subject to those third-party's policies. | ||
## Overview | ||
|
||
- [Data Preparation](#data-preparation) | ||
- [Pretrained Models](#pretrained-models) | ||
- [Quick Start](#quick-start) | ||
- [Results and models](#results-and-models) | ||
- [Citation](#citation) | ||
|
||
## Data Preparation | ||
|
||
We pretrained Swin3D on Structured3D. To prepare the data, please refer to this [link](https://github.com/yuxiaoguo/Uni3DScenes). | ||
|
||
## Pretrained Models | ||
|
||
Models pretrained on Structured3D with different cRSE settings are provided here. | ||
|
||
| | Pretrain | #params | cRSE | mIoU(val) | Model | Log | | ||
| :------- | :----------: | :------ | :----------- | :-------: | :-------: | :-----: | | ||
| Swin3D-S | Structured3D | 23.57M | XYZ,RGB | 77.69 | [model]() | [log]() | | ||
| Swin3D-S | Structured3D | 23.57M | XYZ,RGB,NORM | 79.15 | [model]() | [log]() | | ||
| Swin3D-L | Structured3D | 60.75M | XYZ,RGB | 79.79 | [model]() | [log]() | | ||
| Swin3D-L | Structured3D | 60.75M | XYZ,RGB,NORM | 81.04 | [model]() | [log]() | | ||
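To make the effect of the cRSE setting easier to read off the table above, the following minimal sketch computes the mIoU(val) difference between the `XYZ,RGB,NORM` and `XYZ,RGB` rows for each model size. The numbers are copied from the table; the dictionary layout and the `norm_gain` helper are illustrative only and not part of the repository.

```python
# mIoU(val) values copied from the Pretrained Models table, keyed by (model, cRSE).
MIOU_VAL = {
    ("Swin3D-S", "XYZ,RGB"): 77.69,
    ("Swin3D-S", "XYZ,RGB,NORM"): 79.15,
    ("Swin3D-L", "XYZ,RGB"): 79.79,
    ("Swin3D-L", "XYZ,RGB,NORM"): 81.04,
}


def norm_gain(model):
    """mIoU(val) difference between the XYZ,RGB,NORM and XYZ,RGB rows (illustrative helper)."""
    with_norm = MIOU_VAL[(model, "XYZ,RGB,NORM")]
    without_norm = MIOU_VAL[(model, "XYZ,RGB")]
    return round(with_norm - without_norm, 2)


for model in ("Swin3D-S", "Swin3D-L"):
    print(f"{model}: +{norm_gain(model):.2f} mIoU(val) with NORM")
```

Under these table values, including NORM in the cRSE corresponds to a gain of 1.46 points for Swin3D-S and 1.25 points for Swin3D-L.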
|
||
## Quick Start | ||
|
||
Install the package with: | ||
|
||
    pip install -r requirements.txt | ||
    python setup.py install | ||
|
||
Build the model and load our pretrained weights; you can then fine-tune the model on various tasks. | ||
|
||
    import torch | ||
    from Swin3D.models import Swin3DUNet | ||
    # depths, channels, num_heads, etc. are hyperparameters you must define beforehand. | ||
    model = Swin3DUNet(depths, channels, num_heads, | ||
                       window_sizes, quant_size, up_k=up_k, | ||
                       drop_path_rate=drop_path_rate, num_classes=num_classes, | ||
                       num_layers=num_layers, stem_transformer=stem_transformer, | ||
                       upsample=upsample, first_down_stride=down_stride, | ||
                       knn_down=knn_down, in_channels=in_channels, | ||
                       cRSE='XYZ_RGB_NORM', fp16_mode=2) | ||
    # ckpt_path is the path to a downloaded pretrained checkpoint. | ||
    model.load_pretrained_model(ckpt_path) | ||
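The Pretrained Models table writes the cRSE setting as `XYZ,RGB` or `XYZ,RGB,NORM`, while the snippet above passes `cRSE='XYZ_RGB_NORM'`. A small sketch of how one might convert between the two notations follows. The helper `crse_argument` is hypothetical, and only `'XYZ_RGB_NORM'` appears in the source; that the model accepts `'XYZ_RGB'` is an assumption made by analogy and should be checked against the `Swin3DUNet` implementation.

```python
# Signal names as they appear in the Pretrained Models table.
KNOWN_SIGNALS = ("XYZ", "RGB", "NORM")


def crse_argument(table_entry):
    """Convert a table entry such as 'XYZ,RGB,NORM' to 'XYZ_RGB_NORM'.

    Hypothetical helper: only 'XYZ_RGB_NORM' is confirmed by the Quick Start
    snippet; other outputs are assumed by analogy.
    """
    signals = [s.strip().upper() for s in table_entry.split(",")]
    unknown = [s for s in signals if s not in KNOWN_SIGNALS]
    if unknown:
        raise ValueError(f"unknown cRSE signal(s): {unknown}")
    return "_".join(signals)


print(crse_argument("XYZ,RGB,NORM"))
```

Choosing the checkpoint row and the `cRSE` argument from the same table entry is intended to keep the loaded weights consistent with the model configuration.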
|
||
## Results and models | ||
|
||
To reproduce our results on downstream tasks, please follow the code in this [repo](https://github.com/Yukichiii/Swin3D_Task). The results and models are provided here. | ||
|
||
### ScanNet Segmentation | ||
|
||
| | Pretrained | mIoU(Val) | mIoU(Test) | Model | Log | | ||
| :------- | :--------: | :-------: | :--------: | :-------: | :-----: | | ||
| Swin3D-S | ✗ | 75.2 | - | [model]() | [log]() | | ||
| Swin3D-S | ✓ | 75.7 | - | [model]() | [log]() | | ||
| Swin3D-L | ✓ | 77.5 | 77.9 | [model]() | [log]() | | ||
|
||
### S3DIS Segmentation | ||
|
||
| | Pretrained | Area 5 mIoU | 6-fold mIoU | Model | Log | | ||
| :------- | :--------: | :---------: | :---------: | :-------: | :-----: | | ||
| Swin3D-S | ✗ | 72.5 | 76.9 | [model]() | [log]() | | ||
| Swin3D-S | ✓ | 73.0 | 78.2 | [model]() | [log]() | | ||
| Swin3D-L | ✓ | 74.5 | 79.8 | [model]() | [log]() | | ||
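The two segmentation tables above include Swin3D-S both with and without pretraining, so the benefit of pretraining can be read off directly. The sketch below tabulates that difference; the values are copied from the tables, while the `Row` type and `pretraining_gain` helper are illustrative names, not part of the repository.

```python
from typing import NamedTuple


class Row(NamedTuple):
    benchmark: str
    scratch: float     # Swin3D-S with Pretrained = ✗
    pretrained: float  # Swin3D-S with Pretrained = ✓


# Swin3D-S results copied from the ScanNet and S3DIS segmentation tables.
ROWS = [
    Row("ScanNet mIoU(Val)", 75.2, 75.7),
    Row("S3DIS Area 5 mIoU", 72.5, 73.0),
    Row("S3DIS 6-fold mIoU", 76.9, 78.2),
]


def pretraining_gain(row):
    """Difference in mIoU between the pretrained and non-pretrained Swin3D-S."""
    return round(row.pretrained - row.scratch, 1)


for row in ROWS:
    print(f"{row.benchmark}: +{pretraining_gain(row):.1f}")
```

Swin3D-L has no non-pretrained row in these tables, so it is left out of the comparison.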
|
||
### ScanNet 3D Detection | ||
|
||
| | Pretrained | mAP@0.25 | mAP@0.50 | Model | Log | | ||
| :----------------- | :--------: | :------: | :------: | :---: | :---: | | ||
| Swin3D-S+FCAF3D | ✓ | 74.2 | 59.5 | model | log | | ||
| Swin3D-L+FCAF3D | ✓ | 74.2 | 58.6 | model | log | | ||
| Swin3D-S+CAGroup3D | ✓ | 76.4 | 62.7 | model | log | | ||
| Swin3D-L+CAGroup3D | ✓ | 76.4 | 63.2 | model | log | | ||
|
||
### S3DIS 3D Detection | ||
|
||
| | Pretrained | mAP@0.25 | mAP@0.50 | Model | Log | | ||
| :-------------- | :--------: | :------: | :------: | :---: | :---: | | ||
| Swin3D-S+FCAF3D | ✓ | 69.9 | 50.2 | model | log | | ||
| Swin3D-L+FCAF3D | ✓ | 72.1 | 54.0 | model | log | | ||
|
||
## Citation | ||
|
||
If you find Swin3D useful for your research, please cite our work: | ||
|
||
``` | ||
@misc{yang2023swin3d, | ||
title={Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding}, | ||
author={Yu-Qi Yang and Yu-Xiao Guo and Jian-Yu Xiong and Yang Liu and Hao Pan and Peng-Shuai Wang and Xin Tong and Baining Guo}, | ||
year={2023}, | ||
eprint={2304.06906}, | ||
archivePrefix={arXiv}, | ||
primaryClass={cs.CV} | ||
} | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
""" | ||
# Copyright (c) Microsoft Corporation. | ||
# Licensed under the MIT License. | ||
""" | ||
from . import sparse_dl | ||
from . import modules | ||
from . import models | ||
|
||
__version__ = '1.0.0' | ||
|
||
__all__ = [ | ||
    'sparse_dl', | ||
    'modules', | ||
    'models', | ||
] |