[Fix] Add metafile and paper information of ViTPose #2058

Merged
merged 1 commit into from
Mar 14, 2023
155 changes: 155 additions & 0 deletions configs/body_2d_keypoint/topdown_heatmap/coco/vitpose_coco.yml
@@ -0,0 +1,155 @@
Collections:
- Name: ViTPose
  Paper:
    Title: 'ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation'
    URL: https://arxiv.org/abs/2204.12484
  README: https://github.com/open-mmlab/mmpose/blob/1.x/docs/src/papers/backbones/vitpose.md
  Metadata:
    Training Resources: 8x A100 GPUs
Models:
- Config: configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-small_8xb64-210e_coco-256x192.py
  In Collection: ViTPose
  Metadata:
    Architecture: &id001
    - ViTPose
    - Classic Head
    Model Size: Small
    Training Data: COCO
  Name: td-hm_ViTPose-small_8xb64-210e_coco-256x192
  Results:
  - Dataset: COCO
    Metrics:
      AP: 0.739
      AP@0.5: 0.903
      AP@0.75: 0.816
      AR: 0.792
      AR@0.5: 0.942
    Task: Body 2D Keypoint
  Weights: https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-small_8xb64-210e_coco-256x192-62d7a712_20230314.pth
- Config: configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-base_8xb64-210e_coco-256x192.py
  In Collection: ViTPose
  Metadata:
    Architecture: *id001
    Model Size: Base
    Training Data: COCO
  Name: td-hm_ViTPose-base_8xb64-210e_coco-256x192
  Results:
  - Dataset: COCO
    Metrics:
      AP: 0.757
      AP@0.5: 0.905
      AP@0.75: 0.829
      AR: 0.81
      AR@0.5: 0.946
    Task: Body 2D Keypoint
  Weights: https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-base_8xb64-210e_coco-256x192-216eae50_20230314.pth
- Config: configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-large_8xb64-210e_coco-256x192.py
  In Collection: ViTPose
  Metadata:
    Architecture: *id001
    Model Size: Large
    Training Data: COCO
  Name: td-hm_ViTPose-large_8xb64-210e_coco-256x192
  Results:
  - Dataset: COCO
    Metrics:
      AP: 0.782
      AP@0.5: 0.914
      AP@0.75: 0.850
      AR: 0.834
      AR@0.5: 0.952
    Task: Body 2D Keypoint
  Weights: https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-large_8xb64-210e_coco-256x192-53609f55_20230314.pth
- Config: configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-huge_8xb64-210e_coco-256x192.py
  In Collection: ViTPose
  Metadata:
    Architecture: *id001
    Model Size: Huge
    Training Data: COCO
  Name: td-hm_ViTPose-huge_8xb64-210e_coco-256x192
  Results:
  - Dataset: COCO
    Metrics:
      AP: 0.788
      AP@0.5: 0.917
      AP@0.75: 0.855
      AR: 0.839
      AR@0.5: 0.954
    Task: Body 2D Keypoint
  Weights: https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-huge_8xb64-210e_coco-256x192-e32adcd4_20230314.pth
- Config: configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-small-simple_8xb64-210e_coco-256x192.py
  In Collection: ViTPose
  Alias: vitpose-s
  Metadata:
    Architecture: &id002
    - ViTPose
    - Simple Head
    Model Size: Small
    Training Data: COCO
  Name: td-hm_ViTPose-small-simple_8xb64-210e_coco-256x192
  Results:
  - Dataset: COCO
    Metrics:
      AP: 0.736
      AP@0.5: 0.900
      AP@0.75: 0.811
      AR: 0.790
      AR@0.5: 0.940
    Task: Body 2D Keypoint
  Weights: https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-small-simple_8xb64-210e_coco-256x192-4c101a76_20230314.pth
- Config: configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-base-simple_8xb64-210e_coco-256x192.py
  In Collection: ViTPose
  Alias:
  - vitpose
  - vitpose-b
  Metadata:
    Architecture: *id002
    Model Size: Base
    Training Data: COCO
  Name: td-hm_ViTPose-base-simple_8xb64-210e_coco-256x192
  Results:
  - Dataset: COCO
    Metrics:
      AP: 0.756
      AP@0.5: 0.906
      AP@0.75: 0.826
      AR: 0.809
      AR@0.5: 0.946
    Task: Body 2D Keypoint
  Weights: https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-base-simple_8xb64-210e_coco-256x192-fd73707d_20230314.pth
- Config: configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-large-simple_8xb64-210e_coco-256x192.py
  In Collection: ViTPose
  Alias: vitpose-l
  Metadata:
    Architecture: *id002
    Model Size: Large
    Training Data: COCO
  Name: td-hm_ViTPose-large-simple_8xb64-210e_coco-256x192
  Results:
  - Dataset: COCO
    Metrics:
      AP: 0.781
      AP@0.5: 0.914
      AP@0.75: 0.853
      AR: 0.833
      AR@0.5: 0.952
    Task: Body 2D Keypoint
  Weights: https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-large-simple_8xb64-210e_coco-256x192-3a7ee9e1_20230314.pth
- Config: configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-huge-simple_8xb64-210e_coco-256x192.py
  In Collection: ViTPose
  Alias: vitpose-h
  Metadata:
    Architecture: *id002
    Model Size: Huge
    Training Data: COCO
  Name: td-hm_ViTPose-huge-simple_8xb64-210e_coco-256x192
  Results:
  - Dataset: COCO
    Metrics:
      AP: 0.789
      AP@0.5: 0.916
      AP@0.75: 0.856
      AR: 0.839
      AR@0.5: 0.953
    Task: Body 2D Keypoint
  Weights: https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-huge-simple_8xb64-210e_coco-256x192-ffd48c05_20230314.pth
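Note for readers of this metafile: the `Alias` field takes two shapes. It is a plain string for `vitpose-s`, `vitpose-l` and `vitpose-h`, a list for the base simple-head model (`vitpose`, `vitpose-b`), and absent for the classic-head models. The sketch below shows one way a consumer could resolve an alias across all three cases. It is illustrative only: `MODELS` is a hand-written stand-in for what a YAML loader would return for three of the entries, and `aliases_of` and `find_by_alias` are hypothetical helper names, not MMPose APIs.

```python
from typing import Any, Dict, List, Optional

_CFG_DIR = "configs/body_2d_keypoint/topdown_heatmap/coco/"

# Hand-written stand-in for three parsed `Models` entries (assumption: a YAML
# loader would yield dicts shaped like this). Only the fields used are kept.
MODELS: List[Dict[str, Any]] = [
    {
        "Name": "td-hm_ViTPose-base_8xb64-210e_coco-256x192",
        "Config": _CFG_DIR + "td-hm_ViTPose-base_8xb64-210e_coco-256x192.py",
        # classic-head entries carry no "Alias" key
    },
    {
        "Name": "td-hm_ViTPose-small-simple_8xb64-210e_coco-256x192",
        "Alias": "vitpose-s",  # single alias: plain string
        "Config": _CFG_DIR + "td-hm_ViTPose-small-simple_8xb64-210e_coco-256x192.py",
    },
    {
        "Name": "td-hm_ViTPose-base-simple_8xb64-210e_coco-256x192",
        "Alias": ["vitpose", "vitpose-b"],  # several aliases: list
        "Config": _CFG_DIR + "td-hm_ViTPose-base-simple_8xb64-210e_coco-256x192.py",
    },
]


def aliases_of(model: Dict[str, Any]) -> List[str]:
    """Normalize a missing, string, or list `Alias` field to a list."""
    raw = model.get("Alias", [])
    return [raw] if isinstance(raw, str) else list(raw)


def find_by_alias(models: List[Dict[str, Any]], alias: str) -> Optional[Dict[str, Any]]:
    """Return the first model that declares `alias` (exact match), else None."""
    for model in models:
        if alias in aliases_of(model):
            return model
    return None


if __name__ == "__main__":
    hit = find_by_alias(MODELS, "vitpose")
    print(hit["Config"] if hit else "no model with that alias")
```

Normalizing to a list first keeps the lookup itself free of type checks. Exact matching is a choice made for this sketch; whether real consumers match case-insensitively is not stated in this PR.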
30 changes: 30 additions & 0 deletions docs/src/papers/backbones/vitpose.md
@@ -0,0 +1,30 @@
# ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation

<!-- [ALGORITHM] -->

<details>
<summary align="right"><a href="https://arxiv.org/abs/2204.12484">ViTPose (NeurIPS'2022)</a></summary>

```bibtex
@inproceedings{
xu2022vitpose,
title={Vi{TP}ose: Simple Vision Transformer Baselines for Human Pose Estimation},
author={Yufei Xu and Jing Zhang and Qiming Zhang and Dacheng Tao},
booktitle={Advances in Neural Information Processing Systems},
year={2022},
}
```

</details>

## Abstract

<!-- [ABSTRACT] -->

Although no specific domain knowledge is considered in the design, plain vision transformers have shown excellent performance in visual recognition tasks. However, little effort has been made to reveal the potential of such simple structures for pose estimation tasks. In this paper, we show the surprisingly good capabilities of plain vision transformers for pose estimation from various aspects, namely simplicity in model structure, scalability in model size, flexibility in training paradigm, and transferability of knowledge between models, through a simple baseline model called ViTPose. Specifically, ViTPose employs plain and non-hierarchical vision transformers as backbones to extract features for a given person instance and a lightweight decoder for pose estimation. It can be scaled up from 100M to 1B parameters by taking the advantages of the scalable model capacity and high parallelism of transformers, setting a new Pareto front between throughput and performance. Besides, ViTPose is very flexible regarding the attention type, input resolution, pre-training and finetuning strategy, as well as dealing with multiple pose tasks. We also empirically demonstrate that the knowledge of large ViTPose models can be easily transferred to small ones via a simple knowledge token. Experimental results show that our basic ViTPose model outperforms representative methods on the challenging MS COCO Keypoint Detection benchmark, while the largest model sets a new state-of-the-art.

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/26127467/224964357-d3d000fc-768b-4087-96d6-9291c86a3e8a.png">
</div>
1 change: 1 addition & 0 deletions model-index.yml
@@ -3,6 +3,7 @@ Import:
- configs/body_2d_keypoint/topdown_heatmap/coco/litehrnet_coco.yml
- configs/body_2d_keypoint/topdown_heatmap/coco/mspn_coco.yml
- configs/body_2d_keypoint/topdown_heatmap/coco/hourglass_coco.yml
- configs/body_2d_keypoint/topdown_heatmap/coco/vitpose_coco.yml
- configs/body_2d_keypoint/simcc/coco/resnet_coco.yml
- configs/body_2d_keypoint/simcc/coco/mobilenetv2_coco.yml
- configs/body_2d_keypoint/simcc/coco/vipnas_coco.yml
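
An `Import` entry like the one added here is only useful if the path it names exists in the repository. A minimal sketch of such a check follows, assuming the `Import` list has already been parsed into Python strings. `missing_metafiles` and `repo_root` are hypothetical names chosen for illustration and are not part of MMPose or its tooling.

```python
from pathlib import Path
from typing import Iterable, List


def missing_metafiles(import_paths: Iterable[str], repo_root: Path) -> List[str]:
    """Return the entries of `import_paths` that are not files under `repo_root`.

    Paths are treated as relative to the repository root, which is how they
    appear in the `Import` list above.
    """
    return [p for p in import_paths if not (repo_root / p).is_file()]


if __name__ == "__main__":
    imports = [
        "configs/body_2d_keypoint/topdown_heatmap/coco/hourglass_coco.yml",
        "configs/body_2d_keypoint/topdown_heatmap/coco/vitpose_coco.yml",
    ]
    # Run from a checkout root; prints any listed metafile that is absent.
    for path in missing_metafiles(imports, Path(".")):
        print(f"missing: {path}")
```

An empty result means every listed metafile is present; the check says nothing about whether a metafile's contents are valid.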