¹Max Planck Institute for Intelligent Systems, Tübingen, Germany · ²Meshcapade · ³ETH Zürich
Most 3D human pose estimation methods train on real images with 2D keypoints and/or 3D pseudo ground truth, which helps generalization. However, methods trained on such data exhibit good image alignment but poor 3D accuracy. TokenHMR addresses this by introducing a Threshold-Adaptive Loss Scaling (TALS) loss and by reformulating body regression as token prediction. Our method has two stages:
- (a) Tokenization: the encoder maps continuous poses to discrete pose tokens.
- (b) TokenHMR: during human pose estimation training, the pre-trained decoder provides a “vocabulary” of valid poses without imposing biases. A conceptual sketch of both stages follows below.
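To make the two stages concrete, here is a minimal PyTorch sketch of a VQ-style pose tokenizer. It is purely illustrative: all class names, dimensions, and architecture choices are assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class PoseTokenizerSketch(nn.Module):
    """Illustrative VQ-style pose tokenizer; names and sizes are hypothetical."""

    def __init__(self, pose_dim=63, num_tokens=160, codebook_size=2048, token_dim=256):
        super().__init__()
        self.num_tokens, self.token_dim = num_tokens, token_dim
        self.encoder = nn.Sequential(
            nn.Linear(pose_dim, 512), nn.ReLU(),
            nn.Linear(512, num_tokens * token_dim))
        self.codebook = nn.Embedding(codebook_size, token_dim)
        self.decoder = nn.Sequential(
            nn.Linear(num_tokens * token_dim, 512), nn.ReLU(),
            nn.Linear(512, pose_dim))

    def forward(self, pose):
        # Stage (a): encode a continuous pose into discrete token indices.
        z = self.encoder(pose).view(-1, self.num_tokens, self.token_dim)
        dists = torch.cdist(z, self.codebook.weight.unsqueeze(0).expand(z.size(0), -1, -1))
        indices = dists.argmin(dim=-1)       # (batch, num_tokens) discrete pose tokens
        quantized = self.codebook(indices)   # snap to nearest codebook entries
        # Stage (b): the pre-trained (frozen) decoder maps tokens back to a valid pose.
        recon = self.decoder(quantized.flatten(1))
        return recon, indices

# Usage: pose = torch.randn(4, 63); recon, tokens = PoseTokenizerSketch()(pose)
```

Because the decoder is frozen during the second stage, the regression head can only produce poses that lie in the tokenizer's learned vocabulary.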
| Model | Training Datasets | Comments | TALS Loss | 3DPW PVE | 3DPW MPJPE | 3DPW PA-MPJPE | EMDB PVE | EMDB MPJPE | EMDB PA-MPJPE | Checkpoint | Config |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TokenHMR-ITW | SD + ITW + BEDLAM | Paper version (100K iter, ViTPose pretrained) | Yes | 84.3 | 70.9 | 44.8 | 108.5 | 89.5 | 55.6 | ckpt | config |
| TokenHMR-ITW | SD + ITW + BEDLAM | Release version (200K iter, HMR2.0 pretrained) | Yes | 84.8 | 72.0 | 45.5 | 110.0 | 91.9 | 56.4 | ckpt | config |
| TokenHMR-Demo# | SD + ITW + BEDLAM | Demo version (350K iter, HMR2.0 pretrained) | Yes | 85.0 | 72.8 | 47.1 | 112.2 | 93.7 | 58.9 | ckpt | config |
| TokenHMR-BL | BEDLAM | Release version (100K iter, ViTPose pretrained) | No† | 85.7 | 71.6 | 44.0 | 106.2 | 89.6 | 49.8 | ckpt | config |

# Model used in the demo. † Not needed since BEDLAM has ground-truth annotations. All models use tokenization; all errors are in mm.
- 02.07.2024: Released the latest TokenHMR model, which works for diverse poses.
- 05.06.2024: Released the TokenHMR code and model (the model used in the paper).
- Clone the repository to your local machine:

  ```bash
  git clone https://github.com/saidwivedi/TokenHMR.git
  ```
- Create a conda environment. Important: do not use Python versions higher than 3.10.

  ```bash
  conda create -n tkhmr python=3.10
  conda activate tkhmr
  ```
- Install PyTorch. Tested with PyTorch 2.1.0 and CUDA 11.8; lower versions may also work.

  ```bash
  pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
  ```
- Install additional dependencies from the provided requirements.txt file:

  ```bash
  pip install -r requirements.txt
  ```
- Install Detectron2 (needed for image demos). Ensure CUDA_HOME is set to the CUDA version installed with PyTorch.

  ```bash
  pip install git+https://github.com/facebookresearch/detectron2
  ```
- Install the forked version of the human tracker PHALP (CVPR 2022), which fixes dependency issues (needed for video demos). Ensure CUDA_HOME is set to the CUDA version installed with PyTorch.

  ```bash
  pip install git+https://github.com/saidwivedi/PHALP
  ```
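Optionally, run a quick sanity check of the environment before moving on. This snippet is not part of the repository, and the `phalp` module name is an assumption about how the fork installs itself:

```python
# check_env.py -- optional environment sanity check (not part of the repo)
import torch

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
print(f"Compiled against CUDA {torch.version.cuda}")  # expect 11.8 for the install above

for name, purpose in [("detectron2", "image demos"), ("phalp", "video demos")]:
    try:
        __import__(name)  # the PHALP module name is assumed here
        print(f"{name} found ({purpose} ready)")
    except ImportError:
        print(f"{name} missing -- {purpose} will not run")
```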
```
TokenHMR/
├── tokenization/            # Code for training, evaluating and running the tokenization demo [Method (a)]
│   ├── tokenization_data/   # Training/evaluation data for tokenization [see below for download]
│   └── ...
├── tokenhmr/                # Code for training, evaluating and running the TokenHMR demo [Method (b)]
│   ├── dataset_dir/         # Root directory for TokenHMR training
│   │   ├── training_data/   # Training data for TokenHMR [see below for download]
│   │   └── evaluation_data/ # Evaluation data for TokenHMR [see below for download]
│   └── ...
├── data/                    # Basic data for setup
│   ├── body_models/         # SMPL and SMPLH body models
│   └── checkpoints/         # Tokenization and TokenHMR checkpoints with configs
├── requirements.txt         # Dependency list
└── ...
```
All files are hosted on the project webpage; downloading works only after you register and agree to the licenses. Use the script fetch_demo_data.sh to download the files needed to run the demos, including the SMPL and SMPLH body models and the latest TokenHMR and tokenization checkpoints. For training and evaluation data, refer to the respective sections below.

```bash
bash ./fetch_demo_data.sh
```
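To confirm the download succeeded, you can check for the files the demos expect; the paths below are taken from the demo commands and folder structure in this README:

```python
from pathlib import Path

# Files referenced by the demo commands below; adjust if your layout differs.
expected = [
    "data/body_models/smpl/SMPL_NEUTRAL.pkl",
    "data/checkpoints/tokenhmr_model_latest.ckpt",
    "data/checkpoints/model_config.yaml",
]
for p in expected:
    print(f"{'OK     ' if Path(p).exists() else 'MISSING'} {p}")
```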
PHALP needs the neutral SMPL model to run the video demo. Copy the model to the expected location:

```bash
cp data/body_models/smpl/SMPL_NEUTRAL.pkl $HOME/.cache/phalp/3D/models/smpl/
```
Make sure Detectron2 is installed before running the image demo; see the installation guide above for details.

```bash
python tokenhmr/demo.py \
    --img_folder demo_sample/images/ \
    --batch_size=1 \
    --full_frame \
    --checkpoint data/checkpoints/tokenhmr_model_latest.ckpt \
    --model_config data/checkpoints/model_config.yaml
```
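If checkpoint loading fails, inspecting the file with plain PyTorch (no TokenHMR code involved) helps separate a corrupt download from an environment issue. The `state_dict` key below is the usual PyTorch Lightning convention, not something guaranteed by this repository:

```python
import torch

# Load on CPU so no GPU is needed just for inspection.
ckpt = torch.load("data/checkpoints/tokenhmr_model_latest.ckpt", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # Lightning checkpoints usually include 'state_dict'
    sd = ckpt.get("state_dict", {})
    print(f"{len(sd)} tensors in state_dict")
```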
Make sure the forked version of PHALP (CVPR 2022) is installed before running the video demo; see the installation guide above for details.

```bash
python tokenhmr/track.py \
    video.source=demo_sample/video/gymnasts.mp4 \
    render.colors=slahmr \
    +checkpoint=data/checkpoints/tokenhmr_model_latest.ckpt \
    +model_config=data/checkpoints/model_config.yaml
```
We train the tokenizer on body-only poses from AMASS and MOYO. The processed files for training and evaluation can be downloaded from here (downloading works only after registering on the project page). Unzip the archive; the folder structure should look like this:
```
TokenHMR/
├── tokenization/
│   └── tokenization_data/   # Training/evaluation data for tokenization
│       └── smplh/
│           ├── train/       # Train split
│           │   ├── train_CMU.npz
│           │   └── ...
│           └── val/         # Validation split
│               ├── val_MPI-Mosh.npz
│               └── ...
└── ...
```
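You can inspect the downloaded .npz archives with NumPy before training; the snippet makes no assumption about key names, it simply lists whatever each file contains:

```python
import numpy as np

# List the arrays stored in one of the tokenizer training files.
data = np.load("tokenization/tokenization_data/smplh/train/train_CMU.npz")
for key in data.files:
    print(key, data[key].shape, data[key].dtype)
```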
We train the tokenizer for 150K iterations on a single NVIDIA A100 GPU, which takes around 2 days.

```bash
cd tokenization
python train_poseVQ.py --cfg configs/tokenizer_amass_moyo.yaml
```
We use BEDLAM and 4DHumans training data for TokenHMR training. Refer to this to download the 4DHumans training tar files; the BEDLAM tar files can be downloaded from our project page here. For evaluation, please download the images directly from the respective dataset websites: 3DPW and EMDB. The evaluation metadata can be downloaded here. The final folder structure should look like this:
```
TokenHMR/
├── tokenhmr/
│   └── dataset_dir/
│       ├── training_data/       # Training data
│       │   └── dataset_tars/
│       │       ├── coco-train-2014-pruned/
│       │       ├── aic-train-vitpose/
│       │       ├── bedlam/
│       │       └── ...
│       └── evaluation_data/     # Evaluation data
│           ├── 3DPW/
│           ├── EMDB/
│           ├── emdb.npz
│           └── 3dpw_test.npz
└── ...
```
After training the tokenizer, we can train TokenHMR. If you want to skip tokenizer training, you can directly use the pretrained tokenizer provided with the checkpoints. With the 4DHumans pretrained backbone (download the model from the official repo), training takes around 4 days on 4 NVIDIA A100 GPUs. To change any default settings, update tokenhmr/lib/configs_hydra/experiment/tokenhmr_release.yaml.

```bash
python tokenhmr/train.py datasets=mix_all experiment=tokenhmr_release
```
To evaluate the original model (used in the paper) on 3DPW and EMDB, first download the evaluation data from here, then run:

```bash
python tokenhmr/eval.py \
    --dataset EMDB,3DPW-TEST \
    --batch_size 32 --log_freq 50 \
    --dataset_dir tokenhmr/dataset_dir/evaluation_data \
    --checkpoint data/checkpoints/tokenhmr_model.ckpt \
    --model_config data/checkpoints/model_config.yaml
```
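For reference, PVE, MPJPE, and PA-MPJPE in the tables above are mean Euclidean errors in mm over mesh vertices or body joints, with PA-MPJPE computed after Procrustes alignment. Below is a standalone NumPy sketch of the standard PA-MPJPE formulation, not the repository's exact implementation:

```python
import numpy as np

def pa_mpjpe(pred, gt):
    """PA-MPJPE: MPJPE after similarity (Procrustes) alignment of pred to gt.
    pred, gt: (J, 3) joint positions in the same units (e.g. mm)."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation via SVD of the cross-covariance matrix.
    U, S, Vt = np.linalg.svd(p.T @ g)
    R = (U @ Vt).T
    if np.linalg.det(R) < 0:          # avoid reflections
        Vt[-1] *= -1
        S[-1] *= -1
        R = (U @ Vt).T
    scale = S.sum() / (p ** 2).sum()  # optimal similarity scale
    aligned = scale * p @ R.T + mu_g
    return np.linalg.norm(aligned - gt, axis=-1).mean()
```

MPJPE drops the alignment step, and PVE computes the unaligned error over mesh vertices instead of joints.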
The code is built on top of these two awesome repositories. We thank the authors for open-sourcing their code.
Parts of the code are taken or adapted from the following repos:
We sincerely thank the department of Perceiving Systems and ML team of Meshcapade GmbH for insightful discussions and feedback. We thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Sai Kumar Dwivedi. We thank Meshcapade GmbH for supporting Yu Sun and providing GPU resources. This work was partially supported by the German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039B.
If you find this code useful for your research, please consider citing the following paper:
```bibtex
@inproceedings{dwivedi_cvpr2024_tokenhmr,
  title={{TokenHMR}: Advancing Human Mesh Recovery with a Tokenized Pose Representation},
  author={Dwivedi, Sai Kumar and Sun, Yu and Patel, Priyanka and Feng, Yao and Black, Michael J.},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}
```
This code is available for non-commercial scientific research purposes as defined in the LICENSE file. By downloading and using this code you agree to the terms in the LICENSE. Third-party datasets and software are subject to their respective licenses.
For code-related questions, please contact sai.dwivedi@tuebingen.mpg.de.
For commercial licensing (and all related questions for business applications), please contact ps-licensing@tue.mpg.de.