
[CVPR 2024] TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation


Computer Vision and Pattern Recognition (CVPR 2024)


¹Max Planck Institute for Intelligent Systems, Tübingen, Germany
²Meshcapade    ³ETH Zürich

Project Website | YouTube | arXiv

Key Idea

Most 3D human pose estimation methods train on real images with 2D keypoints and/or 3D pseudo ground truth, which helps generalization. However, methods trained on such data exhibit good image alignment but poor 3D accuracy. TokenHMR addresses this by introducing a Threshold-Adaptive Loss Scaling (TALS) loss and by reformulating body regression as token prediction. Our method has two stages:

  1. Tokenization: The encoder maps continuous poses to discrete pose tokens (a minimal sketch follows this list).
  2. TokenHMR: During human pose estimation training, the pre-trained decoder provides a “vocabulary” of valid poses without imposing biases.
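
As a rough illustration, the tokenizer can be viewed as a VQ-style autoencoder over poses, where the codebook acts as the pose “vocabulary”. The sketch below is hypothetical: layer sizes, names, and the single-token-per-pose simplification are illustrative only, not the repository's actual implementation.

# Hypothetical sketch of a VQ-style pose tokenizer (illustrative only, PyTorch).
import torch
import torch.nn as nn

class PoseTokenizerSketch(nn.Module):
    def __init__(self, pose_dim=126, num_tokens=2048, token_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(pose_dim, 512), nn.ReLU(), nn.Linear(512, token_dim))
        self.codebook = nn.Embedding(num_tokens, token_dim)    # discrete "vocabulary" of pose tokens
        self.decoder = nn.Sequential(nn.Linear(token_dim, 512), nn.ReLU(), nn.Linear(512, pose_dim))

    def forward(self, pose):                                   # pose: (batch, pose_dim)
        z = self.encoder(pose)                                 # continuous latent
        dist = torch.cdist(z, self.codebook.weight)            # distance to every codebook entry
        token_ids = dist.argmin(dim=-1)                        # nearest entries = discrete pose tokens
        z_q = self.codebook(token_ids)                         # quantized latent
        return self.decoder(z_q), token_ids                    # reconstructed pose and its tokens

During TokenHMR training the pre-trained decoder is reused and the network predicts tokens rather than regressing continuous body pose directly, which keeps predictions close to the learned vocabulary of valid poses.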

Model Zoo

| Model | Training Datasets | Comments | TALS Loss | 3DPW PVE | 3DPW MPJPE | 3DPW PA-MPJPE | EMDB PVE | EMDB MPJPE | EMDB PA-MPJPE | ckpt | config |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TokenHMR-ITW | SD + ITW + BEDLAM | Paper Version (100K iter with ViTPose pretrained) | Yes | 84.3 | 70.9 | 44.8 | 108.5 | 89.5 | 55.6 | ckpt | config |
| TokenHMR-ITW | SD + ITW + BEDLAM | Release Version (200K iter with HMR2.0 pretrained) | Yes | 84.8 | 72.0 | 45.5 | 110.0 | 91.9 | 56.4 | ckpt | config |
| TokenHMR-Demo# | SD + ITW + BEDLAM | Demo Version (350K iter with HMR2.0 pretrained) | Yes | 85.0 | 72.8 | 47.1 | 112.2 | 93.7 | 58.9 | ckpt | config |
| TokenHMR-BL | BEDLAM | Release Version (100K iter with ViTPose pretrained) | No | 85.7 | 71.6 | 44.0 | 106.2 | 89.6 | 49.8 | ckpt | config |

# Model used in the demo.
TALS is not needed for TokenHMR-BL since BEDLAM has ground-truth annotations.
All models use tokenization. All errors are in mm.

Updates

  • 02.07.2024: Released the latest TokenHMR model, which works on more diverse poses.
  • 05.06.2024: Released the TokenHMR code and the model used in the paper.

Setup and Installation

  1. Clone the Repository to your local machine:

    git clone https://github.com/saidwivedi/TokenHMR.git
  2. Create a Conda Environment

    Important: Do not use Python versions higher than 3.10.

    conda create -n tkhmr python=3.10
  3. Install PyTorch

    Tested with PyTorch 2.1.0 and CUDA 11.8; lower versions may also work. A quick sanity check follows this list.

    pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
  4. Install Additional Dependencies

    Use the provided requirements.txt file to install additional dependencies.

    pip install -r requirements.txt
  5. Install Detectron2 for Image Demos

    Ensure CUDA_HOME is set to the CUDA version installed with PyTorch.

    pip install git+https://github.com/facebookresearch/detectron2
  6. Install the forked version (with fixed dependencies) of the human tracker PHALP (CVPR 2022) for Video Demos

    Ensure CUDA_HOME is set to the CUDA version installed with PyTorch.

    pip install git+https://github.com/saidwivedi/PHALP
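
Before moving on to the demos, a quick sanity check inside the tkhmr environment confirms that PyTorch and CUDA are set up as expected (versions refer to the commands above):

# Sanity check of the PyTorch install before building Detectron2 / PHALP.
import torch

print("PyTorch:", torch.__version__)             # 2.1.0 with the command above
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)         # 11.8 with the cu118 wheels
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))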

Code Structure

TokenHMR/
├── tokenization/               # Contains code for training, evaluating and running demo for Tokenization. [Method (a)]
│   └── tokenization_data/      # Training/evaluation data for Tokenization. [Check below how to download]
│   ... 
├── tokenhmr/                   # Contains code for training, evaluating and running demo for TokenHMR [Method (b)]
│   └── dataset_dir/            # Root directory for TokenHMR training
│       └── training_data/      # Training data for TokenHMR. [Check below how to download]
│       └── evaluation_data/    # Evaluation data for TokenHMR. [Check below how to download]
│   ... 
├── data/                       # Basic data for setup
│   └── body_models/            # SMPL and SMPLH body models
│   └── checkpoints/            # Tokenization and TokenHMR checkpoints with config
├── requirements.txt            # dependencies list
└── ...

Preparing Data for Basic Setup [required for demo]

All files are hosted on the project webpage; downloading works only after you register and agree to the licenses. Use the script fetch_demo_data.sh to download the files needed for the demo, which include the SMPL and SMPLH body models and the latest TokenHMR and tokenization checkpoints. For training and evaluation, refer to the respective sections below.

bash ./fetch_demo_data.sh

PHALP needs the neutral SMPL model for the video demo. Copy the model to the appropriate location:

cp data/body_models/smpl/SMPL_NEUTRAL.pkl $HOME/.cache/phalp/3D/models/smpl/

Running TokenHMR Demo on Images

Make sure Detectron2 is installed before running the image demo. See the installation instructions above for details.

python tokenhmr/demo.py \
    --img_folder demo_sample/images/ \
    --batch_size=1 \
    --full_frame \
    --checkpoint data/checkpoints/tokenhmr_model_latest.ckpt \
    --model_config data/checkpoints/model_config.yaml

Running TokenHMR Demo on Videos

Make sure the forked version of PHALP (CVPR 2022) is installed. See the installation instructions above for details.

python tokenhmr/track.py \
    video.source=demo_sample/video/gymnasts.mp4 \
    render.colors=slahmr \
    +checkpoint=data/checkpoints/tokenhmr_model_latest.ckpt \
    +model_config=data/checkpoints/model_config.yaml

[Demo GIF]

Tokenization

Data Preparation

We train the tokenizer with body-only poses from AMASS and MOYO. The processed files for training and evaluation can be downloaded from here (downloading works only after registering on the project page). Unzip the archive; the folder structure should look like this:

TokenHMR/
├── tokenization/ 
│   └── tokenization_data/          # Training/evaluation data for Tokenization.
│       └── smplh/
│           └── train/              # Train split
│               └── train_CMU.npz
│               ...
│           └── val/                # Validation split
│               └── val_MPI-Mosh.npz            
│               ...
└── ...
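
The exact keys stored in these archives depend on the released processing, but the .npz files can be inspected with NumPy (file path taken from the tree above):

# Peek at one tokenization training archive; the printed keys depend on the release.
import numpy as np

data = np.load("tokenization/tokenization_data/smplh/train/train_CMU.npz", allow_pickle=True)
for key in data.files:
    print(key, data[key].shape, data[key].dtype)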

Training

We train the tokenizer for 150K iterations on a single NVIDIA A100 GPU, which takes around 2 days.

cd tokenization
python train_poseVQ.py --cfg configs/tokenizer_amass_moyo.yaml

TokenHMR

Data Preparation

We use BEDLAM and 4DHumans training data for TokenHMR training. Refer to this to download the 4DHumans training tar files. For the BEDLAM tar files, please download them from our project page here. For evaluation, please download the images directly from the respective websites: 3DPW and EMDB. Metadata for evaluation can be downloaded here. The final folder structure should look like this:

TokenHMR/
├── tokenhmr/
│   └── dataset_dir/
│       └── training_data/                      # Training data
│           └── dataset_tars/
│               └── coco-train-2014-pruned/
│               └── aic-train-vitpose/
│               └── bedlam/
│               ...
│           ...
│       └── evaluation_data/                    # Evaluation data
│           └── 3DPW/
│           └── EMDB/
│           └── emdb.npz
│           └── 3dpw_test.npz
└── ...

Training

After training the tokenizer, we can train TokenHMR. If you want to skip tokenizer training, you can directly use the pretrained tokenizer provided with the checkpoints. With the 4DHumans pretrained backbone (download the model from the official repo), training takes around 4 days on 4 NVIDIA A100 GPUs. To change any default settings, update tokenhmr/lib/configs_hydra/experiment/tokenhmr_release.yaml.

python tokenhmr/train.py datasets=mix_all experiment=tokenhmr_release

Evaluation

To evaluate the original model (used in the paper) on 3DPW and EMDB, download the evaluation data from here, then run:

python tokenhmr/eval.py  \
    --dataset EMDB,3DPW-TEST \
    --batch_size 32 --log_freq 50 \
    --dataset_dir tokenhmr/dataset_dir/evaluation_data \
    --checkpoint data/checkpoints/tokenhmr_model.ckpt \
    --model_config data/checkpoints/model_config.yaml
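
For reference, the MPJPE and PA-MPJPE numbers in the Model Zoo (in mm) follow the standard definitions sketched below; this is a generic illustration, not the repository's evaluation code. PVE is the same error computed over mesh vertices instead of joints.

# Generic sketch of the standard 3D pose metrics (not the repo's eval code).
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance, per frame, in the input units."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment (rotation, scale, translation) of pred to gt."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g                # center both joint sets
    U, S, Vt = np.linalg.svd(p.T @ g)            # orthogonal Procrustes via SVD
    if np.linalg.det(Vt.T @ U.T) < 0:            # avoid improper rotations (reflections)
        Vt[-1] *= -1
        S[-1] *= -1
    R = Vt.T @ U.T
    scale = S.sum() / (p ** 2).sum()
    aligned = scale * p @ R.T + mu_g             # map predictions onto the ground truth
    return mpjpe(aligned, gt)

Here pred and gt are (num_joints, 3) arrays for a single frame; the reported numbers average the per-frame errors over the whole test set.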

Acknowledgements

The code is built on top of these two awesome repositories. We thank the authors for open-sourcing their code.

Parts of the code are taken or adapted from the following repos:

We sincerely thank the department of Perceiving Systems and ML team of Meshcapade GmbH for insightful discussions and feedback. We thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Sai Kumar Dwivedi. We thank Meshcapade GmbH for supporting Yu Sun and providing GPU resources. This work was partially supported by the German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039B.

Citation

If you find this code useful for your research, please consider citing the following paper:

@inproceedings{dwivedi_cvpr2024_tokenhmr,
    title={{TokenHMR}: Advancing Human Mesh Recovery with a Tokenized Pose Representation},
    author={Dwivedi, Sai Kumar and Sun, Yu and Patel, Priyanka and Feng, Yao and Black, Michael J.},
    booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2024},
}

License

This code is available for non-commercial scientific research purposes as defined in the LICENSE file. By downloading and using this code you agree to the terms in the LICENSE. Third-party datasets and software are subject to their respective licenses.

Contact

For code related questions, please contact sai.dwivedi@tuebingen.mpg.de

For commercial licensing (and all related questions for business applications), please contact ps-licensing@tue.mpg.de.
