This is the official implementation of RefHCM, a unified model designed for human-centric scenarios that performs several referring perception tasks.
RefHCM paves the way for advanced referring abilities in human-AI interaction. For current applications, it can simplify the AIGC content generation pipeline.
Similar to ComfyUI-Florence2, RefHCM provides additional keypoint information for specified individuals and more fine-grained human part segmentation results, which can be used for tasks like dance generation and image editing. We are also considering integrating RefHCM into ComfyUI to further expand its utility...
- Release the code before December 15, 2024
- Release the data and model before January 1, 2025
- Integrate RefHCM into ComfyUI
- python 3.7.4
- pytorch 1.8.1
- torchvision 0.9.1
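A minimal environment sketch, assuming conda and the default PyPI wheels (adjust the torch build to match your CUDA driver; the versions above are what the repo lists):

```bash
# Create and activate an isolated environment matching the listed Python version.
conda create -n refhcm python=3.7.4 -y
conda activate refhcm

# Install the listed PyTorch/torchvision versions (default PyPI wheels).
pip install torch==1.8.1 torchvision==0.9.1
```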
```bash
git clone https://github.com/JJJYmmm/RefHCM
cd RefHCM
pip install -r requirements.txt
```
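To confirm the environment matches the versions listed above, a quick check:

```bash
# Should print 1.8.1 and 0.9.1, per the requirements above.
python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__)"
```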
If you run into environment setup issues, e.g. with the fairseq installation, refer to the manual setup guide in Google Colab (recommended). A sketch of the usual from-source fallback follows below.
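If `pip` fails to build fairseq from `requirements.txt`, installing it from source is a common workaround; a sketch assuming the stock upstream fairseq is compatible with this repo (the Colab guide above is authoritative):

```bash
# Build fairseq from source; --editable keeps the install linked to the checkout.
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./
```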
- Download the model weight refhcm.pt from here, and put it in the `/checkpoints` folder.

- Launch the Gradio demo (a sketch for remote access follows this list):

  ```bash
  CUDA_VISIBLE_DEVICES=0 python gradio_demo.py
  ```

- Now you can try RefHCM 😊; here are some examples.
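If you need to reach the demo from another machine, Gradio honors its standard server environment variables; a minimal sketch (the port value is arbitrary, and the exact behavior depends on how `gradio_demo.py` builds its interface):

```bash
# Sanity check: the demo expects the weight downloaded in the step above.
ls checkpoints/refhcm.pt

# Bind to all interfaces on a custom port via Gradio's standard env vars.
GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7861 \
CUDA_VISIBLE_DEVICES=0 python gradio_demo.py
```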
Please refer to the `checkpoints` and `dataset` folders in the JJJYmmm/RefHCM repository.
We provide training and evaluation scripts in the `/run_scripts` folder, covering both single-task and multi-task training.
The scripts are designed to be plug-and-play, assuming you have followed the data preparation and pretrained-model setup instructions; a convenience wrapper over the per-task evaluation scripts is sketched right below.
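For instance, once the single-task models are trained, the per-task evaluation scripts listed in the following sections can be chained in one go (a sketch, assuming each script finds the checkpoint it expects):

```bash
# Run every single-task evaluation back to back; subshells keep the cwd clean.
( cd run_scripts/rec  && bash evaluate_refcoco.sh )
( cd run_scripts/rkpt && bash evaluate_rkpt.sh )
( cd run_scripts/rpar && bash evaluate_rpar.sh )
( cd run_scripts/rhrc && bash evaluate_rhrc.sh )
```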
```bash
cd run_scripts/rec/
bash train_refcoco.sh    # training
bash evaluate_refcoco.sh # evaluation
```
```bash
cd run_scripts/rkpt/
bash train_rkpt.sh    # training
bash evaluate_rkpt.sh # evaluation
```
`full_mask` refers to Query Parallel Generation (QPG) described in the paper, which speeds up generation while retaining most of the performance.
```bash
cd run_scripts/rpar/
bash train_rpar.sh              # training
bash evaluate_rpar.sh           # evaluation
bash train_rpar_full_mask.sh    # training with QPG
bash evaluate_rpar_full_mask.sh # evaluation with QPG
```
```bash
cd run_scripts/rhrc/
bash train_rhrc.sh    # training
bash evaluate_rhrc.sh # evaluation
```
```bash
cd run_scripts/multitask/
bash train_multitask.sh # training: multitask learning plus
                        # reasoning ability boosting (RefHCM-tuned)
```
- OFA for the training framework our code builds on.
- UniHCP for the metric implementations, such as mIoU.
If you find this repository useful, please consider citing it:
```bibtex
@misc{refhcm2024,
  author       = {JJJYmmm},
  title        = {{RefHCM: A Unified Model for Referring Perceptions in Human-Centric Scenarios}},
  year         = {2024},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/JJJYmmm/RefHCM}},
}
```