RefHCM: A Unified Model for Referring Perceptions in Human-Centric Scenarios

Overview

This repository contains the official implementation of RefHCM, a unified model designed for human-centric scenarios that performs several referring perception tasks.

Architecture

Capabilities of RefHCM

RefHCM paves the way for advanced referring abilities in human-AI interaction. In current applications, it can simplify AIGC content-generation pipelines.

Similar to ComfyUI-Florence2, RefHCM provides additional keypoint information for specified individuals and more fine-grained human part segmentation results, which can be used for tasks such as dance generation and image editing. We are also considering integrating RefHCM into ComfyUI to further extend its utility.

Todo List

  • Release the code before December 15, 2024
  • Release the data and model before January 1, 2025
  • Integrate RefHCM into ComfyUI

Requirements

  • python 3.7.4
  • pytorch 1.8.1
  • torchvision 0.9.1
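
A minimal environment setup sketch, assuming a conda-based workflow (the environment name is an assumption, not part of this repository):

conda create -n refhcm python=3.7.4  # hypothetical environment name
conda activate refhcm
pip install torch==1.8.1 torchvision==0.9.1  # versions listed above; choose the wheel matching your CUDA setup

The remaining dependencies come from requirements.txt in the Installation step below.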

Installation

git clone https://github.com/JJJYmmm/RefHCM
cd RefHCM
pip install -r requirements.txt

If you run into environment setup issues (e.g., fairseq installation), refer to the manual setup guide in Google Colab (recommended).
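
If the installation fails on fairseq specifically, one common workaround is to build fairseq from source; this is a sketch of that general procedure, not a step verified by this repository:

git clone https://github.com/facebookresearch/fairseq
cd fairseq
pip install --editable ./  # editable install of fairseq from source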

Quick Start

  • Download the model weight refhcm.pt from here, and place it in the /checkpoints folder (a layout sketch follows the examples below)

  • Launch the gradio demo

    CUDA_VISIBLE_DEVICES=0 python gradio_demo.py
  • Now you can try RefHCM 😊. Here are some examples:

    Example outputs for the RHrc, RKpt, and RPar tasks.
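
For reference, a sketch of the layout expected before launching the demo (the checkpoint name and folder come from the step above; the source path is a placeholder):

mkdir -p checkpoints                          # repository-root checkpoints folder
mv /path/to/refhcm.pt checkpoints/refhcm.pt   # place the downloaded weight here
CUDA_VISIBLE_DEVICES=0 python gradio_demo.py  # then launch the demo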

Data Preparation and Pretrained Model

Please refer to the RefHCM/checkpoints and RefHCM/dataset folders of this repository.

Training and Evaluation

We provide training and evaluation scripts in the /run_scripts folder, covering both single-task and multi-task training.

The scripts are designed to be plug-and-play, assuming you have followed the data preparation and pretrained model setup instructions.

Referring Expression Comprehension (REC)

cd run_script/rec/
bash train_refcoco.sh # training
bash evaluate_refcoco.sh # evaluate

Referring Keypoint (RKpt)

cd run_script/rkpt/
bash train_rkpt.sh # training
bash evaluate_rkpt.sh # evaluate

Referring Parsing (RPar)

full_mask enables the Query Parallel Generation (QPG) described in the paper, which speeds up generation while retaining most of the performance.

cd run_script/rpar/
bash train_rpar.sh # training
bash evaluate_rpar.sh # evaluate

bash train_rpar_full_mask.sh # training for QPG
bash evaluate_rpar_full_mask.sh # evaluate for QPG

Referring Human-Related Caption (RHrc)

cd run_script/rhrc/
bash train_rhrc.sh # training
bash evaluate_rhrc.sh # evaluate

Multi-task Training

cd run_script/multitask/
bash train_multitask.sh # training, including multitask learning \
		        # and reasoning ability boosting (RefHCM-tuned)

Acknowledgments

  • OFA for providing the training framework.
  • UniHCP for providing metric calculations, such as mIoU.

Cite

If you find this repository useful, please consider citing it:

@misc{refhcm2024,
  author = {JJJYmmm},
  title = {{RefHCM: A Unified Model for Referring Perceptions in Human-Centric Scenarios}},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/JJJYmmm/RefHCM}},
}
