TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
Yiwei Guo, Shaobin Zhuang, Kunchang Li, Yu Qiao, Yali Wang.
- 2024/10/17: Training/Evaluation codes for TransAgent are released.
To the best of our knowledge, TransAgent is the first unified distillation framework for generalizing vision-language foundation models through efficient heterogeneous agent collaboration. It has three key characteristics:
- Knowledge Versatility. TransAgent leverages 11 heterogeneous agents from vision, language and multi-modal research, which comprehensively cover diverse knowledge complementary to CLIP-like models.
- Transfer Flexibility. A mixture-of-agents (MoA) gating mechanism is proposed to integrate the external knowledge of different agents within each modality.
- Deployment Efficiency. Multi-source distillation transfers the knowledge of heterogeneous agents into CLIP via prompt learning, achieving deployment efficiency without heavy model ensembling.
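The MoA gating idea can be illustrated with a minimal sketch: learnable gate logits are softmax-normalized into weights that fuse per-agent features into a single target for distillation. This is a hedged illustration only; the function name `moa_gate` and the exact fusion details are assumptions, not the released implementation.

```python
import numpy as np

def moa_gate(agent_feats, gate_logits):
    """Sketch of mixture-of-agents gating (illustrative, not the official code):
    softmax-weight the features from heterogeneous agents and fuse them
    into one feature vector that a CLIP-like student can distill from."""
    # Numerically stable softmax over the per-agent gate logits
    w = np.exp(gate_logits - gate_logits.max())
    w = w / w.sum()
    # Weighted sum of agent features (all assumed to share one dimension)
    return sum(wi * f for wi, f in zip(w, agent_feats))
```

With equal gate logits this reduces to a plain average of the agent features; training the logits lets the gate emphasize the agents most useful for a given modality.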
Name | Base Acc. | Novel Acc. | HM | Epochs |
---|---|---|---|---|
CLIP | 69.34 | 74.22 | 71.70 | - |
CoOp | 82.69 | 63.22 | 71.66 | 200 |
CoCoOp | 80.47 | 71.69 | 75.83 | 10 |
MaPLe | 82.28 | 75.14 | 78.55 | 5 |
PromptSRC | 84.26 | 76.10 | 79.97 | 20 |
TransAgent (Ours) | 85.29 | 77.62 | 81.27 | 20 |
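The HM column in the table above is the harmonic mean of base and novel accuracy, the standard summary metric for base-to-novel generalization; a one-line check reproduces the reported values:

```python
def harmonic_mean(base, novel):
    # HM = 2 * Base * Novel / (Base + Novel), in percentage points
    return 2 * base * novel / (base + novel)

print(round(harmonic_mean(85.29, 77.62), 2))  # TransAgent row → 81.27
print(round(harmonic_mean(69.34, 74.22), 2))  # CLIP row → 71.7
```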
Follow the instructions in INSTALL.md and DATASETS.md to prepare the environment and datasets.
Refer to TRAIN.md for detailed instructions on training and evaluating TransAgent from scratch.
If you find this repository useful in your research, please use the following BibTeX entry for citation:
@article{guo2024transagent,
  title={TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration},
  author={Guo, Yiwei and Zhuang, Shaobin and Li, Kunchang and Qiao, Yu and Wang, Yali},
  journal={Advances in Neural Information Processing Systems},
  year={2024}
}
This repository builds upon CoOp, Co-CoOp, MaPLe, PromptSRC, VPD and ProText.