TransAgent

TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
Yiwei Guo, Shaobin Zhuang, Kunchang Li, Yu Qiao, Yali Wang.

Update

  • 2024/10/17: Training/evaluation code for TransAgent is released.

Introduction

[Teaser figure: overview of the TransAgent framework]

To the best of our knowledge, TransAgent is the first unified distillation framework for generalizing vision-language foundation models through efficient heterogeneous agent collaboration. It has three key characteristics:

  • Knowledge Versatility. TransAgent leverages 11 heterogeneous agents from vision, language and multi-modal research, which together cover diverse knowledge complementary to CLIP-like models.
  • Transfer Flexibility. A mixture-of-agents (MoA) gating mechanism integrates the external knowledge of the different agents within each modality (a minimal sketch follows this list).
  • Deployment Efficiency. Multi-source distillation transfers the knowledge of the heterogeneous agents into CLIP via prompt learning, so deployment requires no heavy model ensemble.
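
The MoA gate is only described at a high level above. As a rough, hypothetical illustration (not the repository's actual implementation), the PyTorch sketch below fuses pre-extracted features from several frozen agents using learned softmax gate weights; the class name MoAGate, its parameterization, and the tensor shapes are all assumptions.

import torch
import torch.nn as nn

class MoAGate(nn.Module):
    """Hypothetical sketch of a mixture-of-agents (MoA) gate: it learns
    softmax weights over features produced by several external agents and
    returns their weighted combination for one modality."""

    def __init__(self, num_agents: int, feat_dim: int):
        super().__init__()
        # One learnable gating logit per agent (assumed parameterization).
        self.gate_logits = nn.Parameter(torch.zeros(num_agents))
        # Project the fused feature so it can be aligned with CLIP features.
        self.proj = nn.Linear(feat_dim, feat_dim)

    def forward(self, agent_feats: torch.Tensor) -> torch.Tensor:
        # agent_feats: (num_agents, batch, feat_dim), pre-extracted from frozen agents.
        weights = torch.softmax(self.gate_logits, dim=0)         # (num_agents,)
        fused = torch.einsum("a,abd->bd", weights, agent_feats)  # (batch, feat_dim)
        return self.proj(fused)

# Example: fuse features from 5 hypothetical vision agents for a batch of 8 images.
gate = MoAGate(num_agents=5, feat_dim=512)
agent_feats = torch.randn(5, 8, 512)   # dummy agent features
fused = gate(agent_feats)              # (8, 512), usable as a distillation target

In the paper's setting, knowledge fused this way is distilled into CLIP with prompt learning rather than ensembled at inference, which is what keeps deployment lightweight.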

Comparison with state-of-the-art methods

Base-to-Novel Generalization

| Name              | Base Acc. (%) | Novel Acc. (%) | HM (%) | Epochs |
|-------------------|---------------|----------------|--------|--------|
| CLIP              | 69.34         | 74.22          | 71.70  | -      |
| CoOp              | 82.69         | 63.22          | 71.66  | 200    |
| CoCoOp            | 80.47         | 71.69          | 75.83  | 10     |
| MaPLe             | 82.28         | 75.14          | 78.55  | 5      |
| PromptSRC         | 84.26         | 76.10          | 79.97  | 20     |
| TransAgent (Ours) | 85.29         | 77.62          | 81.27  | 20     |
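
HM is the harmonic mean of the base and novel accuracies, the standard summary metric for base-to-novel generalization. The small snippet below simply reproduces the HM column from the two accuracy columns.

def harmonic_mean(base: float, novel: float) -> float:
    """Harmonic mean of base and novel accuracy (the HM column above)."""
    return 2 * base * novel / (base + novel)

print(round(harmonic_mean(85.29, 77.62), 2))  # 81.27, the TransAgent row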

Few-Shot Classification

[Figure: few-shot classification results]

Preparation

Follow the instructions in INSTALL.md and DATASETS.md to prepare the environment and datasets.

Training & Evaluation

Refer to TRAIN.md for detailed instructions on training and evaluating TransAgent from scratch.


Cite

If you find this repository useful in your research, please cite it using the following BibTeX entry:

@article{guo2024transagent,
  title={TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration},
  author={Guo, Yiwei and Zhuang, Shaobin and Li, Kunchang and Qiao, Yu and Wang, Yali},
  journal={Advances in Neural Information Processing Systems},
  year={2024}
}

Acknowledgement

This repository is built on top of CoOp, Co-CoOp, MaPLe, PromptSRC, VPD and ProText.
