TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
Yiwei Guo, Shaobin Zhuang, Kunchang Li, Yu Qiao, Yali Wang.
- 2024/10/17: Training/Evaluation codes for TransAgent are released.
To the best of our knowledge, TransAgent is the first unified distillation framework for generalizing vision-language foundation models through efficient heterogeneous agent collaboration. It has three key characteristics:
- Knowledge Versatility. TransAgent leverages 11 heterogeneous agents from vision, language and multi-modal research, which comprehensively cover diverse knowledge complementary to CLIP-like models.
- Transfer Flexibility. A mixture-of-agents (MoA) gating mechanism is proposed to integrate the external knowledge of different agents within each modality.
- Deployment Efficiency. Multi-source distillation transfers the knowledge of heterogeneous agents into CLIP via prompt learning, achieving deployment efficiency without heavy model ensembling.
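The MoA gating idea can be illustrated with a minimal sketch: learnable gate logits are softmax-normalized into weights that fuse per-agent features into a single target for distillation. This is a hedged illustration only; the function name `moa_gate` and the exact fusion details are assumptions, not the released implementation.

```python
import numpy as np

def moa_gate(agent_feats, gate_logits):
    """Sketch of mixture-of-agents gating (illustrative, not the official code):
    softmax-weight the features from heterogeneous agents and fuse them
    into one feature vector that a CLIP-like student can distill from."""
    # Numerically stable softmax over the per-agent gate logits
    w = np.exp(gate_logits - gate_logits.max())
    w = w / w.sum()
    # Weighted sum of agent features (all assumed to share one dimension)
    return sum(wi * f for wi, f in zip(w, agent_feats))
```

With equal gate logits this reduces to a plain average of the agent features; training the logits lets the gate emphasize the agents most useful for a given modality.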
Name | Base Acc. | Novel Acc. | HM | Epochs |
---|---|---|---|---|
CLIP | 69.34 | 74.22 | 71.70 | - |
CoOp | 82.69 | 63.22 | 71.66 | 200 |
CoCoOp | 80.47 | 71.69 | 75.83 | 10 |
MaPLe | 82.28 | 75.14 | 78.55 | 5 |
PromptSRC | 84.26 | 76.10 | 79.97 | 20 |
TransAgent (Ours) | 85.29 | 77.62 | 81.27 | 20 |
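The HM column in the table above is the harmonic mean of base and novel accuracy, the standard summary metric for base-to-novel generalization; a one-line check reproduces the reported values:

```python
def harmonic_mean(base, novel):
    # HM = 2 * Base * Novel / (Base + Novel), in percentage points
    return 2 * base * novel / (base + novel)

print(round(harmonic_mean(85.29, 77.62), 2))  # TransAgent row → 81.27
print(round(harmonic_mean(69.34, 74.22), 2))  # CLIP row → 71.7
```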
Follow the instructions in INSTALL.md and DATASETS.md to prepare the environment and datasets.
Refer to TRAIN.md for detailed instructions on training and evaluating TransAgent from scratch.
If you find this repository useful in your research, please use the following BibTeX entry for citation:
@article{guo2024transagent,
  title={TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration},
  author={Guo, Yiwei and Zhuang, Shaobin and Li, Kunchang and Qiao, Yu and Wang, Yali},
  journal={Advances in Neural Information Processing Systems},
  year={2024}
}
This repository builds upon CoOp, Co-CoOp, MaPLe, PromptSRC, VPD and ProText.