Code for the Paper "AgentRefine: Enhancing Agent Generalization through Refinement Tuning".
🔔 If you have any questions or suggestions, please feel free to open an issue on this repository.
- [01/2025] 🔥 Our paper has been accepted by ICLR 2025.
- [01/2025] 🔥 We will release our model and inference code within one month!
We introduce AgentRefine, an agent synthesis framework that enables models to learn from observations within trajectories to correct their own errors. AgentRefine significantly outperforms state-of-the-art agent tuning works in terms of generalization capabilities across diverse agent tasks. Our findings establish a relationship between agent generalization and self-improvement, offering a new paradigm for future research.
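To make the idea concrete, the sketch below shows what a self-refinement trajectory might look like: the agent takes a wrong action, reads the environment's observation, and issues a corrected action. The `role`/`content` schema and the failure phrase are illustrative assumptions, not the released data format.

```python
# Hypothetical sketch of a refinement trajectory (field names are our
# assumption, not the official AgentRefine data schema).
trajectory = [
    {"role": "user", "content": "Task: put a clean mug on the desk."},
    {"role": "assistant", "content": "Thought: I see a mug. Action: take mug from desk"},
    {"role": "observation", "content": "Nothing happens: there is no mug on the desk."},
    {"role": "assistant", "content": "Thought: my last action referenced the wrong "
                                     "location; the mug is on the shelf. "
                                     "Action: take mug from shelf"},
    {"role": "observation", "content": "You pick up the mug from the shelf."},
]

def count_refinements(traj):
    """Count assistant turns that immediately follow a failure observation."""
    n = 0
    for prev, cur in zip(traj, traj[1:]):
        if (prev["role"] == "observation"
                and "Nothing happens" in prev["content"]
                and cur["role"] == "assistant"):
            n += 1
    return n

print(count_refinements(trajectory))  # -> 1
```

The key property refinement tuning exploits is that the correction turn is conditioned on the failed observation, so the model learns to revise its own actions rather than only imitate successful ones.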
We provide our training data on HuggingFace:
⭐ We will also release our model and inference code soon. Thanks for your patience!
Performance comparison of AgentRefine and other methods across different model families and sizes. (Underlined text indicates that the training data is sampled from the same environment as the task, so the result is considered a held-in evaluation.)
| Method | Alfworld Success | Alfworld Progress | BabyAI Success | BabyAI Progress | SciWorld Success | SciWorld Progress | PDDL Success | PDDL Progress | Jericho Success | Jericho Progress |
|---|---|---|---|---|---|---|---|---|---|---|
| **GPT Series** | | | | | | | | | | |
| GPT-4o | 66.4 | 79.9 | 48.2 | 64.1 | 40.0 | 76.9 | 61.7 | 69.8 | 10.0 | 34.0 |
| GPT-4o-mini | 37.3 | 65.0 | 36.6 | 51.9 | 23.3 | 49.8 | 25.0 | 49.1 | 10.0 | 28.5 |
| **LLaMA-3-8B Series** | | | | | | | | | | |
| LLaMA-3-8B-Instruct | 22.4 | 46.1 | 45.5 | 56.5 | 7.8 | 41.1 | 10.0 | 38.4 | 0.0 | 24.3 |
| AgentGen | 29.1 | 47.6 | 20.5 | 35.0 | - | - | 11.7 | 23.0 | - | - |
| AgentGym | 61.9 | 76.9 | 47.3 | 61.4 | 18.9 | 47.5 | 1.7 | 16.6 | 0.0 | 12.9 |
| Agent-FLAN | 67.2 | 79.7 | 25.0 | 35.3 | 1.1 | 10.9 | 8.3 | 25.5 | 0.0 | 10.1 |
| AgentRefine | 44.8 | 63.8 | 37.5 | 50.4 | 14.4 | 42.6 | 16.6 | 37.8 | 10.0 | 32.3 |
| **Mistral Series** | | | | | | | | | | |
| Mistral-7B-Instruct-v0.3 | 12.4 | 35.9 | 36.6 | 45.8 | 6.7 | 24.7 | 13.3 | 27.8 | 0.0 | 17.3 |
| AgentGym | 76.9 | 86.7 | 40.2 | 56.3 | 15.6 | 48.3 | 1.7 | 7.3 | 0.0 | 13.0 |
| Agent-FLAN | 77.6 | 87.6 | 15.2 | 21.0 | 0.0 | 6.7 | 0.0 | 3.2 | 0.0 | 0.7 |
| AgentRefine | 51.4 | 68.8 | 25.9 | 42.4 | 4.4 | 22.4 | 11.7 | 32.8 | 5.0 | 28.8 |
| **LLaMA-3-70B Series** | | | | | | | | | | |
| LLaMA-3-70B-Instruct | 67.2 | 75.2 | 48.2 | 61.8 | 42.2 | 75.4 | 55.0 | 79.8 | 25.0 | 46.4 |
| Agent-FLAN | 80.5 | 86.8 | 32.1 | 41.2 | 5.5 | 16.4 | 25.0 | 53.7 | 0.0 | 13.6 |
| AgentRefine | 67.2 | 72.1 | 44.6 | 59.7 | 17.7 | 46.4 | 38.3 | 58.6 | 15.0 | 37.2 |
Please cite our paper if it helps your research:
```bibtex
@inproceedings{fu2025agentrefine,
  title={AgentRefine: Enhancing Agent Generalization through Refinement Tuning},
  author={Dayuan Fu and Keqing He and Yejie Wang and Wentao Hong and Zhuoma GongQue and Weihao Zeng and Wei Wang and Jingang Wang and Xunliang Cai and Weiran Xu},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=FDimWzmcWn}
}
```