This is the official repo for our in-progress work, Token-Budget-Aware LLM Reasoning.
Reasoning is crucial for LLMs to perform complex tasks, but methods like Chain-of-Thought (CoT) reasoning often lead to significant token overhead and increased costs. We identify substantial token redundancy in the reasoning process of state-of-the-art LLMs and propose a token-budget-aware reasoning framework. This approach dynamically allocates token budgets based on problem complexity to guide the reasoning process. Experiments demonstrate that our method reduces token usage in CoT reasoning with minimal performance trade-offs, striking a practical balance between efficiency and accuracy.
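As a rough illustration of the core idea (not the code in this repo), the sketch below shows one way a token budget could be folded into a CoT prompt. The ask_llm helper and the exact prompt wording are assumptions made for this example.

```python
# Illustrative sketch of token-budget-aware prompting (not the repo's code).
# `ask_llm` stands in for any chat-completion call supplied by the caller.

def budget_aware_prompt(question: str, budget: int) -> str:
    # Ask for step-by-step reasoning, but bound the reasoning length.
    return (
        f"{question}\n"
        f"Let's think step by step and use less than {budget} tokens."
    )

def answer_with_budget(question: str, budget: int, ask_llm) -> str:
    # ask_llm(prompt: str) -> str is assumed to wrap the model API call.
    return ask_llm(budget_aware_prompt(question, budget))
```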
Please see requirements.txt for the dependencies; they can be installed with pip install -r requirements.txt.
Run direct answering (without CoT reasoning) on GSM8K-Zero:

python -u inference.py --data_name GSM8K-Zero --model gpt-4o-mini

Run with CoT reasoning enabled:

python -u inference.py --data_name GSM8K-Zero --model gpt-4o-mini --reasoning

Search for a suitable token budget on GSM8K-Zero (see the sketch below for the intuition):

python -u search_budget.py --do_search --data_name GSM8K-Zero
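For intuition, here is a minimal sketch of what a budget search could look like: repeatedly tighten the budget while the budget-constrained answer stays correct. The ask_llm helper, the prompt wording, and the substring correctness check are assumptions for this example, not the actual logic of search_budget.py.

```python
# Illustrative budget-search sketch (not the actual search_budget.py logic):
# keep halving the budget while the constrained answer remains correct.

def search_budget(question: str, gold_answer: str, start_budget: int, ask_llm) -> int:
    best = budget = start_budget
    while budget >= 1:
        prompt = (
            f"{question}\n"
            f"Let's think step by step and use less than {budget} tokens."
        )
        response = ask_llm(prompt)
        if gold_answer not in response:  # crude correctness check for the sketch
            break                        # stop once accuracy is lost
        best = budget                    # this budget still preserves the answer
        budget //= 2                     # try an even tighter budget
    return best
```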
We have introduced three different budget estimation methods in our paper.
TALE with Zero-shot Estimator:
python -u TALE.py --data_name GSM8K-Zero --model gpt-4o-mini
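As a hedged sketch of the zero-shot estimator idea, the example below first asks the model to estimate a token budget for the question and then reasons under that budget. The ask_llm helper, the prompt wording, and the fallback budget are illustrative assumptions, not the exact prompts used by TALE.py.

```python
# Illustrative sketch of TALE with a zero-shot budget estimator.
import re

def estimate_budget(question: str, ask_llm) -> int:
    # Ask the model itself to guess how many tokens the reasoning needs.
    prompt = (
        "Estimate how many output tokens are needed to reason through and answer "
        "the following question. Reply with a single integer.\n"
        f"Question: {question}"
    )
    reply = ask_llm(prompt)
    match = re.search(r"\d+", reply)
    return int(match.group()) if match else 512  # fall back to a default budget

def tale_zero_shot(question: str, ask_llm) -> str:
    budget = estimate_budget(question, ask_llm)
    return ask_llm(
        f"{question}\nLet's think step by step and use less than {budget} tokens."
    )
```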
TALE with the Regression Estimator and Token-Budget Awareness Internalization via Fine-tuning are on the way!
This project is in progress, and the remaining implementations are coming soon!
@article{han2024token,
title={Token-Budget-Aware LLM Reasoning},
author={Han, Tingxu and Wang, Zhenting and Fang, Chunrong and Zhao, Shiyu and Ma, Shiqing and Chen, Zhenyu},
journal={arXiv preprint arXiv:2412.18547},
year={2024}
}