Matching Reward Distributions via Flow Balance
📄 arXiv Paper | 🤗 #1 Paper of the Day
𝕏 Post 1 | 𝕏 Post 2 | 𝕏 Post 3 | 𝕏 Post 4
FlowRL is a flow-balanced reinforcement learning method that matches full reward distributions instead of maximizing rewards, promoting diverse exploration and generalizable reasoning trajectories in LLMs.
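To make the distinction concrete, here is a toy, repo-independent illustration (the exact target in the paper involves a learned partition function and further terms; the beta value and numbers below are made up):

```python
# Toy illustration only (not code from this repo): reward maximization puts all
# probability mass on the single best response, while distribution matching
# spreads mass roughly in proportion to exp(beta * reward).
import torch

rewards = torch.tensor([1.0, 0.9, 0.2])        # rewards of 3 sampled responses
greedy = torch.zeros(3)
greedy[rewards.argmax()] = 1.0                  # reward maximization: [1, 0, 0]
beta = 1.0                                      # assumed temperature
target = torch.softmax(beta * rewards, dim=0)   # matching: ~[0.42, 0.38, 0.19]
print(greedy, target)
```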
| Base Model | Domain | WandB Logs | Hugging Face Model |
|---|---|---|---|
| Qwen2.5-7B | Math | 🔗 View Run | 🤗 Model |
| DeepSeek-7B | Code | 🔗 View Run | 🤗 Model |
| Qwen2.5-32B | Math | - | 🤗 Model |
There are three ways to use FlowRL:
⭐ We recommend Option 1 as the default choice. Since verl updates frequently, the newest versions may introduce instabilities such as training/inference mismatches. Option 1 uses verl 0.4.0, which is stable and has been thoroughly tested against the results reported in our paper.
For exact reproduction of results from the paper, use the original repository with verl 0.4.0:
👉 Original Code: https://github.com/Xuekai-Zhu/FlowRL
Install verl before using FlowRL.
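One way to set this up (a sketch, assuming you install verl from source at its v0.4.0 tag; follow verl's own installation docs for the full dependency stack such as vLLM and flash-attention):

```bash
# Sketch: pin verl to 0.4.0 from source (tag name and extra dependencies
# should be checked against verl's installation docs).
git clone https://github.com/volcengine/verl.git
cd verl
git checkout v0.4.0
pip install -e .
```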
```bash
# Option A: Download our pre-processed datasets directly
bash preprocess/down_load_dataset.sh

# Move data to the default directory
mv data/xuekai/flowrl-data-collection/math_data data/math_data
mv data/xuekai/flowrl-data-collection/code_data data/code_data
```

```bash
# Option B: Process data from original sources
# For detailed processing instructions, see data/README.md
```

For Math Tasks: Qwen/Qwen2.5-7B (default in script); Qwen/Qwen2.5-32B
For Code Tasks: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
```bash
# Download the default model (Qwen2.5-7B for math)
bash preprocess/down_load_model.sh

# For other models, modify MODEL_NAME in the script before running
```
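For example, a hypothetical edit inside preprocess/down_load_model.sh could look like this (the script itself is the source of truth for how MODEL_NAME is used):

```bash
# Hypothetical edit in preprocess/down_load_model.sh: point MODEL_NAME at a
# different base model before running the script.
MODEL_NAME="Qwen/Qwen2.5-32B"                          # math, 32B
# MODEL_NAME="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B" # code
```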
```bash
cd verl_FlowRL

# For 7B math training
bash command/training/math/flowrl_7B_math.sh

# For 32B math training
bash command/training/math/flowrl_32B_math.sh

# For 7B code training
bash command/training/code/flowrl_7B_code.sh
```

For running FlowRL using the latest verl framework:
```bash
# Prepare dataset
bash recipe/flowrl/prepare/prepare_data.sh

# Prepare model
bash recipe/flowrl/prepare/prepare_model.sh
```

```bash
# Train FlowRL with Qwen2.5-7B
bash recipe/flowrl/run_flowrl_qwen2.5_7b.sh
```

If you want to implement FlowRL in your own codebase, we provide a detailed implementation guide:
This guide walks you through the key components and steps needed to integrate FlowRL into your existing training pipeline.
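As a starting point, the sketch below shows a trajectory-balance-style loss in the spirit of FlowRL. It is a simplified illustration, not the repository's implementation: the tensor names, the beta temperature, and the way log Z is estimated are assumptions, and the actual objective in the paper includes further details such as length normalization and importance weighting.

```python
# Simplified sketch (PyTorch) of a flow-balance objective in the spirit of
# FlowRL. Shapes, names, and hyperparameters are illustrative assumptions;
# see the paper and implementation guide for the exact formulation.
import torch


def flowrl_loss(
    logp_policy: torch.Tensor,  # (B,) sum of log pi_theta(y|x) over response tokens
    logp_ref: torch.Tensor,     # (B,) sum of log pi_ref(y|x) under a frozen reference model
    reward: torch.Tensor,       # (B,) scalar reward r(x, y) per sampled trajectory
    log_z: torch.Tensor,        # (B,) learned estimate of the log partition function log Z(x)
    beta: float = 1.0,          # reward temperature (assumed hyperparameter)
) -> torch.Tensor:
    # Flow balance: push log Z(x) + log pi_theta(y|x) toward
    # beta * r(x, y) + log pi_ref(y|x), so the policy matches a distribution
    # proportional to pi_ref * exp(beta * r) instead of collapsing onto the
    # single highest-reward trajectory.
    residual = log_z + logp_policy - beta * reward - logp_ref
    return (residual ** 2).mean()


# Tiny usage example with random placeholders for a batch of 4 trajectories.
if __name__ == "__main__":
    B = 4
    loss = flowrl_loss(
        logp_policy=torch.randn(B),
        logp_ref=torch.randn(B),
        reward=torch.rand(B),
        log_z=torch.zeros(B, requires_grad=True),
    )
    loss.backward()
    print(loss.item())
```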
After training your FlowRL models, you can evaluate them using the following commands:
```bash
cd verl_Test

# First, merge the trained model
bash command/eval/merge_model.sh

# For math testing
bash command/eval/math/flowrl_math_test.sh

# For code testing
bash command/eval/code/flowrl_code_test.sh
```

Reference: for the verl v0.5.0.dev merge model script, see merge_model.sh
If you find this repo helpful, please consider citing our paper:
```bibtex
@article{zhu2025flowrl,
  title={FlowRL: Matching Reward Distributions for LLM Reasoning},
  author={Zhu, Xuekai and Cheng, Daixuan and Zhang, Dinghuai and Li, Hengli and Zhang, Kaiyan and Jiang, Che and Sun, Youbang and Hua, Ermo and Zuo, Yuxin and Lv, Xingtai and others},
  journal={arXiv preprint arXiv:2509.15207},
  year={2025}
}
```