10 changes: 8 additions & 2 deletions README.md
@@ -22,6 +22,7 @@

## 🚀 News

* [2025-08] We now support training on general multi-step workflows! Please check out examples for [ALFWorld](./docs/sphinx_doc/source/tutorial/example_step_wise.md) and [ReAct](./docs/sphinx_doc/source/tutorial/example_react.md).
* [2025-07] Trinity-RFT v0.2.0 is released.
* [2025-07] We update the [technical report](https://arxiv.org/abs/2505.17826) (arXiv v2) with new features, examples, and experiments.
* [2025-06] Trinity-RFT v0.1.1 is released.
@@ -230,7 +231,7 @@ huggingface-cli download {model_name} --local-dir $MODEL_PATH/{model_name}
modelscope download {model_name} --local_dir $MODEL_PATH/{model_name}
```

For more details about model downloading, see [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli) or [ModelScope](https://modelscope.cn/docs/models/download).



@@ -331,7 +332,12 @@ Tutorials for running different RFT modes:

Tutorials for adapting Trinity-RFT to a new multi-turn agentic scenario:

+ [Multi-turn tasks](./docs/sphinx_doc/source/tutorial/example_multi_turn.md)
+ [Concatenated Multi-turn tasks](./docs/sphinx_doc/source/tutorial/example_multi_turn.md)

Tutorials for adapting Trinity-RFT to a general multi-step agentic scenario:

+ [General Multi-Step tasks](./docs/sphinx_doc/source/tutorial/example_step_wise.md)
+ [ReAct agent tasks](./docs/sphinx_doc/source/tutorial/example_react.md)


Tutorials for data-related functionalities:
7 changes: 7 additions & 0 deletions README_zh.md
@@ -22,6 +22,7 @@

## 🚀 News

* [2025-08] Trinity-RFT now supports training on general multi-step workflows! Please check out the [ALFWorld](./docs/sphinx_doc/source/tutorial/example_step_wise.md) and [ReAct](./docs/sphinx_doc/source/tutorial/example_react.md) examples!
* [2025-07] Trinity-RFT v0.2.0 is released, adding multiple feature improvements.
* [2025-07] Updated the [technical report](https://arxiv.org/abs/2505.17826) (arXiv v2) with new features, examples, and experiments.
* [2025-06] Trinity-RFT v0.1.1 is released, fixing known issues and improving system stability.
@@ -334,6 +335,12 @@ trinity run --config examples/grpo_gsm8k/gsm8k.yaml
+ [Multi-turn tasks](./docs/sphinx_doc/source/tutorial/example_multi_turn.md)


Tutorials for adapting Trinity-RFT to a general multi-step agentic scenario:

+ [General multi-step tasks](./docs/sphinx_doc/source/tutorial/example_step_wise.md)
+ [ReAct agent tasks](./docs/sphinx_doc/source/tutorial/example_react.md)


Tutorials for data-related functionalities:

+ [Advanced data processing and human-in-the-loop](./docs/sphinx_doc/source/tutorial/example_data_functionalities.md)
Binary file added docs/sphinx_doc/assets/alfworldv2_reward.png
2 changes: 2 additions & 0 deletions docs/sphinx_doc/source/index.rst
@@ -20,6 +20,8 @@ Welcome to Trinity-RFT's documentation!
tutorial/example_reasoning_advanced.md
tutorial/example_async_mode.md
tutorial/example_multi_turn.md
tutorial/example_step_wise.md
tutorial/example_react.md
tutorial/example_dpo.md
tutorial/example_data_functionalities.md

8 changes: 7 additions & 1 deletion docs/sphinx_doc/source/main.md
@@ -8,6 +8,7 @@

## 🚀 News

* [2025-08] We now support training on general multi-step workflows! Please check out examples for [ALFWorld](./docs/sphinx_doc/source/tutorial/example_step_wise.md) and [ReAct](./docs/sphinx_doc/source/tutorial/example_react.md).
* [2025-07] Trinity-RFT v0.2.0 is released.
* [2025-07] We update the [technical report](https://arxiv.org/abs/2505.17826) (arXiv v2) with new features, examples, and experiments.
* [2025-06] Trinity-RFT v0.1.1 is released.
@@ -309,7 +310,12 @@ Tutorials for running different RFT modes:

Tutorials for adapting Trinity-RFT to a new multi-turn agentic scenario:

+ [Multi-turn tasks](/tutorial/example_multi_turn.md)
+ [Concatenated Multi-turn tasks](./docs/sphinx_doc/source/tutorial/example_multi_turn.md)

Tutorials for adapting Trinity-RFT to a general multi-step agentic scenario:

+ [General Multi-Step tasks](./docs/sphinx_doc/source/tutorial/example_step_wise.md)
+ [ReAct agent tasks](./docs/sphinx_doc/source/tutorial/example_react.md)


Tutorials for data-related functionalities:
2 changes: 1 addition & 1 deletion docs/sphinx_doc/source/tutorial/example_multi_turn.md
@@ -1,4 +1,4 @@
# Multi-Turn RFT
# Concatenated Multi-Turn RFT

In Trinity-RFT, we support Agentic RL with multiple rounds of interaction with environments.

136 changes: 136 additions & 0 deletions docs/sphinx_doc/source/tutorial/example_react.md
@@ -0,0 +1,136 @@

# Multi-Step ReAct

This example demonstrates how to adapt the Trinity-RFT training workflow to your own agentic project through our OpenAI-compatible `ModelWrapper` class.

Here, we use the [AgentScope](https://github.com/modelscope/agentscope) framework as an example, but Trinity is flexible enough that you can use any other framework. This example fine-tunes a model on the GSM8K math dataset, using an agent that performs ReAct-style reasoning with native tool calls.

## Key Features Demonstrated

This example highlights several advanced capabilities of the Trinity-RFT framework:

### Seamless Integration with External Agent Frameworks
Trinity-RFT is designed to be highly modular. You can easily embed complex, pre-existing agent logic from external frameworks like AgentScope directly into a Trinity `Workflow`.

- **No Need for Rewrites**: You don't have to re-implement the intricate logic of your agent (e.g., the ReAct loop, memory management, or tool invocation) within Trinity.
- **Focus on High-Level Orchestration**: As shown in our `AgentScopeReactV2MathWorkflow`, the Trinity workflow simply initializes and calls the external agent's `reply` method. Trinity abstracts away the underlying complexity, allowing you to focus on the high-level task orchestration and reward design.

### General Multi-Step Training
Modern agentic tasks often involve multiple steps of reasoning, tool use, and observation. Trinity-RFT natively supports training across these multi-step interactions.

- **Step-Wise Experience Generation**: Instead of only learning from the final answer, Trinity can treat each step within an agent's reasoning trajectory as a distinct learning opportunity.
- **Credit Assignment**: The reward for solving a task is propagated back to all experiences within the successful trajectory, enabling the model to learn the entire reasoning chain, not just the final response. This is controlled by the `add_strategy` in the config.
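
To make the credit-assignment idea concrete, here is a minimal sketch (not Trinity's actual internals; the `Experience` fields shown are assumptions for illustration) of broadcasting a trajectory-level reward to every step:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Experience:
    prompt: str        # model input at this step
    response: str      # model output at this step (thought, tool call, or answer)
    reward: float = 0.0

def assign_trajectory_reward(steps: List[Experience], final_reward: float) -> List[Experience]:
    # Broadcast the trajectory-level reward to each step so the whole
    # reasoning chain is reinforced, not just the final response.
    for exp in steps:
        exp.reward = final_reward
    return steps
```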

### Native Tool Calling Support
Trinity-RFT's inference engine and training pipeline are built to support the native OpenAI `tool_calls` format.

- **Direct Training on Tool Use**: The framework allows the model to be trained on deciding *when* to call a tool, *which* tool to call, and *what* arguments to use, all formatted in the standard `tool_calls` convention.
- **Interoperability**: This native support ensures seamless integration with any service or environment that consumes the OpenAI API format, such as an MCP (Model Context Protocol) server or other tool-use evaluators.
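
For reference, an assistant message in the OpenAI `tool_calls` format looks like the following (the tool name, id, and arguments are illustrative):

```python
assistant_message = {
    "role": "assistant",
    "content": None,  # no free-form text; the model chose to call a tool instead
    "tool_calls": [
        {
            "id": "call_abc123",  # illustrative call id
            "type": "function",
            "function": {
                "name": "calculator",                     # which tool to call
                "arguments": '{"expression": "12 * 7"}',  # JSON-encoded arguments
            },
        }
    ],
}
```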

## How It Works

Below, we walk through how this works step by step.

### The Workflow (`workflow.py`)

The core logic is encapsulated in the `AgentScopeReactV2MathWorkflow` class.

1. **Initialization (`__init__`)**:
- It first initializes the AgentScope environment and the desired agent (`ReActAgentV2`).
- The most critical integration step is injecting Trinity's model client into the AgentScope agent:
```python
self.openai_client = model.get_openai_client()  # OpenAI-compatible client backed by Trinity's ModelWrapper
# ...
self.agent.model.client = self.openai_client  # route the agent's API calls through Trinity
```
This ensures that all API calls made by the AgentScope agent are routed through Trinity's `ModelWrapper`, which records the entire conversation history.

2. **Execution (`run`)**:
- The `run` method is remarkably simple. It just passes the task description to the agent.
```python
content = self.agent.reply(msg).content
```
- After the agent completes its multi-step reasoning and produces a final answer, Trinity extracts all the intermediate turns from the model's history:
```python
experiences = self.model.extract_experience_from_history(clear_history=True)
```
- A reward is calculated based on the final answer and is applied to all `Experience` objects generated from the trajectory. These experiences are then sent to the buffer for training.
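
Putting these pieces together, a minimal sketch of the workflow looks roughly like this (simplified for illustration; apart from the snippets above, the class signature and helper functions are assumptions, not Trinity's exact API):

```python
class AgentScopeReactV2MathWorkflow(Workflow):
    def __init__(self, model, task, **kwargs):
        super().__init__()
        self.model = model
        # Route the agent's API calls through Trinity so every turn is recorded.
        self.openai_client = model.get_openai_client()
        self.agent = ReActAgentV2(...)  # AgentScope agent setup elided
        self.agent.model.client = self.openai_client
        self.task = task

    def run(self):
        msg = build_task_message(self.task)      # hypothetical helper
        content = self.agent.reply(msg).content  # runs the full multi-step ReAct loop
        reward = compute_reward(content, self.task)  # hypothetical reward function
        # One Experience per recorded step; apply the final reward to all of them.
        experiences = self.model.extract_experience_from_history(clear_history=True)
        for exp in experiences:
            exp.reward = reward
        return experiences
```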

### Configuration

The configuration file controls the behavior of the entire system. Here are the key parameters for this example:

#### Native Tool Calling Settings

These settings in the `explorer.rollout_model` section configure the vLLM-based engine to generate and parse OpenAI-compatible tool calls.
We use the `Qwen3` model and host it with vLLM. The corresponding settings for other models can be found in the [vLLM tool calling documentation](https://docs.vllm.ai/en/stable/features/tool_calling.html#qwen-models).


```yaml
explorer:
  rollout_model:
    # ...
    enable_auto_tool_choice: true  # Enables the model to generate `tool_calls`
    tool_call_parser: hermes       # Specifies the parser for formatting tool call outputs
    reasoning_parser: deepseek_r1  # Helps in parsing the model's thought process
    enable_thinking: true          # Enables the model to generate intermediate "thoughts"
```
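
With these options enabled, the agent's OpenAI client can pass tool schemas in the standard way. A request might look like this (the tool definition here is illustrative, not part of the example's actual toolset):

```python
response = self.openai_client.chat.completions.create(
    model="Qwen3-8B",  # whichever model the rollout engine serves
    messages=[{"role": "user", "content": "What is 12 * 7?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "calculator",
                "description": "Evaluate an arithmetic expression.",
                "parameters": {
                    "type": "object",
                    "properties": {"expression": {"type": "string"}},
                    "required": ["expression"],
                },
            },
        }
    ],
)
# If the model decided to call the tool, the parsed call shows up here:
tool_calls = response.choices[0].message.tool_calls
```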

#### Multi-Step Training Strategy

This setting in the `algorithm` section defines how experiences from a multi-step rollout are processed.

```yaml
algorithm:
  algorithm_type: grpo
  add_strategy: step_wise_grpo  # Key for multi-step training
```
- `step_wise_grpo`: This strategy tells Trinity to create a distinct training sample for each step in the agent's execution path. The `grpo` algorithm then uses these samples to update the model.

#### Asynchronous Synchronization for Efficiency

Because multi-step rollouts produce a variable number of experiences, waiting for a fixed number of *rollouts* is inefficient. We instead use a dynamic synchronization strategy.

```yaml
synchronizer:
  sync_style: dynamic_by_explorer  # Start training when enough experiences are ready
  sync_interval: 2
```
- `sync_style: dynamic_by_explorer`: The trainer starts a training job as soon as the buffer has collected enough *experiences* (i.e., individual turns), rather than waiting for a fixed number of full agent trajectories. This significantly improves GPU utilization and training throughput.

## How to Run the Example

1. **Prerequisites**: Ensure you have Trinity installed, along with the dependencies for this example (e.g., `agentscope`). Please refer to the [AgentScope GitHub repository](https://github.com/modelscope/agentscope) for installation instructions.

2. **Prepare the model and configuration**: Download the model you want to use, and fill in the configuration file `examples/agentscope_tool_react/agentscope_tool_react_gsm8k.yaml` or `examples/agentscope_tool_react/agentscope_tool_react_dapo.yaml` accordingly (see the config sketch after the run commands below).

3. **Launch the training job**: Run the following command from the root directory of the repository.

```bash
trinity run --config examples/agentscope_tool_react/agentscope_tool_react_gsm8k.yaml
```

or

```bash
trinity run --config examples/agentscope_tool_react/agentscope_tool_react_dapo.yaml
```
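
For reference, pointing the config at your downloaded checkpoint might look like the sketch below (the key names are assumptions; check the example YAML files for the exact schema):

```yaml
model:
  model_path: /path/to/models/Qwen3-8B  # hypothetical local path to the downloaded model
```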


The GSM8K example here is quite simple; it converges in a few minutes on 8 H20 GPUs.

![](../../assets/agentscope_gsm8k_reward.png)

The DAPO example takes a little longer, but it also converges.

![](../../assets/agentscope_dapo_reward.png)

We can also see that the model generally starts to use more tool calls to solve the problems.

![](../../assets/agentscope_dapo_turns.png)



## Summary

This example is simple, but it demonstrates the power and flexibility of Trinity for training complex, multi-step agents that use tools. By seamlessly integrating external agentic logic and natively supporting multi-step training and tool calls, Trinity-RFT lets you fine-tune models on sophisticated, realistic tasks with high efficiency.