15 changes: 9 additions & 6 deletions docs/sphinx_doc/source/tutorial/example_react.md
@@ -33,13 +33,14 @@ Below we show you how to perform this step-by-step.

### The Workflow (`workflow.py`)

- The core logic is encapsulated in the `AgentScopeReactV2MathWorkflow` class.
+ The core logic is encapsulated in the `AgentScopeReactMathWorkflow` class.

1. **Initialization (`__init__`)**:
-    - It first initializes the AgentScope environment and the desired agent (`ReActAgentV2`).
+    - It first initializes the AgentScope environment and the desired agent (`ReActAgent`).
   - The most critical integration step is injecting Trinity's model client into the AgentScope agent:
     ```python
     self.openai_client = model.get_openai_client()
+      # self.openai_client = model.get_openai_async_client()  # or the async client, depending on whether you are using the async OpenAI client
     # ...
     self.agent.model.client = self.openai_client
     ```
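For orientation, here is a condensed sketch of the injection pattern, pieced together from the v0 and v1 workflows in this PR (not a drop-in snippet; the attribute and method names follow the diffs below):

```python
# v0 path (sync): swap the AgentScope model's OpenAI client for Trinity's.
openai_client = model.get_openai_client()
agent.model.client = openai_client

# v1 path (async): build an AgentScope OpenAIChatModel, then swap in Trinity's
# async client so every agent call is routed through the rollout model.
from agentscope.model import OpenAIChatModel

agent_model = OpenAIChatModel(
    api_key="EMPTY",  # the endpoint is Trinity's rollout model, so no real key is needed
    model_name=openai_client.model_path,  # served model path exposed by Trinity's client
    stream=False,
)
agent_model.client = model.get_openai_async_client()
```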
@@ -48,7 +48,7 @@ The core logic is encapsulated in the `AgentScopeReactV2MathWorkflow` class.
2. **Execution (`run`)**:
   - The `run` method is remarkably simple. It just passes the task description to the agent.
     ```python
-      content = self.agent.reply(msg).content
+      content = self.agent.reply(msg).content  # your agent logic
     ```
   - After the agent completes its multi-step reasoning and produces a final answer, Trinity extracts all the intermediate turns from the model's history:
     ```python
     experiences = self.model.extract_experience_from_history(clear_history=True)
     ```

@@ -106,18 +107,18 @@ synchronizer:
> - Commit: `ad13ed5dacecb79d20abf626769f8c7d7a7d2afb`
> - Branch: [`v0`](https://github.com/agentscope-ai/agentscope/tree/v0)

- 2. Download the model you want to use, and fill in the configuration files in `examples/agentscope_tool_react/agentscope_tool_react_gsm8k.yaml` or `examples/agentscope_tool_react/agentscope_tool_react_dapo.yaml`
+ 2. Download the model you want to use, and fill in the configuration file `examples/agentscope_tool_react/agentscopev0_tool_react_gsm8k.yaml` or `examples/agentscope_tool_react/agentscopev0_tool_react_dapo.yaml`.

3. **Launch the training job**: Run the following command from the root directory of the repository.

```bash
- trinity run --config examples/agentscope_tool_react/agentscope_tool_react_gsm8k.yaml
+ trinity run --config examples/agentscope_tool_react/agentscopev0_tool_react_gsm8k.yaml
```

or

```bash
- trinity run --config examples/agentscope_tool_react/agentscope_tool_react_dapo.yaml
+ trinity run --config examples/agentscope_tool_react/agentscopev0_tool_react_dapo.yaml
```
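For step 1 above, one way to pin AgentScope v0 to the commit listed in the note (a sketch; assumes a pip installation with git support):

```bash
pip install "git+https://github.com/agentscope-ai/agentscope.git@ad13ed5dacecb79d20abf626769f8c7d7a7d2afb"
```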


@@ -133,7 +134,9 @@ We can also see that the model generally start to use more tool calls to solve t

![](../../assets/agentscope_dapo_turns.png)

+ We can also update the AgentScope version to v1 and train on Qwen3-4B-Instruct-2507:

+ ![](../../assets/agentscope_dapo_qwen3-4B_reward.png)

## Summary

2 changes: 1 addition & 1 deletion examples/agentscope_tool_react/README.md
@@ -4,4 +4,4 @@ This example shows how to train a ReAct agent for tool integrated reasoning on G

For more detailed information, please refer to the [documentation](../../docs/sphinx_doc/source/tutorial/example_react.md).

- The config file is located in [`agentscope_tool_react_dapo.yaml`](agentscope_tool_react_dapo.yaml).
+ The config files are located in [`agentscopev0_tool_react_dapo.yaml`](agentscopev0_tool_react_dapo.yaml) and [`agentscopev1_tool_react_dapo.yaml`](agentscopev1_tool_react_dapo.yaml); use the one matching the AgentScope version you installed.
examples/agentscope_tool_react/agentscopev0_tool_react_dapo.yaml
@@ -29,7 +29,7 @@ buffer:
      rollout_args:
        temperature: 1.0
    eval_tasksets: []
-     default_workflow_type: 'agentscope_reactv2_math_workflow'
+     default_workflow_type: 'agentscopev0_react_math_workflow'
  trainer_input:
    experience_buffer:
      name: agentscope_dapo_buffer
examples/agentscope_tool_react/agentscopev0_tool_react_gsm8k.yaml
@@ -29,7 +29,7 @@ buffer:
      rollout_args:
        temperature: 1.0
    eval_tasksets: []
-     default_workflow_type: 'agentscope_reactv2_math_workflow'
+     default_workflow_type: 'agentscopev0_react_math_workflow'
  trainer_input:
    experience_buffer:
      name: agentscope_gsm8k_buffer
73 changes: 73 additions & 0 deletions examples/agentscope_tool_react/agentscopev1_tool_react_dapo.yaml
@@ -0,0 +1,73 @@
project: "Trinity-RFT-dapo-react"
name: "Qwen3-4B-dapo-react"
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}
algorithm:
  algorithm_type: grpo
  repeat_times: 8
  advantage_fn: step_wise_grpo
model:
  model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen3-4B-Instruct-2507}
  max_response_tokens: 16384
  max_model_len: 24576
cluster:
  node_num: 1
  gpu_per_node: 8
buffer:
  total_epochs: 1
  batch_size: 32
  train_batch_size: 512
  explorer_input:
    taskset:
      name: dapo
      storage_type: file
      path: open-r1/DAPO-Math-17k-Processed
      subset_name: en
      split: train
      format:
        prompt_key: 'prompt'
        response_key: 'solution'
      rollout_args:
        temperature: 1.0
    eval_tasksets: []
    default_workflow_type: 'agentscope_react_math_workflow'
  trainer_input:
    experience_buffer:
      name: agentscope_dapo_buffer
      storage_type: queue
explorer:
  max_repeat_times_per_runner: 1
  eval_interval: 50
  runner_per_model: 8
  max_timeout: 360
  rollout_model:
    engine_num: 4
    tensor_parallel_size: 1
    enable_prefix_caching: false
    enforce_eager: true
    enable_openai_api: true
    enable_history: true
    dtype: bfloat16
    seed: 42
    enable_auto_tool_choice: true
    tool_call_parser: hermes
synchronizer:
  sync_style: dynamic_by_explorer
  sync_method: 'nccl'
  sync_interval: 2
  sync_timeout: 1200
trainer:
  save_interval: 100
  trainer_config:
    actor_rollout_ref:
      model:
        use_remove_padding: true
      actor:
        use_dynamic_bsz: true
        ppo_max_token_len_per_gpu: 24576
        ulysses_sequence_parallel_size: 2 # sp size
      ref:
        log_prob_use_dynamic_bsz: ${trainer.trainer_config.actor_rollout_ref.actor.use_dynamic_bsz}
        log_prob_max_token_len_per_gpu: ${trainer.trainer_config.actor_rollout_ref.actor.ppo_max_token_len_per_gpu}
        ulysses_sequence_parallel_size: ${trainer.trainer_config.actor_rollout_ref.actor.ulysses_sequence_parallel_size} # sp size
monitor:
  monitor_type: wandb
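Following the launch pattern shown earlier for the v0 configs, this new file would presumably be run with:

```bash
trinity run --config examples/agentscope_tool_react/agentscopev1_tool_react_dapo.yaml
```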
8 changes: 6 additions & 2 deletions trinity/common/workflows/__init__.py
@@ -2,7 +2,10 @@
"""Workflow module"""
from .customized_math_workflows import MathBoxedWorkflow
from .customized_toolcall_workflows import ToolCallWorkflow
- from .envs.agentscope.agentscope_react_workflow import AgentScopeReactV2MathWorkflow
+ from .envs.agentscope.agentscopev0_react_workflow import (  # will be deprecated soon
+     AgentScopeV0ReactMathWorkflow,
+ )
+ from .envs.agentscope.agentscopev1_react_workflow import AgentScopeReactMathWorkflow
from .envs.alfworld.alfworld_workflow import AlfworldWorkflow, StepWiseAlfworldWorkflow
from .envs.alfworld.RAFT_alfworld_workflow import RAFTAlfworldWorkflow
from .envs.alfworld.RAFT_reflect_alfworld_workflow import RAFTReflectAlfworldWorkflow
@@ -31,7 +34,8 @@
"MathRMWorkflow",
"ToolCallWorkflow",
"MathEvalWorkflow",
"AgentScopeReactV2MathWorkflow",
"AgentScopeV0ReactMathWorkflow", # will be deprecated soon
"AgentScopeReactMathWorkflow",
"EmailSearchWorkflow",
"MathRULERWorkflow",
"SimpleMMWorkflow",
trinity/common/workflows/envs/agentscope/agentscopev0_react_workflow.py
@@ -8,12 +8,15 @@
from trinity.common.models.model import ModelWrapper
from trinity.common.rewards.math_reward import MathBoxedRewardFn
from trinity.common.workflows.workflow import WORKFLOWS, Task, Workflow
+ from trinity.utils.annotations import Deprecated


@WORKFLOWS.register_module("agentscope_reactv2_math_workflow")
class AgentScopeReactV2MathWorkflow(Workflow):
@Deprecated
@WORKFLOWS.register_module("agentscopev0_react_math_workflow")
class AgentScopeV0ReactMathWorkflow(Workflow):
"""
This workflow serves as an example of how to use the agentscope framework within the trinity workflow.
We use the AgentScope V0 version here. The code will be deprecated soon.
"""

    def __init__(
@@ -23,6 +26,11 @@ def __init__(
        model: ModelWrapper,
        auxiliary_models: Optional[List[openai.OpenAI]] = None,
    ):
+         super().__init__(
+             task=task,
+             model=model,
+             auxiliary_models=auxiliary_models,
+         )
        # make sure that we have the correct import
        try:
            import agentscope
@@ -35,11 +43,6 @@
        # get openai client from model
        self.openai_client = model.get_openai_client()
        self.model_name = self.openai_client.model_path
-         super().__init__(
-             task=task,
-             model=model,
-             auxiliary_models=auxiliary_models,
-         )

        temperature = self.rollout_args.get("temperature", 1.0)
        max_tokens = self.rollout_args.get("max_tokens", 4096)
trinity/common/workflows/envs/agentscope/agentscopev1_react_workflow.py
@@ -0,0 +1,172 @@
# -*- coding: utf-8 -*-
"""We include the customized math workflows in this file."""

from typing import List, Optional

import openai

from trinity.common.models.model import ModelWrapper
from trinity.common.rewards.math_reward import MathBoxedRewardFn
from trinity.common.workflows.workflow import WORKFLOWS, Task, Workflow


@WORKFLOWS.register_module("agentscope_react_math_workflow")
class AgentScopeReactMathWorkflow(Workflow):
    """
    This workflow serves as an example of how to use the agentscope framework within the trinity workflow.
    We use the AgentScope V1 version here.
    """

    def __init__(
        self,
        *,
        task: Task,
        model: ModelWrapper,
        auxiliary_models: Optional[List[openai.OpenAI]] = None,
    ):
        super().__init__(
            task=task,
            model=model,
            auxiliary_models=auxiliary_models,
        )
        # make sure that we have the correct import
        try:
            from agentscope.formatter import OpenAIChatFormatter
            from agentscope.model import OpenAIChatModel
        except ImportError as e:
            error_message = f"AgentScope is not installed. Please install the agentscope framework first before running the workflow. Error: {str(e)}"
            self.logger.error(error_message)
            raise ImportError(error_message)

        # get openai client from model
        self.openai_async_client = model.get_openai_async_client()
        self.model_name = self.openai_async_client.model_path

        temperature = self.rollout_args.get("temperature", 1.0)
        max_tokens = self.rollout_args.get("max_tokens", 4096)
        self.agent_model = OpenAIChatModel(
            api_key="EMPTY",
            model_name=self.model_name,
            stream=False,
            generate_kwargs={
                "temperature": temperature,
                "max_tokens": max_tokens,
            },
        )
        self.agent_model.client = self.openai_async_client
        self.agent_model_formatter = OpenAIChatFormatter()
        self.reset(task)

    @property
    def resettable(self):
        return True

    def reset(self, task: Task):
        self.system_prompt = """
You are an agent specialized in solving math problems with tools. Please solve the math problem given to you. You can write and execute Python code to perform calculation or verify your answer. You should return your final answer within \\boxed{{}}.
"""
        try:
            from agentscope.agent import ReActAgent
            from agentscope.memory import InMemoryMemory
            from agentscope.tool import Toolkit, execute_python_code
        except ImportError as e:
            error_message = f"AgentScope is not installed. Please install the agentscope framework first before running the workflow. Error: {str(e)}"
            self.logger.error(error_message)
            raise ImportError(error_message)
        self.toolkit = Toolkit()
        self.toolkit.register_tool_function(execute_python_code)
        self.agent = ReActAgent(
            name="math_react_agent",
            sys_prompt=self.system_prompt,
            model=self.agent_model,
            formatter=self.agent_model_formatter,
            toolkit=self.toolkit,
            memory=InMemoryMemory(),
        )
        # we set the openai client to the agent's model
        self.agent.model.client = self.openai_async_client

        self.raw_task = task.raw_task
        self.task_desc = task.task_desc
        self.truth = task.truth

        # we get the answer from gsm8k dataset
        try:
            if isinstance(self.truth, str) and "####" in self.truth:
                # GSM8K dataset
                self.answer = self.truth.split("####")[1].strip()
            else:
                self.answer = str(self.truth)
        except Exception as e:
            self.logger.debug(f"Error in getting answer from truth: {str(e)}")
            self.answer = str(self.truth)

        # we use the boxed format to evaluate the answer
        self.reward_fn = MathBoxedRewardFn()

    @property
    def repeatable(self):
        return False

    @property
    def asynchronous(self):
        """Whether the workflow runs in async mode."""
        return True

    async def run_async(self):
        # make sure that we have the correct import
        try:
            from agentscope.message import Msg
            from pydantic import BaseModel, Field
        except ImportError as e:
            error_message = f"AgentScope is not installed. Please install the agentscope framework first before running the workflow. Error: {str(e)}"
            self.logger.error(error_message)
            raise ImportError(error_message)

        # provide the task to the react agent
        msg = Msg("user", self.task_desc, role="user")

        # Note that the main workflow can have arbitrary steps and include different logic
        class FinalResult(BaseModel):
            result: str = Field(
                description="Your solution of the given math problem. Put your final answer in boxed format, e.g., \\boxed{42}"
            )

        def extract_final_answer(result) -> str:
            """Extract the final answer from the agent's response."""
            try:
                if (
                    hasattr(result, "metadata")
                    and isinstance(result.metadata, dict)
                    and "result" in result.metadata
                ):
                    return result.metadata["result"]
                if hasattr(result, "content"):
                    if isinstance(result.content, dict) and "result" in result.content:
                        return result.content["result"]
                    return str(result.content)
                return str(result)
            except Exception as e:
                self.logger.warning(f"Extract final answer error: {e}. Raw: {result}")
                return str(result)

        result = await self.agent.reply(msg, structured_model=FinalResult)

        final_answer = extract_final_answer(result)

        reward = self.reward_fn(final_answer, self.answer)
        reward = sum(reward.values())
        self.logger.debug(f"Reward: {reward}")
        experiences = self.model.extract_experience_from_history(clear_history=True)
        self.logger.debug(f"Experiences extracted len: {len(experiences)}")
        for i, experience in enumerate(experiences):
            experience.eid.step = i
            experience.reward = reward
            agent_metrics = {"react_memory_length": len(self.agent.memory.content)}
            if experience.metrics is None:
                experience.metrics = {}
            experience.metrics.update(agent_metrics)
        self.logger.debug(
            f"return experience len: {len(experiences)}, run_id: {str(experiences[-1].eid.run)}, final step reward: {experiences[-1].reward}"
        )
        return experiences