15 changes: 9 additions & 6 deletions docs/sphinx_doc/source/tutorial/example_react.md
@@ -33,13 +33,14 @@ Below we show you how to perform this step-by-step.

### The Workflow (`workflow.py`)

- The core logic is encapsulated in the `AgentScopeReactV2MathWorkflow` class.
+ The core logic is encapsulated in the `AgentScopeReactMathWorkflow` class.

1. **Initialization (`__init__`)**:
-    - It first initializes the AgentScope environment and the desired agent (`ReActAgentV2`).
+    - It first initializes the AgentScope environment and the desired agent (`ReActAgent`).
   - The most critical integration step is injecting Trinity's model client into the AgentScope agent:
     ```python
     self.openai_client = model.get_openai_client()
+      # self.openai_client = model.get_openai_async_client()  # or the async client, depending on whether you are using the async OpenAI client
     # ...
     self.agent.model.client = self.openai_client
     ```
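For orientation, here is a condensed sketch of the injection pattern, pieced together from the v0 and v1 workflows in this PR (not a drop-in snippet; the attribute and method names follow the diffs below):

```python
# v0 path (sync): swap the AgentScope model's OpenAI client for Trinity's.
openai_client = model.get_openai_client()
agent.model.client = openai_client

# v1 path (async): build an AgentScope OpenAIChatModel, then swap in Trinity's
# async client so every agent call is routed through the rollout model.
from agentscope.model import OpenAIChatModel

agent_model = OpenAIChatModel(
    api_key="EMPTY",  # the endpoint is Trinity's rollout model, so no real key is needed
    model_name=openai_client.model_path,  # served model path exposed by Trinity's client
    stream=False,
)
agent_model.client = model.get_openai_async_client()
```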
@@ -48,7 +48,7 @@ The core logic is encapsulated in the `AgentScopeReactV2MathWorkflow` class.
2. **Execution (`run`)**:
   - The `run` method is remarkably simple. It just passes the task description to the agent.
     ```python
-      content = self.agent.reply(msg).content
+      content = self.agent.reply(msg).content  # your agent logic
     ```
   - After the agent completes its multi-step reasoning and produces a final answer, Trinity extracts all the intermediate turns from the model's history:
     ```python
     experiences = self.model.extract_experience_from_history(clear_history=True)
     ```

@@ -106,18 +107,18 @@ synchronizer:
> - Commit: `ad13ed5dacecb79d20abf626769f8c7d7a7d2afb`
> - Branch: [`v0`](https://github.com/agentscope-ai/agentscope/tree/v0)

- 2. Download the model you want to use, and fill in the configuration files in `examples/agentscope_tool_react/agentscope_tool_react_gsm8k.yaml` or `examples/agentscope_tool_react/agentscope_tool_react_dapo.yaml`
+ 2. Download the model you want to use, and fill in the configuration file `examples/agentscope_tool_react/agentscopev0_tool_react_gsm8k.yaml` or `examples/agentscope_tool_react/agentscopev0_tool_react_dapo.yaml`.

3. **Launch the training job**: Run the following command from the root directory of the repository.

```bash
- trinity run --config examples/agentscope_tool_react/agentscope_tool_react_gsm8k.yaml
+ trinity run --config examples/agentscope_tool_react/agentscopev0_tool_react_gsm8k.yaml
```

or

```bash
- trinity run --config examples/agentscope_tool_react/agentscope_tool_react_dapo.yaml
+ trinity run --config examples/agentscope_tool_react/agentscopev0_tool_react_dapo.yaml
```
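For step 1 above, one way to pin AgentScope v0 to the commit listed in the note (a sketch; assumes a pip installation with git support):

```bash
pip install "git+https://github.com/agentscope-ai/agentscope.git@ad13ed5dacecb79d20abf626769f8c7d7a7d2afb"
```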


@@ -133,7 +134,9 @@ We can also see that the model generally start to use more tool calls to solve t

![](../../assets/agentscope_dapo_turns.png)

+ We can also update the AgentScope version to v1 and train on Qwen3-4B-Instruct-2507:

+ ![](../../assets/agentscope_dapo_qwen3-4B_reward.png)

## Summary

2 changes: 1 addition & 1 deletion examples/agentscope_tool_react/README.md
@@ -4,4 +4,4 @@ This example shows how to train a ReAct agent for tool integrated reasoning on G

For more detailed information, please refer to the [documentation](../../docs/sphinx_doc/source/tutorial/example_react.md).

- The config file is located in [`agentscope_tool_react_dapo.yaml`](agentscope_tool_react_dapo.yaml).
+ The config files are located in [`agentscopev0_tool_react_dapo.yaml`](agentscopev0_tool_react_dapo.yaml) and [`agentscopev1_tool_react_dapo.yaml`](agentscopev1_tool_react_dapo.yaml); use the one matching the AgentScope version you installed.
examples/agentscope_tool_react/agentscopev0_tool_react_dapo.yaml
@@ -29,7 +29,7 @@ buffer:
      rollout_args:
        temperature: 1.0
    eval_tasksets: []
-     default_workflow_type: 'agentscope_reactv2_math_workflow'
+     default_workflow_type: 'agentscopev0_react_math_workflow'
  trainer_input:
    experience_buffer:
      name: agentscope_dapo_buffer
examples/agentscope_tool_react/agentscopev0_tool_react_gsm8k.yaml
@@ -29,7 +29,7 @@ buffer:
      rollout_args:
        temperature: 1.0
    eval_tasksets: []
-     default_workflow_type: 'agentscope_reactv2_math_workflow'
+     default_workflow_type: 'agentscopev0_react_math_workflow'
  trainer_input:
    experience_buffer:
      name: agentscope_gsm8k_buffer
73 changes: 73 additions & 0 deletions examples/agentscope_tool_react/agentscopev1_tool_react_dapo.yaml
@@ -0,0 +1,73 @@
project: "Trinity-RFT-dapo-react"
name: "Qwen3-4B-dapo-react"
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}
algorithm:
  algorithm_type: grpo
  repeat_times: 8
  advantage_fn: step_wise_grpo
model:
  model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen3-4B-Instruct-2507}
  max_response_tokens: 16384
  max_model_len: 24576
cluster:
  node_num: 1
  gpu_per_node: 8
buffer:
  total_epochs: 1
  batch_size: 32
  train_batch_size: 512
  explorer_input:
    taskset:
      name: dapo
      storage_type: file
      path: open-r1/DAPO-Math-17k-Processed
      subset_name: en
      split: train
      format:
        prompt_key: 'prompt'
        response_key: 'solution'
      rollout_args:
        temperature: 1.0
    eval_tasksets: []
    default_workflow_type: 'agentscope_react_math_workflow'
  trainer_input:
    experience_buffer:
      name: agentscope_dapo_buffer
      storage_type: queue
explorer:
  max_repeat_times_per_runner: 1
  eval_interval: 50
  runner_per_model: 8
  max_timeout: 360
  rollout_model:
    engine_num: 4
    tensor_parallel_size: 1
    enable_prefix_caching: false
    enforce_eager: true
    enable_openai_api: true
    enable_history: true
    dtype: bfloat16
    seed: 42
    enable_auto_tool_choice: true
    tool_call_parser: hermes
synchronizer:
  sync_style: dynamic_by_explorer
  sync_method: 'nccl'
  sync_interval: 2
  sync_timeout: 1200
trainer:
  save_interval: 100
  trainer_config:
    actor_rollout_ref:
      model:
        use_remove_padding: true
      actor:
        use_dynamic_bsz: true
        ppo_max_token_len_per_gpu: 24576
        ulysses_sequence_parallel_size: 2 # sp size
      ref:
        log_prob_use_dynamic_bsz: ${trainer.trainer_config.actor_rollout_ref.actor.use_dynamic_bsz}
        log_prob_max_token_len_per_gpu: ${trainer.trainer_config.actor_rollout_ref.actor.ppo_max_token_len_per_gpu}
        ulysses_sequence_parallel_size: ${trainer.trainer_config.actor_rollout_ref.actor.ulysses_sequence_parallel_size} # sp size
monitor:
  monitor_type: wandb
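Following the launch pattern shown earlier for the v0 configs, this new file would presumably be run with:

```bash
trinity run --config examples/agentscope_tool_react/agentscopev1_tool_react_dapo.yaml
```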
8 changes: 6 additions & 2 deletions trinity/common/workflows/__init__.py
@@ -2,7 +2,10 @@
"""Workflow module"""
from .customized_math_workflows import MathBoxedWorkflow
from .customized_toolcall_workflows import ToolCallWorkflow
- from .envs.agentscope.agentscope_react_workflow import AgentScopeReactV2MathWorkflow
+ from .envs.agentscope.agentscopev0_react_workflow import (  # will be deprecated soon
+     AgentScopeV0ReactMathWorkflow,
+ )
+ from .envs.agentscope.agentscopev1_react_workflow import AgentScopeReactMathWorkflow
from .envs.alfworld.alfworld_workflow import AlfworldWorkflow, StepWiseAlfworldWorkflow
from .envs.alfworld.RAFT_alfworld_workflow import RAFTAlfworldWorkflow
from .envs.alfworld.RAFT_reflect_alfworld_workflow import RAFTReflectAlfworldWorkflow
@@ -31,7 +34,8 @@
"MathRMWorkflow",
"ToolCallWorkflow",
"MathEvalWorkflow",
"AgentScopeReactV2MathWorkflow",
"AgentScopeV0ReactMathWorkflow", # will be deprecated soon
"AgentScopeReactMathWorkflow",
"EmailSearchWorkflow",
"MathRULERWorkflow",
"SimpleMMWorkflow",
trinity/common/workflows/envs/agentscope/agentscopev0_react_workflow.py
@@ -8,12 +8,15 @@
from trinity.common.models.model import ModelWrapper
from trinity.common.rewards.math_reward import MathBoxedRewardFn
from trinity.common.workflows.workflow import WORKFLOWS, Task, Workflow
+ from trinity.utils.annotations import Deprecated


@WORKFLOWS.register_module("agentscope_reactv2_math_workflow")
class AgentScopeReactV2MathWorkflow(Workflow):
@Deprecated
@WORKFLOWS.register_module("agentscopev0_react_math_workflow")
class AgentScopeV0ReactMathWorkflow(Workflow):
"""
This workflow serves as an example of how to use the agentscope framework within the trinity workflow.
We use the AgentScope V0 version here. The code will be deprecated soon.
"""

    def __init__(
@@ -23,6 +26,11 @@ def __init__(
        model: ModelWrapper,
        auxiliary_models: Optional[List[openai.OpenAI]] = None,
    ):
+         super().__init__(
+             task=task,
+             model=model,
+             auxiliary_models=auxiliary_models,
+         )
        # make sure that we have the correct import
        try:
            import agentscope
@@ -35,11 +43,6 @@
        # get openai client from model
        self.openai_client = model.get_openai_client()
        self.model_name = self.openai_client.model_path
-         super().__init__(
-             task=task,
-             model=model,
-             auxiliary_models=auxiliary_models,
-         )

        temperature = self.rollout_args.get("temperature", 1.0)
        max_tokens = self.rollout_args.get("max_tokens", 4096)
trinity/common/workflows/envs/agentscope/agentscopev1_react_workflow.py
@@ -0,0 +1,172 @@
# -*- coding: utf-8 -*-
"""We include the customized math workflows in this file."""

from typing import List, Optional

import openai

from trinity.common.models.model import ModelWrapper
from trinity.common.rewards.math_reward import MathBoxedRewardFn
from trinity.common.workflows.workflow import WORKFLOWS, Task, Workflow


@WORKFLOWS.register_module("agentscope_react_math_workflow")
class AgentScopeReactMathWorkflow(Workflow):
    """
    This workflow serves as an example of how to use the agentscope framework within the trinity workflow.
    We use the AgentScope V1 version here.
    """

    def __init__(
        self,
        *,
        task: Task,
        model: ModelWrapper,
        auxiliary_models: Optional[List[openai.OpenAI]] = None,
    ):
        super().__init__(
            task=task,
            model=model,
            auxiliary_models=auxiliary_models,
        )
        # make sure that we have the correct import
        try:
            from agentscope.formatter import OpenAIChatFormatter
            from agentscope.model import OpenAIChatModel
        except ImportError as e:
            error_message = f"AgentScope is not installed. Please install the agentscope framework first before running the workflow. Error: {str(e)}"
            self.logger.error(error_message)
            raise ImportError(error_message)

        # get openai client from model
        self.openai_async_client = model.get_openai_async_client()
        self.model_name = self.openai_async_client.model_path

        temperature = self.rollout_args.get("temperature", 1.0)
        max_tokens = self.rollout_args.get("max_tokens", 4096)
        self.agent_model = OpenAIChatModel(
            api_key="EMPTY",
            model_name=self.model_name,
            stream=False,
            generate_kwargs={
                "temperature": temperature,
                "max_tokens": max_tokens,
            },
        )
        self.agent_model.client = self.openai_async_client
        self.agent_model_formatter = OpenAIChatFormatter()
        self.reset(task)

    @property
    def resettable(self):
        return True

    def reset(self, task: Task):
        self.system_prompt = """
You are an agent specialized in solving math problems with tools. Please solve the math problem given to you. You can write and execute Python code to perform calculation or verify your answer. You should return your final answer within \\boxed{{}}.
"""
        try:
            from agentscope.agent import ReActAgent
            from agentscope.memory import InMemoryMemory
            from agentscope.tool import Toolkit, execute_python_code
        except ImportError as e:
            error_message = f"AgentScope is not installed. Please install the agentscope framework first before running the workflow. Error: {str(e)}"
            self.logger.error(error_message)
            raise ImportError(error_message)
        self.toolkit = Toolkit()
        self.toolkit.register_tool_function(execute_python_code)
        self.agent = ReActAgent(
            name="math_react_agent",
            sys_prompt=self.system_prompt,
            model=self.agent_model,
            formatter=self.agent_model_formatter,
            toolkit=self.toolkit,
            memory=InMemoryMemory(),
        )
        # we set the openai client to the agent's model
        self.agent.model.client = self.openai_async_client

        self.raw_task = task.raw_task
        self.task_desc = task.task_desc
        self.truth = task.truth

        # we get the answer from gsm8k dataset
        try:
            if isinstance(self.truth, str) and "####" in self.truth:
                # GSM8K dataset
                self.answer = self.truth.split("####")[1].strip()
            else:
                self.answer = str(self.truth)
        except Exception as e:
            self.logger.debug(f"Error in getting answer from truth: {str(e)}")
            self.answer = str(self.truth)

        # we use the boxed format to evaluate the answer
        self.reward_fn = MathBoxedRewardFn()

    @property
    def repeatable(self):
        return False

    @property
    def asynchronous(self):
        """Whether the workflow runs in async mode."""
        return True

    async def run_async(self):
        # make sure that we have the correct import
        try:
            from agentscope.message import Msg
            from pydantic import BaseModel, Field
        except ImportError as e:
            error_message = f"AgentScope is not installed. Please install the agentscope framework first before running the workflow. Error: {str(e)}"
            self.logger.error(error_message)
            raise ImportError(error_message)

        # provide the task to the react agent
        msg = Msg("user", self.task_desc, role="user")

        # Note that the main workflow can have arbitrary steps and include different logic
        class FinalResult(BaseModel):
            result: str = Field(
                description="Your solution of the given math problem. Put your final answer in boxed format, e.g., \\boxed{42}"
            )

        def extract_final_answer(result) -> str:
            """Extract the final answer from the agent's response."""
            try:
                if (
                    hasattr(result, "metadata")
                    and isinstance(result.metadata, dict)
                    and "result" in result.metadata
                ):
                    return result.metadata["result"]
                if hasattr(result, "content"):
                    if isinstance(result.content, dict) and "result" in result.content:
                        return result.content["result"]
                    return str(result.content)
                return str(result)
            except Exception as e:
                self.logger.warning(f"Extract final answer error: {e}. Raw: {result}")
                return str(result)

        result = await self.agent.reply(msg, structured_model=FinalResult)

        final_answer = extract_final_answer(result)

        reward = self.reward_fn(final_answer, self.answer)
        reward = sum(reward.values())
        self.logger.debug(f"Reward: {reward}")
        experiences = self.model.extract_experience_from_history(clear_history=True)
        self.logger.debug(f"Experiences extracted len: {len(experiences)}")
        for i, experience in enumerate(experiences):
            experience.eid.step = i
            experience.reward = reward
            agent_metrics = {"react_memory_length": len(self.agent.memory.content)}
            if experience.metrics is None:
                experience.metrics = {}
            experience.metrics.update(agent_metrics)
        self.logger.debug(
            f"return experience len: {len(experiences)}, run_id: {str(experiences[-1].eid.run)}, final step reward: {experiences[-1].reward}"
        )
        return experiences