Dev/update agentscope react example version #275
Merged: pan-x-c merged 8 commits into agentscope-ai:main from garyzhang99:dev/update_agentscope_react_example_version on Sep 15, 2025
Commits (8)
306fade add agentscope v1 react example (garyzhang99)
b275e7b update reward figure (garyzhang99)
b41bc99 update figure for dapo tools (garyzhang99)
7e161a3 update config file (garyzhang99)
21d7422 rename old v0 (garyzhang99)
1f4ba28 rename files (garyzhang99)
15d5e7f rename yaml as well (garyzhang99)
3886a4c add both script in readme (garyzhang99)
73 changes: 73 additions & 0 deletions
examples/agentscope_tool_react/agentscopev1_tool_react_dapo.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| project: "Trinity-RFT-dapo-react" | ||
| name: "Qwen3-4B-dapo-react" | ||
| checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints} | ||
| algorithm: | ||
|   algorithm_type: grpo | ||
|   repeat_times: 8 | ||
|   advantage_fn: step_wise_grpo | ||
| model: | ||
|   model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen3-4B-Instruct-2507} | ||
|   max_response_tokens: 16384 | ||
|   max_model_len: 24576 | ||
| cluster: | ||
|   node_num: 1 | ||
|   gpu_per_node: 8 | ||
| buffer: | ||
|   total_epochs: 1 | ||
|   batch_size: 32 | ||
|   train_batch_size: 512 | ||
|   explorer_input: | ||
|     taskset: | ||
|       name: dapo | ||
|       storage_type: file | ||
|       path: open-r1/DAPO-Math-17k-Processed | ||
|       subset_name: en | ||
|       split: train | ||
|       format: | ||
|         prompt_key: 'prompt' | ||
|         response_key: 'solution' | ||
|       rollout_args: | ||
|         temperature: 1.0 | ||
|     eval_tasksets: [] | ||
|     default_workflow_type: 'agentscope_react_math_workflow' | ||
|   trainer_input: | ||
|     experience_buffer: | ||
|       name: agentscope_dapo_buffer | ||
|       storage_type: queue | ||
| explorer: | ||
|   max_repeat_times_per_runner: 1 | ||
|   eval_interval: 50 | ||
|   runner_per_model: 8 | ||
|   max_timeout: 360 | ||
|   rollout_model: | ||
|     engine_num: 4 | ||
|     tensor_parallel_size: 1 | ||
|     enable_prefix_caching: false | ||
|     enforce_eager: true | ||
|     enable_openai_api: true | ||
|     enable_history: true | ||
|     dtype: bfloat16 | ||
|     seed: 42 | ||
|     enable_auto_tool_choice: true | ||
|     tool_call_parser: hermes | ||
| synchronizer: | ||
|   sync_style: dynamic_by_explorer | ||
|   sync_method: 'nccl' | ||
|   sync_interval: 2 | ||
|   sync_timeout: 1200 | ||
| trainer: | ||
|   save_interval: 100 | ||
|   trainer_config: | ||
|     actor_rollout_ref: | ||
|       model: | ||
|         use_remove_padding: true | ||
|       actor: | ||
|         use_dynamic_bsz: true | ||
|         ppo_max_token_len_per_gpu: 24576 | ||
|         ulysses_sequence_parallel_size: 2 # sp size | ||
|       ref: | ||
|         log_prob_use_dynamic_bsz: ${trainer.trainer_config.actor_rollout_ref.actor.use_dynamic_bsz} | ||
|         log_prob_max_token_len_per_gpu: ${trainer.trainer_config.actor_rollout_ref.actor.ppo_max_token_len_per_gpu} | ||
|         ulysses_sequence_parallel_size: ${trainer.trainer_config.actor_rollout_ref.actor.ulysses_sequence_parallel_size} # sp size | ||
| monitor: | ||
|   monitor_type: wandb |
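The config above uses `${oc.env:NAME,default}` for `checkpoint_root_dir` and `model.model_path`, which is OmegaConf's environment-variable interpolation syntax with a fallback value. The sketch below only illustrates that fallback behaviour using the standard library. `resolve_env_default` and its regex are hypothetical stand-ins written for this example; they are not OmegaConf's parser and not how Trinity-RFT loads the file.

```python
import os
import re

# Matches ${oc.env:NAME,default}. A simplified stand-in for illustration only,
# not OmegaConf's actual grammar (no nesting, quoting, or escapes).
_ENV_PATTERN = re.compile(r"\$\{oc\.env:([A-Za-z_][A-Za-z0-9_]*),([^}]*)\}")


def resolve_env_default(value: str, environ=None) -> str:
    """Replace each ${oc.env:NAME,default} with environ[NAME], else the default."""
    env = os.environ if environ is None else environ
    return _ENV_PATTERN.sub(lambda m: env.get(m.group(1), m.group(2)), value)


# The two interpolated values from the YAML above.
checkpoint_root = "${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}"
model_path = "${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen3-4B-Instruct-2507}"

# Variable unset: the default after the comma is used.
print(resolve_env_default(checkpoint_root, environ={}))
# Variable set: the environment value wins over the default.
print(resolve_env_default(model_path, environ={"TRINITY_MODEL_PATH": "/models/qwen3"}))
```

In practice this means the example should run without exporting either variable, and exporting `TRINITY_MODEL_PATH` is enough to point it at a local checkpoint (the `/models/qwen3` path here is made up for the demonstration).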
172 changes: 172 additions & 0 deletions
trinity/common/workflows/envs/agentscope/agentscopev1_react_workflow.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,172 @@ | ||
| # -*- coding: utf-8 -*- | ||
| """We include the customized math workflows in this file.""" | ||
| | ||
| from typing import List, Optional | ||
| | ||
| import openai | ||
| | ||
| from trinity.common.models.model import ModelWrapper | ||
| from trinity.common.rewards.math_reward import MathBoxedRewardFn | ||
| from trinity.common.workflows.workflow import WORKFLOWS, Task, Workflow | ||
| | ||
| | ||
| @WORKFLOWS.register_module("agentscope_react_math_workflow") | ||
| class AgentScopeReactMathWorkflow(Workflow): | ||
|     """ | ||
|     This workflow serves as an example of how to use the agentscope framework within the trinity workflow. | ||
|     We use the AgentScope V1 version here. | ||
|     """ | ||
| | ||
|     def __init__( | ||
|         self, | ||
|         *, | ||
|         task: Task, | ||
|         model: ModelWrapper, | ||
|         auxiliary_models: Optional[List[openai.OpenAI]] = None, | ||
|     ): | ||
|         super().__init__( | ||
|             task=task, | ||
|             model=model, | ||
|             auxiliary_models=auxiliary_models, | ||
|         ) | ||
|         # make sure that we have the correct import | ||
|         try: | ||
|             from agentscope.formatter import OpenAIChatFormatter | ||
|             from agentscope.model import OpenAIChatModel | ||
|         except ImportError as e: | ||
|             error_message = f"AgentScope is not installed. Please install the agentscope framework first before running the workflow. Error: {str(e)}" | ||
|             self.logger.error(error_message) | ||
|             raise ImportError(error_message) | ||
| | ||
|         # get openai client from model | ||
|         self.openai_async_client = model.get_openai_async_client() | ||
|         self.model_name = self.openai_async_client.model_path | ||
| | ||
|         temperature = self.rollout_args.get("temperature", 1.0) | ||
|         max_tokens = self.rollout_args.get("max_tokens", 4096) | ||
|         self.agent_model = OpenAIChatModel( | ||
|             api_key="EMPTY", | ||
|             model_name=self.model_name, | ||
|             stream=False, | ||
|             generate_kwargs={ | ||
|                 "temperature": temperature, | ||
|                 "max_tokens": max_tokens, | ||
|             }, | ||
|         ) | ||
|         self.agent_model.client = self.openai_async_client | ||
|         self.agent_model_formatter = OpenAIChatFormatter() | ||
|         self.reset(task) | ||
| | ||
|     @property | ||
|     def resettable(self): | ||
|         return True | ||
| | ||
|     def reset(self, task: Task): | ||
|         self.system_prompt = """ | ||
| You are an agent specialized in solving math problems with tools. Please solve the math problem given to you. You can write and execute Python code to perform calculation or verify your answer. You should return your final answer within \\boxed{{}}. | ||
| """ | ||
|         try: | ||
|             from agentscope.agent import ReActAgent | ||
|             from agentscope.memory import InMemoryMemory | ||
|             from agentscope.tool import Toolkit, execute_python_code | ||
|         except ImportError as e: | ||
|             error_message = f"AgentScope is not installed. Please install the agentscope framework first before running the workflow. Error: {str(e)}" | ||
|             self.logger.error(error_message) | ||
|             raise ImportError(error_message) | ||
|         self.toolkit = Toolkit() | ||
|         self.toolkit.register_tool_function(execute_python_code) | ||
|         self.agent = ReActAgent( | ||
|             name="math_react_agent", | ||
|             sys_prompt=self.system_prompt, | ||
|             model=self.agent_model, | ||
|             formatter=self.agent_model_formatter, | ||
|             toolkit=self.toolkit, | ||
|             memory=InMemoryMemory(), | ||
|         ) | ||
|         # we set the openai client to the agent's model | ||
|         self.agent.model.client = self.openai_async_client | ||
| | ||
|         self.raw_task = task.raw_task | ||
|         self.task_desc = task.task_desc | ||
|         self.truth = task.truth | ||
| | ||
|         # we get the answer from gsm8k dataset | ||
|         try: | ||
|             if isinstance(self.truth, str) and "####" in self.truth: | ||
|                 # GSM8K dataset | ||
|                 self.answer = self.truth.split("####")[1].strip() | ||
|             else: | ||
|                 self.answer = str(self.truth) | ||
|         except Exception as e: | ||
|             self.logger.debug(f"Error in getting answer from truth: {str(e)}") | ||
|             self.answer = str(self.truth) | ||
| | ||
|         # we use the boxed format to evaluate the answer | ||
|         self.reward_fn = MathBoxedRewardFn() | ||
| | ||
|     @property | ||
|     def repeatable(self): | ||
|         return False | ||
| | ||
|     @property | ||
|     def asynchronous(self): | ||
|         """Whether the workflow runs in async mode.""" | ||
|         return True | ||
| | ||
|     async def run_async(self): | ||
|         # make sure that we have the correct import | ||
|         try: | ||
|             from agentscope.message import Msg | ||
|             from pydantic import BaseModel, Field | ||
|         except ImportError as e: | ||
|             error_message = f"AgentScope is not installed. Please install the agentscope framework first before running the workflow. Error: {str(e)}" | ||
|             self.logger.error(error_message) | ||
|             raise ImportError(error_message) | ||
| | ||
|         # provide the task to the react agent | ||
|         msg = Msg("user", self.task_desc, role="user") | ||
| | ||
|         # Note that the main workflow can have arbitrary steps and include different logic | ||
|         class FinalResult(BaseModel): | ||
|             result: str = Field( | ||
|                 description="Your solution of the given math problem. Put your final answer in boxed format, e.g., \\boxed{42}" | ||
|             ) | ||
| | ||
|         def extract_final_answer(result) -> str: | ||
|             """Extract the final answer from the agent's response.""" | ||
|             try: | ||
|                 if ( | ||
|                     hasattr(result, "metadata") | ||
|                     and isinstance(result.metadata, dict) | ||
|                     and "result" in result.metadata | ||
|                 ): | ||
|                     return result.metadata["result"] | ||
|                 if hasattr(result, "content"): | ||
|                     if isinstance(result.content, dict) and "result" in result.content: | ||
|                         return result.content["result"] | ||
|                     return str(result.content) | ||
|                 return str(result) | ||
|             except Exception as e: | ||
|                 self.logger.warning(f"Extract final answer error: {e}. Raw: {result}") | ||
|                 return str(result) | ||
| | ||
|         result = await self.agent.reply(msg, structured_model=FinalResult) | ||
| | ||
|         final_answer = extract_final_answer(result) | ||
| | ||
|         reward = self.reward_fn(final_answer, self.answer) | ||
|         reward = sum(reward.values()) | ||
|         self.logger.debug(f"Reward: {reward}") | ||
|         experiences = self.model.extract_experience_from_history(clear_history=True) | ||
|         self.logger.debug(f"Experiences extracted len: {len(experiences)}") | ||
|         for i, experience in enumerate(experiences): | ||
|             experience.eid.step = i | ||
|             experience.reward = reward | ||
|             agent_metrics = {"react_memory_length": len(self.agent.memory.content)} | ||
|             if experience.metrics is None: | ||
|                 experience.metrics = {} | ||
|             experience.metrics.update(agent_metrics) | ||
|         self.logger.debug( | ||
|             f"return experience len: {len(experiences)}, run_id: {str(experiences[-1].eid.run)}, final step reward: {experiences[-1].reward}" | ||
|         ) | ||
|         return experiences | ||
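Two pieces of plain-Python logic in the workflow above can be tried without AgentScope or Trinity-RFT installed: parsing the ground truth (GSM8K-style answers follow a `####` marker) and copying the single final reward onto every step of a multi-step ReAct rollout. The sketch below mirrors that logic under stated assumptions. `EID`, `Experience`, `parse_truth`, and `assign_final_reward` are hypothetical stand-ins written for this example; the real Trinity-RFT experience objects carry many more fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EID:
    """Hypothetical stand-in for the experience identifier used in the workflow."""
    run: int = 0
    step: int = 0


@dataclass
class Experience:
    """Hypothetical stand-in holding only the fields the workflow touches."""
    eid: EID = field(default_factory=EID)
    reward: Optional[float] = None
    metrics: Optional[Dict[str, float]] = None


def parse_truth(truth) -> str:
    """Return the text after '####' for GSM8K-style strings, else str(truth)."""
    if isinstance(truth, str) and "####" in truth:
        return truth.split("####")[1].strip()
    return str(truth)


def assign_final_reward(
    experiences: List[Experience], reward: float, memory_length: int
) -> List[Experience]:
    """Give every step the same final reward and a sequential step index."""
    for i, exp in enumerate(experiences):
        exp.eid.step = i
        exp.reward = reward
        if exp.metrics is None:
            exp.metrics = {}
        exp.metrics.update({"react_memory_length": memory_length})
    return experiences


# Three model calls in one rollout, all credited with the final outcome.
steps = [Experience(), Experience(), Experience()]
assign_final_reward(steps, reward=1.0, memory_length=7)
print([(e.eid.step, e.reward) for e in steps])  # [(0, 1.0), (1, 1.0), (2, 1.0)]
print(parse_truth("She sold 48 + 24 clips. #### 72"))  # 72
```

Because every step carries the same reward, credit assignment across steps is left to the advantage function, which the accompanying config sets to `step_wise_grpo`.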