diff --git a/README.md b/README.md index eee0b6ca45..bb90d042db 100644 --- a/README.md +++ b/README.md @@ -22,6 +22,7 @@ ## 🚀 News +* [2025-08] We now support training on general multi-step workflows! Please check out examples for [ALFWorld](./docs/sphinx_doc/source/tutorial/example_step_wise.md) and [ReAct](./docs/sphinx_doc/source/tutorial/example_react.md). * [2025-07] Trinity-RFT v0.2.0 is released. * [2025-07] We update the [technical report](https://arxiv.org/abs/2505.17826) (arXiv v2) with new features, examples, and experiments. * [2025-06] Trinity-RFT v0.1.1 is released. @@ -230,7 +231,7 @@ huggingface-cli download {model_name} --local-dir $MODEL_PATH/{model_name} modelscope download {model_name} --local_dir $MODEL_PATH/{model_name} ``` -For more details about model downloading, see [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli) or [ModelScope](https://modelscope.cn/docs/models/download). +For more details about model downloading, see [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli) or [ModelScope](https://modelscope.cn/docs/models/download). @@ -331,7 +332,12 @@ Tutorials for running different RFT modes: Tutorials for adapting Trinity-RFT to a new multi-turn agentic scenario: -+ [Multi-turn tasks](./docs/sphinx_doc/source/tutorial/example_multi_turn.md) ++ [Concatenated Multi-turn tasks](./docs/sphinx_doc/source/tutorial/example_multi_turn.md) + +Tutorials for adapting Trinity-RFT to a general multi-step agentic scenario: + ++ [General Multi-Step tasks](./docs/sphinx_doc/source/tutorial/example_step_wise.md) ++ [ReAct agent tasks](./docs/sphinx_doc/source/tutorial/example_react.md) Tutorials for data-related functionalities: diff --git a/README_zh.md b/README_zh.md index 6d0f8df8f2..d2851f2d39 100644 --- a/README_zh.md +++ b/README_zh.md @@ -22,6 +22,7 @@ ## 🚀 最新动态 +* [2025-08] Trinity-RFT 现在已经支持通用多轮工作流的训练了,请参考 [ALFWorld](./docs/sphinx_doc/source/tutorial/example_step_wise.md) 和 [ReAct](./docs/sphinx_doc/source/tutorial/example_react.md) 的例子! * [2025-07] 发布 Trinity-RFT v0.2.0 版本,新增了多项功能优化。 * [2025-07] 更新了[技术报告](https://arxiv.org/abs/2505.17826) (arXiv v2),增加了新功能、示例和实验。 * [2025-06] 发布 Trinity-RFT v0.1.1 版本,修复了已知问题并提升系统稳定性。 @@ -334,6 +335,12 @@ trinity run --config examples/grpo_gsm8k/gsm8k.yaml + [多轮任务](./docs/sphinx_doc/source/tutorial/example_multi_turn.md) +将 Trinity-RFT 适配到通用多轮智能体场景的教程: + ++ [通用多轮任务](./docs/sphinx_doc/source/tutorial/example_step_wise.md) ++ [ReAct智能体任务](./docs/sphinx_doc/source/tutorial/example_react.md) + + 数据相关功能的教程: + [高级数据处理及Human-in-the-loop](./docs/sphinx_doc/source/tutorial/example_data_functionalities.md) diff --git a/docs/sphinx_doc/assets/alfworldv2_reward.png b/docs/sphinx_doc/assets/alfworldv2_reward.png new file mode 100644 index 0000000000..9eca788f70 Binary files /dev/null and b/docs/sphinx_doc/assets/alfworldv2_reward.png differ diff --git a/docs/sphinx_doc/source/index.rst b/docs/sphinx_doc/source/index.rst index 062e9e9e7f..34f67a32c3 100644 --- a/docs/sphinx_doc/source/index.rst +++ b/docs/sphinx_doc/source/index.rst @@ -20,6 +20,8 @@ Welcome to Trinity-RFT's documentation! 
   tutorial/example_reasoning_advanced.md
   tutorial/example_async_mode.md
   tutorial/example_multi_turn.md
+   tutorial/example_step_wise.md
+   tutorial/example_react.md
   tutorial/example_dpo.md
   tutorial/example_data_functionalities.md
diff --git a/docs/sphinx_doc/source/main.md b/docs/sphinx_doc/source/main.md
index dcfb3fde34..4424642322 100644
--- a/docs/sphinx_doc/source/main.md
+++ b/docs/sphinx_doc/source/main.md
@@ -8,6 +8,7 @@ ## 🚀 News
+* [2025-08] We now support training on general multi-step workflows! Please check out examples for [ALFWorld](/tutorial/example_step_wise.md) and [ReAct](/tutorial/example_react.md).
 * [2025-07] Trinity-RFT v0.2.0 is released.
 * [2025-07] We update the [technical report](https://arxiv.org/abs/2505.17826) (arXiv v2) with new features, examples, and experiments.
 * [2025-06] Trinity-RFT v0.1.1 is released.
@@ -309,7 +310,12 @@ Tutorials for running different RFT modes:

 Tutorials for adapting Trinity-RFT to a new multi-turn agentic scenario:

-+ [Multi-turn tasks](/tutorial/example_multi_turn.md)
++ [Concatenated Multi-turn tasks](/tutorial/example_multi_turn.md)
+
+Tutorials for adapting Trinity-RFT to a general multi-step agentic scenario:
+
++ [General Multi-Step tasks](/tutorial/example_step_wise.md)
++ [ReAct agent tasks](/tutorial/example_react.md)

 Tutorials for data-related functionalities:
diff --git a/docs/sphinx_doc/source/tutorial/example_multi_turn.md b/docs/sphinx_doc/source/tutorial/example_multi_turn.md
index 1212b9dcf4..6cc28690e6 100644
--- a/docs/sphinx_doc/source/tutorial/example_multi_turn.md
+++ b/docs/sphinx_doc/source/tutorial/example_multi_turn.md
@@ -1,4 +1,4 @@
-# Multi-Turn RFT
+# Concatenated Multi-Turn RFT

 In Trinity-RFT, we support Agentic RL with multiple rounds of interaction with environments.
diff --git a/docs/sphinx_doc/source/tutorial/example_react.md b/docs/sphinx_doc/source/tutorial/example_react.md
new file mode 100644
index 0000000000..2cec955c72
--- /dev/null
+++ b/docs/sphinx_doc/source/tutorial/example_react.md
@@ -0,0 +1,136 @@

# Multi-Step ReAct

This example demonstrates how to adapt the Trinity-RFT training workflow to your own agentic project through our OpenAI-compatible `ModelWrapper` class.

Here, we use the [AgentScope](https://github.com/modelscope/agentscope) framework as an example, but you can use any other framework, as Trinity offers great flexibility. This example fine-tunes a model on the GSM8K math dataset by leveraging an agent that uses ReAct-style reasoning with native tool calls.

## Key Features Demonstrated

This example highlights several advanced capabilities of the Trinity-RFT framework:

### Seamless Integration with External Agent Frameworks
Trinity-RFT is designed to be highly modular. You can easily embed complex, pre-existing agent logic from external frameworks like AgentScope directly into a Trinity `Workflow`.

- **No Need for Rewrites**: You don't have to re-implement the intricate logic of your agent (e.g., the ReAct loop, memory management, or tool invocation) within Trinity.
- **Focus on High-Level Orchestration**: As shown in our `AgentScopeReactV2MathWorkflow`, the Trinity workflow simply initializes the external agent and calls its `reply` method. Trinity abstracts away the underlying complexity, allowing you to focus on high-level task orchestration and reward design. A minimal sketch of this pattern is shown below.
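
The snippet below is a simplified sketch of this integration pattern, not the actual `AgentScopeReactV2MathWorkflow` source: `build_agent` and `compute_reward` are placeholders for whatever your agent framework and reward design provide, while the Trinity APIs it relies on (`get_openai_client`, `extract_experience_from_history`, the `WORKFLOWS` registry) are the ones described in the rest of this tutorial.

```python
# Simplified sketch: `build_agent` and `compute_reward` are hypothetical placeholders.
from typing import List, Optional

from trinity.common.experience import Experience
from trinity.common.models.model import ModelWrapper
from trinity.common.workflows.workflow import WORKFLOWS, Task, Workflow


@WORKFLOWS.register_module("my_agent_workflow")
class MyAgentWorkflow(Workflow):
    def __init__(
        self, *, task: Task, model: ModelWrapper, auxiliary_models: Optional[List] = None
    ):
        super().__init__(task=task, model=model, auxiliary_models=auxiliary_models)
        # Build the agent exactly as you would outside Trinity.
        self.agent = build_agent()  # placeholder for your framework's agent constructor
        # Route the agent's API calls through Trinity's recording client
        # (requires `explorer.rollout_model.enable_history: true`).
        # For AgentScope this is the `agent.model.client` attribute; other
        # frameworks expose their OpenAI client in their own way.
        self.agent.model.client = model.get_openai_client()
        self.task_desc = task.task_desc

    def run(self) -> List[Experience]:
        # Let the external agent run its full multi-step ReAct loop untouched.
        final_answer = self.agent.reply(self.task_desc)
        # Every intermediate turn recorded by the ModelWrapper becomes an Experience.
        experiences = self.model.extract_experience_from_history(clear_history=True)
        # Score the final answer and apply the reward to every step of the trajectory.
        reward = compute_reward(final_answer, self.task_desc)  # placeholder reward function
        for exp in experiences:
            exp.reward = reward
        return experiences
```

Nothing about the agent's internal loop has to change; only the client injection and the final experience extraction are Trinity-specific.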

### General Multi-Step Training
Modern agentic tasks often involve multiple steps of reasoning, tool use, and observation. Trinity-RFT natively supports training across these multi-step interactions.

- **Step-Wise Experience Generation**: Instead of only learning from the final answer, Trinity can treat each step within an agent's reasoning trajectory as a distinct learning opportunity.
- **Credit Assignment**: The reward for solving a task is propagated back to all experiences within the successful trajectory, enabling the model to learn the entire reasoning chain, not just the final response. This is controlled by the `add_strategy` field in the config.

### Native Tool Calling Support
Trinity-RFT's inference engine and training pipeline are built to support the native OpenAI `tool_calls` format.

- **Direct Training on Tool Use**: The framework allows the model to be trained on deciding *when* to call a tool, *which* tool to call, and *what* arguments to use, all formatted in the standard `tool_calls` convention.
- **Interoperability**: This native support ensures seamless integration with any service or environment that consumes the OpenAI API format, such as an MCP (Model Context Protocol) server or other tool-use evaluators.

## How It Works

Below, we walk through how this works step by step.

### The Workflow (`workflow.py`)

The core logic is encapsulated in the `AgentScopeReactV2MathWorkflow` class.

1. **Initialization (`__init__`)**:
   - It first initializes the AgentScope environment and the desired agent (`ReActAgentV2`).
   - The most critical integration step is injecting Trinity's model client into the AgentScope agent:
   ```python
   self.openai_client = model.get_openai_client()
   # ...
   self.agent.model.client = self.openai_client
   ```
   This ensures that all API calls made by the AgentScope agent are routed through Trinity's `ModelWrapper`, which records the entire conversation history.

2. **Execution (`run`)**:
   - The `run` method is remarkably simple: it just passes the task description to the agent.
   ```python
   content = self.agent.reply(msg).content
   ```
   - After the agent completes its multi-step reasoning and produces a final answer, Trinity extracts all the intermediate turns from the model's history:
   ```python
   experiences = self.model.extract_experience_from_history(clear_history=True)
   ```
   - A reward is calculated based on the final answer and is applied to all `Experience` objects generated from the trajectory. These experiences are then sent to the buffer for training.

### Configuration

The configuration file controls the behavior of the entire system. Here are the key parameters for this example:

#### Native Tool Calling Settings

These settings in the `explorer.rollout_model` section configure the vLLM-based engine to generate and parse OpenAI-compatible tool calls.
We use a `Qwen3` model hosted with vLLM. The corresponding settings for other models can be found in the [vLLM tool calling documentation](https://docs.vllm.ai/en/stable/features/tool_calling.html#qwen-models).


```yaml
explorer:
  rollout_model:
    # ...
    enable_auto_tool_choice: true  # Enables the model to generate `tool_calls`
    tool_call_parser: hermes       # Specifies the parser for formatting tool call outputs
    reasoning_parser: deepseek_r1  # Helps in parsing the model's thought process
    enable_thinking: true          # Enables the model to generate intermediate "thoughts"
```

#### Multi-Step Training Strategy

This setting in the `algorithm` section defines how experiences from a multi-step rollout are processed.

```yaml
algorithm:
  algorithm_type: grpo
  add_strategy: step_wise_grpo  # Key for multi-step training
```
- `step_wise_grpo`: This strategy tells Trinity to create a distinct training sample for each step in the agent's execution path. The `grpo` algorithm then uses these samples to update the model.

#### Asynchronous Synchronization for Efficiency

Because multi-step rollouts produce a variable number of experiences, waiting for a fixed number of *rollouts* is inefficient. We use a dynamic synchronization strategy instead.

```yaml
synchronizer:
  sync_style: dynamic_by_explorer  # Start training when enough experiences are ready
  sync_interval: 2
```
- `sync_style: dynamic_by_explorer`: The trainer starts a training job as soon as the buffer has collected enough *experiences* (i.e., individual turns), rather than waiting for a fixed number of full agent trajectories. This significantly improves GPU utilization and training throughput.

## How to Run the Example

1. **Prerequisites**: Ensure you have Trinity installed, along with the dependencies for this example (e.g., `agentscope`). Please refer to the [AgentScope GitHub repository](https://github.com/modelscope/agentscope).

2. **Prepare the model and config**: Download the model you want to use, and fill in the configuration file `examples/agentscope_tool_react/agentscope_tool_react_gsm8k.yaml` or `examples/agentscope_tool_react/agentscope_tool_react_dapo.yaml`.

3. **Launch the training job**: Run the following command from the root directory of the repository.

    ```bash
    trinity run --config examples/agentscope_tool_react/agentscope_tool_react_gsm8k.yaml
    ```

    or

    ```bash
    trinity run --config examples/agentscope_tool_react/agentscope_tool_react_dapo.yaml
    ```


The GSM8K example is quite simple and can converge within a few minutes on 8 H20 GPUs.

![](../../assets/agentscope_gsm8k_reward.png)

The DAPO example takes a bit longer, but it also converges.

![](../../assets/agentscope_dapo_reward.png)

We can also see that the model gradually starts to use more tool calls to solve the problems.

![](../../assets/agentscope_dapo_turns.png)



## Summary

This example is simple but demonstrates the power and flexibility of Trinity for training complex, multi-step agents that use tools. By seamlessly integrating external agentic logic and providing native support for multi-step training and tool calls, Trinity-RFT empowers you to fine-tune models on sophisticated, realistic tasks with high efficiency.
diff --git a/docs/sphinx_doc/source/tutorial/example_step_wise.md b/docs/sphinx_doc/source/tutorial/example_step_wise.md
new file mode 100644
index 0000000000..f2b772fde7
--- /dev/null
+++ b/docs/sphinx_doc/source/tutorial/example_step_wise.md
@@ -0,0 +1,201 @@
# General Multi-Step RFT

In Trinity-RFT, we support general multi-step RFT, which can be used to train agents that interact with environments over multiple rounds.

Different from the [multi-turn RFT](./example_multi_turn.md) that concatenates the interaction results into one single `Experience`, this approach treats each step as an individual `Experience`, enabling RL agents to handle longer contexts.

We will now illustrate the general multi-step workflow using ALFWorld. For a hands-on look, you can skip directly to the [code implementation](#example-multi-step-alfworld).

## Build a general step-wise workflow

### Basic concept

In Trinity, we provide two types of general step-wise workflows: `StepWiseRewardWorkflow` and `RewardPropagationWorkflow`. These workflows set up the basic structure of a step-wise workflow and return a list of `Experience` objects in each run. The difference between them is that `StepWiseRewardWorkflow` computes a reward for each step, while `RewardPropagationWorkflow` computes the reward after all steps finish and propagates it back to the previous steps. See `trinity/common/workflows/step_wise_workflow.py` for more details.

To build a new workflow, you mainly need to implement each interaction step in `step()` and the reward function in `reward()`. For example, the core code of the ALFWorld workflow is shown below:


```python
class StepWiseAlfworldWorkflow(RewardPropagationWorkflow):
    ...

    def step(self, step_num: int) -> bool:
        if self.done:
            return False

        # Format observation for the model
        format_obs = format_observation(self.observation)  # type: ignore
        self.memory.append({"role": "user", "content": format_obs})

        # Get action from the model
        responses = self.model.chat(self.memory)
        response_text = responses[0].response_text
        self.memory.append({"role": "assistant", "content": response_text})
        action = parse_action(response_text)

        # Execute action in the environment
        observation, reward, done, info = self.env.step(action)

        # Update internal state
        self.observation = observation
        self.done = done
        if self.done:
            self.final_reward = reward

        # Return False to stop the run if the episode is done
        return not self.done

    def reward(self, exps: list[Experience]) -> float:
        return self.final_reward
```

Also, remember to register your workflow:
```python
@WORKFLOWS.register_module("step_wise_alfworld_workflow")
class StepWiseAlfworldWorkflow(RewardPropagationWorkflow):
    """A step-wise workflow for alfworld task."""
    ...
```

and include it in the init file `trinity/common/workflows/__init__.py`:

```diff
 # -*- coding: utf-8 -*-
 """Workflow module"""
 from .workflow import WORKFLOWS, MathWorkflow, SimpleWorkflow
+from .envs.alfworld.alfworld_workflow import StepWiseAlfworldWorkflow

 __all__ = [
     "WORKFLOWS",
     "SimpleWorkflow",
     "MathWorkflow",
+    "StepWiseAlfworldWorkflow",
 ]
```

### Other Configuration

In general multi-step scenarios, each run may generate a varying number of experiences. To accommodate this, we provide several flexible designs.

- `algorithm.add_strategy = step_wise_grpo`: This strategy computes the advantages for the collected experiences before adding them to the buffer. For this example, we use `step_wise_grpo`, which broadcasts advantages from the last step to the previous steps.

- `buffer.train_batch_size`: The number of experiences to be sampled from the buffer for training, which can be different from the number of experiences generated in each explore step.

- `buffer.trainer_input.use_priority_queue = true`: Using a `PriorityQueue` allows the trainer to preferentially consume experiences with higher priority.
+ +- `synchronizer.sync_style = dynamic_by_explorer`: The explorer determines when to synchronize the model weights with the trainer. + + +The example configuration is shown as: + +```yaml +project: "ALFWORLD" +name: "Step_Wise_Alfworld" +checkpoint_root_dir: /PATH/TO/CHECKPOINT/ALFWORLD_RFT/ +algorithm: + algorithm_type: grpo + repeat_times: 16 + add_strategy: step_wise_grpo +model: + model_path: /PATH/TO/MODEL/ + max_response_tokens: 16384 + max_model_len: 20480 +cluster: + node_num: 1 + gpu_per_node: 8 +buffer: + total_epochs: 20 + batch_size: 16 + train_batch_size: 7680 # here: batch_size * repeat_times * max_env_steps + max_retry_times: 3 + max_retry_interval: 1 + explorer_input: + taskset: + name: alfworld + storage_type: file + path: 'examples/grpo_alfworld/alfworld_data' # PATH TO ALFWORLD DATA + format: + prompt_key: 'game_file' + rollout_args: + temperature: 1.0 + logprobs: 0 + workflow_args: + max_env_steps: 30 + enable_progress_bar: false + default_workflow_type: 'step_wise_alfworld_workflow' + trainer_input: + experience_buffer: + name: alfworld_buffer + storage_type: queue + use_priority_queue: true +explorer: + max_repeat_times_per_runner: 1 + runner_num: 32 + max_timeout: 3600 + rollout_model: + enable_history: true + engine_num: 2 + tensor_parallel_size: 2 + enable_prefix_caching: false + enforce_eager: true + dtype: bfloat16 + seed: 42 + gpu_memory_utilization: 0.7 + enable_chunked_prefill: true + env_vars: + TMPDIR: /PATH/TO/ALFWORLD_TMP_DIR +synchronizer: + sync_style: dynamic_by_explorer + sync_method: 'nccl' + sync_interval: 2 + sync_timeout: 3600 +trainer: + trainer_type: 'verl' + trainer_config_path: 'examples/grpo_alfworld_general_multi_step/train_alfworld.yaml' + save_interval: 50 +``` + + + +Below, we provide the commands for running the ALFWorld task. + +## Example: Multi-Step ALFWorld +### Environment Preparation +To install the ALFworld environment, you can follow the instructions below. + +1. Pip install: `pip install alfworld[full]` + +2. Export the path: `export ALFWORLD_DATA=/path/to/alfworld/data` + +3. Download the environment: `alfworld-download` + +Now you can find the environment in `$ALFWORLD_DATA` and continue with the following steps. + +You may refer to the original [repository](https://github.com/alfworld/alfworld) for more details. + +### Data Preparation +Our dataset follows the format in Huggingface datasets library, so we should correspondingly convert our env dataset. + +Just check the data preparation scripts and run the following command. +```bash +python examples/grpo_alfworld/get_alfworld_data.py +``` + +The task is described as an environment instead of a single prompt. The task description is the `game_file` file path. + + +### Config preparation and run the experiment + +The default config file is [`alfworld.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_alfworld_general_multi_step/alfworld.yaml). +You may revise the configurations properly and run the experiment! + +```bash +trinity run --config examples/grpo_alfworld_general_multi_step/alfworld.yaml +``` + +The results are shown in the following figure. + +![](../../assets/alfworldv2_reward.png) + + +Note that we use a Qwen2.5-3B model fine-tuned with SFT as our starting point, ensuring that the model has some basic understanding of the environment. 
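
As a side note, you can quickly sanity-check the ALFWorld installation and the converted data with a small standalone probe before launching a full run. The sketch below mirrors the environment-construction code used by `StepWiseAlfworldWorkflow`; the `game_file` path is a placeholder that you should point at one of the game files under your data directory.

```python
# Standalone sanity check for the ALFWorld setup (assumes `alfworld[full]` is
# installed and `alfworld-download` has been run). The game file path is a placeholder.
import textworld
import textworld.gym
from alfworld.agents.environment.alfred_tw_env import (
    AlfredDemangler,
    AlfredExpert,
    AlfredExpertType,
)

game_file = "/path/to/alfworld/game_file"  # placeholder: a single ALFWorld game file

# Same environment construction as in the step-wise ALFWorld workflow.
expert = AlfredExpert(expert_type=AlfredExpertType.HANDCODED)
request_infos = textworld.EnvInfos(description=True, inventory=True, admissible_commands=True)
env_id = textworld.gym.register_game(game_file, request_infos, wrappers=[AlfredDemangler(), expert])
env = textworld.gym.make(env_id)

observation, info = env.reset()
print(observation)                      # first textual observation of the episode
print(info["admissible_commands"][:5])  # a few of the currently admissible actions
env.close()
```

If the observation and admissible commands print correctly, the environment and data are ready for training.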
diff --git a/examples/agentscope_tool_react/README.md b/examples/agentscope_tool_react/README.md index df08bf1ec7..9d1c77d441 100644 --- a/examples/agentscope_tool_react/README.md +++ b/examples/agentscope_tool_react/README.md @@ -1,137 +1,7 @@ +# ReAct on GSM8K and MATH Dataset -# Training Using Complex Agent Workflows. +This example shows how to train ReAct agent on GSM8K and MATH Dataset. -This example serves as a demonstration for adapting the Trinity-RFT training workflow to your own agentic project, through our OpenAI-compatible `ModelWrapper` class. +For more detailed information, please refer to the [documentation](../../docs/sphinx_doc/source/tutorial/example_react.md). -Here, we use the [AgentScope](https://github.com/modelscope/agentscope) framework as an example, but you can certainly use any other framework, as Trinity offers great flexibility. This example fine-tunes a model on the GSM8K math dataset by leveraging an agent that uses ReAct-style reasoning with native tool calls. - -## Key Features Demonstrated - -This example highlights several advanced capabilities of the Trinity-RFT framework: - -### Seamless Integration with External Agent Frameworks -Trinity-RFT is designed to be highly modular. You can easily embed complex, pre-existing agent logic from external frameworks like AgentScope directly into a Trinity `Workflow`. - -- **No Need for Rewrites**: You don't have to re-implement the intricate logic of your agent (e.g., the ReAct loop, memory management, or tool invocation) within Trinity. -- **Focus on High-Level Orchestration**: As shown in our `AgentScopeReactV2MathWorkflow`, the Trinity workflow simply initializes and calls the external agent's `reply` method. Trinity abstracts away the underlying complexity, allowing you to focus on the high-level task orchestration and reward design. - -### General Multi-Turn Training -Modern agentic tasks often involve multiple steps of reasoning, tool use, and observation. Trinity-RFT natively supports training across these multi-turn interactions. - -- **Step-Wise Experience Generation**: Instead of only learning from the final answer, Trinity can treat each step within an agent's reasoning trajectory as a distinct learning opportunity. -- **Credit Assignment**: The reward for solving a task is propagated back to all experiences within the successful trajectory, enabling the model to learn the entire reasoning chain, not just the final response. This is controlled by the `add_strategy` in the config. - -### Native Tool Calling Support -Trinity-RFT's inference engine and training pipeline are built to support the native OpenAI `tool_calls` format. - -- **Direct Training on Tool Use**: The framework allows the model to be trained on deciding *when* to call a tool, *which* tool to call, and *what* arguments to use, all formatted in the standard `tool_calls` convention. -- **Interoperability**: This native support ensures seamless integration with any service or environment that consumes the OpenAI API format, such as an `MCP_server` (Multi-Agent Collaboration Platform) or other tool-use evaluators. - -## How It Works - -Below we show you how to perform this step-by-step. - -### The Workflow (`workflow.py`) - -The core logic is encapsulated in the `AgentScopeReactV2MathWorkflow` class. - -1. **Initialization (`__init__`)**: - - It first initializes the AgentScope environment and the desired agent (`ReActAgentV2`). 
- - The most critical integration step is injecting Trinity's model client into the AgentScope agent: - ```python - self.openai_client = model.get_openai_client() - # ... - self.agent.model.client = self.openai_client - ``` - This ensures that all API calls made by the AgentScope agent are routed through Trinity's `ModelWrapper`, which records the entire conversation history. - -2. **Execution (`run`)**: - - The `run` method is remarkably simple. It just passes the task description to the agent. - ```python - content = self.agent.reply(msg).content - ``` - - After the agent completes its multi-step reasoning and produces a final answer, Trinity extracts all the intermediate turns from the model's history: - ```python - experiences = self.model.extract_experience_from_history(clear_history=True) - ``` - - A reward is calculated based on the final answer and is applied to all `Experience` objects generated from the trajectory. These experiences are then sent to the buffer for training. - -### The Configuration (`config.yaml`) - -The configuration file fine-tunes the behavior of the entire system. Here are the key parameters for this example: - -#### Native Tool Calling Settings - -These settings in the `explorer.rollout_model` section configure the VLLM-based engine to generate and parse OpenAI-compatible tool calls. -We use the `Qwen3` model and host model with vllm. The configuration for different model can be found in [VLLM Toolcalls](https://docs.vllm.ai/en/stable/features/tool_calling.html#qwen-models) - - -```yaml -explorer: - rollout_model: - engine_type: vllm_async - # ... - enable_auto_tool_choice: true # Enables the model to generate `tool_calls` - tool_call_parser: hermes # Specifies the parser for formatting tool call outputs - reasoning_parser: deepseek_r1 # Helps in parsing the model's thought process - enable_thinking: true # Enables the model to generate intermediate "thoughts" -``` - -#### Multi-Turn Training Strategy - -This setting in the `algorithm` section defines how experiences from a multi-turn rollout are processed. - -```yaml -algorithm: - algorithm_type: grpo - add_strategy: step_wise_grpo # Key for multi-turn training -``` -- `step_wise_grpo`: This strategy tells Trinity to create a distinct training sample for each step in the agent's execution path. The `grpo` algorithm then uses these samples to update the model. - -#### Asynchronous Synchronization for Efficiency - -Because multi-turn rollouts produce a variable number of experiences, waiting for a fixed number of *rollouts* is inefficient. We use a dynamic synchronization strategy. - -```yaml -synchronizer: - sync_style: dynamic_by_explorer # Start training when enough experiences are ready - sync_interval: 2 -``` -- `sync_style: dynamic_by_explorer`: The trainer starts a training job as soon as the buffer has collected enough *experiences* (i.e., individual turns), rather than waiting for a fixed number of full agent trajectories. This significantly improves GPU utilization and training throughput. - -## How to Run the Example - -1. **Prerequisites**: Ensure you have Trinity installed, along with the dependencies for this example (e.g., `agentscope`). Please refer to [Agentscope Github link](https://github.com/modelscope/agentscope). - -2. Download the model you want to use, and fill in the configuration files in `examples/agentscope_tool_react/agentscope_tool_react_gsm8k.yaml` or `examples/agentscope_tool_react/agentscope_tool_react_dapo.yaml` - -3. 
**Launch the training job**: Run the following command from the root directory of the repository.
-
-    ```bash
-    trinity run --config examples/agentscope_tool_react/agentscope_tool_react_gsm8k.yaml
-    ```
-
-    or
-
-    ```bash
-    trinity run --config examples/agentscope_tool_react/agentscope_tool_react_dapo.yaml
-    ```
-
-
-The example here for gsm8k dataset is really simple and it can converge in a few minutes on 8 H20 GPUs.
-
-![](../../docs/sphinx_doc/assets/agentscope_gsm8k_reward.png)
-
-The example here for dapo dataset take a little bit longer, but it also converges.
-
-![](../../docs/sphinx_doc/assets/agentscope_dapo_reward.png)
-
-We can also see that the model generally start to use more tool calls to solve the problems.
-
-![](../../docs/sphinx_doc/assets/agentscope_dapo_turns.png)
-
-
-
-## Summary
-
-This example is simple but demonstrates the power and flexibility of Trinity for training complex, multi-turn agents that use tools. By seamlessly integrating external agentic logic and providing native support for multi-turn training and tool calls, Trinity-RFT empowers you to fine-tune models on sophisticated, realistic tasks with high efficiency.
+The config files are located in [`agentscope_tool_react_gsm8k.yaml`](agentscope_tool_react_gsm8k.yaml) and [`agentscope_tool_react_dapo.yaml`](agentscope_tool_react_dapo.yaml).
diff --git a/examples/agentscope_tool_react/agentscope_tool_react_dapo.yaml b/examples/agentscope_tool_react/agentscope_tool_react_dapo.yaml
index d8ab2b3694..8e96958cde 100644
--- a/examples/agentscope_tool_react/agentscope_tool_react_dapo.yaml
+++ b/examples/agentscope_tool_react/agentscope_tool_react_dapo.yaml
@@ -5,7 +5,6 @@ algorithm:
   algorithm_type: grpo
   repeat_times: 8
   add_strategy: step_wise_grpo
-
 model:
   model_path: /PATH/TO/MODEL/Qwen3-8B
   max_response_tokens: 16384
@@ -16,6 +15,7 @@ cluster:
 buffer:
   total_epochs: 1
   batch_size: 32
+  train_batch_size: 512
   max_retry_times: 3
   max_retry_interval: 1
   explorer_input:
@@ -42,7 +42,6 @@ explorer:
   runner_num: 4
   max_timeout: 360
   rollout_model:
-    engine_type: vllm_async
     engine_num: 4
     tensor_parallel_size: 1
     enable_prefix_caching: false
diff --git a/examples/agentscope_tool_react/agentscope_tool_react_gsm8k.yaml b/examples/agentscope_tool_react/agentscope_tool_react_gsm8k.yaml
index 951de8c578..9a31f2953e 100644
--- a/examples/agentscope_tool_react/agentscope_tool_react_gsm8k.yaml
+++ b/examples/agentscope_tool_react/agentscope_tool_react_gsm8k.yaml
@@ -5,7 +5,6 @@ algorithm:
   algorithm_type: grpo
   repeat_times: 8
   add_strategy: step_wise_grpo
-
 model:
   model_path: /PATH/TO/MODEL/Qwen3-4B
   max_response_tokens: 16384
@@ -16,6 +15,7 @@ cluster:
 buffer:
   total_epochs: 1
   batch_size: 32
+  train_batch_size: 256
   max_retry_times: 3
   max_retry_interval: 1
   explorer_input:
@@ -42,7 +42,6 @@ explorer:
   runner_num: 4
   max_timeout: 360
   rollout_model:
-    engine_type: vllm_async
     engine_num: 4
     tensor_parallel_size: 1
     enable_prefix_caching: false
diff --git a/examples/grpo_alfworld_general_multi_step/README.md b/examples/grpo_alfworld_general_multi_step/README.md
new file mode 100644
index 0000000000..d7f18e6238
--- /dev/null
+++ b/examples/grpo_alfworld_general_multi_step/README.md
@@ -0,0 +1,13 @@
# ALFWorld with general multi-step workflow

This example shows an updated implementation for training on ALFWorld, now built with a general multi-step workflow.
Please refer to the [documentation](../../docs/sphinx_doc/source/tutorial/example_step_wise.md) for more details.

The config files are located in [`alfworld.yaml`](alfworld.yaml) and [`train_alfworld.yaml`](train_alfworld.yaml).


The training performance of this example is shown as follows:

![Reward Curve](../../docs/sphinx_doc/assets/alfworldv2_reward.png)
diff --git a/examples/grpo_alfworld_general_multi_step/alfworld.yaml b/examples/grpo_alfworld_general_multi_step/alfworld.yaml new file mode 100644 index 0000000000..5685c574ba --- /dev/null +++ b/examples/grpo_alfworld_general_multi_step/alfworld.yaml @@ -0,0 +1,66 @@ +project: "ALFWORLD" +name: "Step_Wise_Alfworld" +checkpoint_root_dir: /PATH/TO/CHECKPOINT/ALFWORLD_RFT/ +algorithm: + algorithm_type: grpo + repeat_times: 16 + add_strategy: step_wise_grpo +model: + model_path: /PATH/TO/MODEL/ + max_response_tokens: 16384 + max_model_len: 20480 +cluster: + node_num: 1 + gpu_per_node: 8 +buffer: + total_epochs: 20 + batch_size: 16 + train_batch_size: 7680 # 16 * 16 * 30 + max_retry_times: 3 + max_retry_interval: 1 + explorer_input: + taskset: + name: alfworld + storage_type: file + path: 'examples/grpo_alfworld/alfworld_data' # PATH TO ALFWORLD DATA + format: + prompt_key: 'game_file' + rollout_args: + temperature: 1.0 + logprobs: 0 + workflow_args: + max_env_steps: 30 + enable_progress_bar: false + default_workflow_type: 'step_wise_alfworld_workflow' + trainer_input: + experience_buffer: + name: alfworld_buffer + storage_type: queue + use_priority_queue: true +explorer: + max_repeat_times_per_runner: 1 + runner_num: 32 + max_timeout: 3600 + rollout_model: + enable_history: true + engine_num: 2 + tensor_parallel_size: 2 + enable_prefix_caching: false + enforce_eager: true + dtype: bfloat16 + seed: 42 + gpu_memory_utilization: 0.7 + enable_chunked_prefill: true + env_vars: + TMPDIR: /PATH/TO/ALFWORLD_TMP_DIR +synchronizer: + sync_style: dynamic_by_explorer + sync_method: 'nccl' + sync_interval: 2 + sync_timeout: 3600 +trainer: + trainer_type: 'verl' + trainer_config_path: 'examples/grpo_alfworld_general_multi_step/train_alfworld.yaml' + save_interval: 50 +monitor: + monitor_type: 'wandb' diff --git a/examples/grpo_alfworld_general_multi_step/train_alfworld.yaml b/examples/grpo_alfworld_general_multi_step/train_alfworld.yaml new file mode 100644 index 0000000000..a59982f49f --- /dev/null +++ b/examples/grpo_alfworld_general_multi_step/train_alfworld.yaml @@ -0,0 +1,49 @@ +actor_rollout_ref: + hybrid_engine: True + model: + external_lib: null + override_config: { } + enable_gradient_checkpointing: True + use_remove_padding: False + actor: + strategy: fsdp # This is for backward-compatibility + ppo_micro_batch_size_per_gpu: 1 + use_dynamic_bsz: False + ppo_max_token_len_per_gpu: 16384 # n * ${data.max_prompt_length} + ${data.max_response_length} + grad_clip: 1.0 + ppo_epochs: 1 + shuffle: False + ulysses_sequence_parallel_size: 1 # sp size + optim: + lr: 5e-6 + lr_warmup_steps_ratio: 0. # the total steps will be injected during runtime + # min_lr_ratio: null # only useful for warmup with cosine + warmup_style: constant # select from constant/cosine + total_training_steps: -1 # must be override by program + fsdp_config: + wrap_policy: + # transformer_layer_cls_to_wrap: None + min_num_params: 0 + param_offload: False + optimizer_offload: False + fsdp_size: -1 + ref: + fsdp_config: + param_offload: False + wrap_policy: + # transformer_layer_cls_to_wrap: None + min_num_params: 0 + log_prob_micro_batch_size_per_gpu: 1 + log_prob_use_dynamic_bsz: ${actor_rollout_ref.actor.use_dynamic_bsz} + log_prob_max_token_len_per_gpu: ${actor_rollout_ref.actor.ppo_max_token_len_per_gpu} + ulysses_sequence_parallel_size: ${actor_rollout_ref.actor.ulysses_sequence_parallel_size} # sp size + +trainer: + balance_batch: True + # total_training_steps: null + # auto: find the last ckpt to resume. 
If can't find, start from scratch + resume_mode: auto # or auto or resume_path if + default_hdfs_dir: null + remove_previous_ckpt_in_save: False + del_local_ckpt_after_load: False + val_before_train: False diff --git a/trinity/common/verl_config.py b/trinity/common/verl_config.py index e203378987..4041ec1a67 100644 --- a/trinity/common/verl_config.py +++ b/trinity/common/verl_config.py @@ -13,7 +13,7 @@ @dataclass class Data: - train_batch_size: int = 1024 # kept for RayPPOTrainer._validate_config + train_batch_size: int = 1024 # kept to pass RayPPOTrainer._validate_config @dataclass @@ -315,6 +315,9 @@ def synchronize_config(self, config: Config) -> None: # noqa: C901 self.trainer.resume_mode = "auto" self.buffer = config.buffer + self.data.train_batch_size = ( + config.buffer.train_batch_size + ) # kept to pass RayPPOTrainer._validate_config self.synchronizer = config.synchronizer self.actor_rollout_ref.synchronizer = config.synchronizer diff --git a/trinity/common/workflows/__init__.py b/trinity/common/workflows/__init__.py index d5321976c9..ebafdb066c 100644 --- a/trinity/common/workflows/__init__.py +++ b/trinity/common/workflows/__init__.py @@ -3,7 +3,7 @@ from .customized_math_workflows import MathBoxedWorkflow from .customized_toolcall_workflows import ToolCallWorkflow from .envs.agentscope.agentscope_react_workflow import AgentScopeReactV2MathWorkflow -from .envs.alfworld.alfworld_workflow import AlfworldWorkflow +from .envs.alfworld.alfworld_workflow import AlfworldWorkflow, StepWiseAlfworldWorkflow from .envs.sciworld.sciworld_workflow import SciWorldWorkflow from .envs.webshop.webshop_workflow import WebShopWorkflow from .eval_workflow import MathEvalWorkflow @@ -18,6 +18,7 @@ "MathWorkflow", "WebShopWorkflow", "AlfworldWorkflow", + "StepWiseAlfworldWorkflow", "SciWorldWorkflow", "MathBoxedWorkflow", "MathRMWorkflow", diff --git a/trinity/common/workflows/envs/alfworld/alfworld_workflow.py b/trinity/common/workflows/envs/alfworld/alfworld_workflow.py index be258484e1..093d3f27cf 100644 --- a/trinity/common/workflows/envs/alfworld/alfworld_workflow.py +++ b/trinity/common/workflows/envs/alfworld/alfworld_workflow.py @@ -3,6 +3,7 @@ from trinity.common.experience import Experience from trinity.common.models.model import ModelWrapper +from trinity.common.workflows.step_wise_workflow import RewardPropagationWorkflow from trinity.common.workflows.workflow import WORKFLOWS, MultiTurnWorkflow, Task EXAMPLE_PROMPT = """ @@ -108,7 +109,7 @@ def __init__( ) self.task_desc = task.task_desc or "0" self.repeat_times = task.repeat_times - self.max_env_steps = 30 + self.max_env_steps = task.workflow_args.get("max_env_steps", 30) def get_model_response(self, messages): responses = self.model.chat(messages, n=1) @@ -177,3 +178,118 @@ def create_environment(game_file): raise ImportError(error_message) env = create_environment(game_file_path) return self.generate_env_inference_samples(env, rollout_n) + + +@WORKFLOWS.register_module("step_wise_alfworld_workflow") +class StepWiseAlfworldWorkflow(RewardPropagationWorkflow): + """ + An Alfworld workflow refactored to use the RewardPropagationWorkflow base class. + + This workflow manages an Alfworld environment, interacts with it step-by-step + using a model, and calculates a final reward based on the episode's outcome. 
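+
+    The maximum number of environment steps comes from `task.workflow_args["max_env_steps"]`
+    (default 30). If the episode terminates, the environment's final reward is used; otherwise
+    a default reward of -0.1 is kept. `RewardPropagationWorkflow` then propagates the returned
+    reward to every step's experience.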
+ """ + + def __init__( + self, + model: ModelWrapper, + task: Task, + auxiliary_models: Optional[List] = None, + use_openai_client: bool = False, + ): + super().__init__( + model=model, + task=task, + auxiliary_models=auxiliary_models, + use_openai_client=use_openai_client, + ) + self.game_file_path = task.task_desc or "0" + self.max_env_steps = task.workflow_args.get("max_env_steps", 30) + + self._setup_environment() + + self.observation: Optional[str] = None + self.done: bool = False + self.final_reward: float = 0.0 + self.memory: List[dict] = [] + + def _setup_environment(self): + """Initializes the Alfworld text-based environment.""" + try: + import textworld + import textworld.gym + from alfworld.agents.environment.alfred_tw_env import ( + AlfredDemangler, + AlfredExpert, + AlfredExpertType, + ) + + def create_environment(game_file): + expert = AlfredExpert(expert_type=AlfredExpertType.HANDCODED) + request_infos = textworld.EnvInfos( + description=True, inventory=True, admissible_commands=True + ) + env_id = textworld.gym.register_game( + game_file, request_infos, wrappers=[AlfredDemangler(), expert] + ) + env = textworld.gym.make(env_id) + return env + + self.env = create_environment(self.game_file_path) + + except ImportError as e: + error_message = ( + f"Error importing Alfworld dependencies: {e}. Please ensure " + "Alfworld is installed correctly by following the instructions at " + "https://github.com/alfworld/alfworld" + ) + raise ImportError(error_message) + + def run(self) -> List[Experience]: + # Reset environment and state for a new episode + self.observation, info = self.env.reset() + self.done = False + self.final_reward = -0.1 + + self.memory.clear() + self.memory.append({"role": "system", "content": AlfWORLD_SYSTEM_PROMPT}) + + return super().run() + + def step(self, step_num: int) -> bool: + if self.done: + return False + + # Format observation for the model + format_obs = format_observation(self.observation) # type: ignore + self.memory.append({"role": "user", "content": format_obs}) + + # Get action from the model + responses = self.model.chat(self.memory) + response_text = responses[0].response_text + self.memory.append({"role": "assistant", "content": response_text}) + action = parse_action(response_text) + + # Execute action in the environment + observation, reward, done, info = self.env.step(action) + + # Update internal state + self.observation = observation + self.done = done + if self.done: + self.final_reward = reward + + # Return False to stop the run if the episode is done + return not self.done + + def reward(self, exps: list[Experience]) -> float: + return self.final_reward + + @property + def max_step_num(self) -> int: + """Return the maximum number of steps allowed in an episode.""" + return self.max_env_steps + + def __del__(self): + """Ensures the environment is closed when the workflow object is destroyed.""" + if hasattr(self, "env"): + self.env.close() diff --git a/trinity/common/workflows/step_wise_workflow.py b/trinity/common/workflows/step_wise_workflow.py index 2e3317efc8..20dd294a21 100644 --- a/trinity/common/workflows/step_wise_workflow.py +++ b/trinity/common/workflows/step_wise_workflow.py @@ -10,14 +10,19 @@ class StepWiseRewardWorkflow(Workflow): """A workflow that implements step-wise rewards for tasks.""" - def __init__(self, *, task: Task, model: ModelWrapper, auxiliary_models=None): + def __init__( + self, *, task: Task, model: ModelWrapper, auxiliary_models=None, use_openai_client=True + ): super().__init__(task=task, model=model, 
auxiliary_models=auxiliary_models) assert model.enable_history, ( "Rollout Model must have history enabled for step-wise rewards, please " "set `explorer.rollout_model.enable_history` to `True` in your config." ) # use the rollout model's OpenAI client to write your agent application - self.client: openai.OpenAI = model.get_openai_client() + if use_openai_client: + self.client: openai.OpenAI = model.get_openai_client() + else: + self.client = None def run(self) -> list[Experience]: """Run the workflow and return a list of experiences with step-wise rewards.""" @@ -74,14 +79,19 @@ def repeatable(self): class RewardPropagationWorkflow(Workflow): """A workflow that propagates rewards across multiple turns.""" - def __init__(self, *, task: Task, model: ModelWrapper, auxiliary_models=None): + def __init__( + self, *, task: Task, model: ModelWrapper, auxiliary_models=None, use_openai_client=True + ): super().__init__(task=task, model=model, auxiliary_models=auxiliary_models) assert model.enable_history, ( "Rollout Model must have history enabled for step-wise rewards, please " "set `explorer.rollout_model.enable_history` to `True` in your config." ) # use the rollout model's OpenAI client to write your agent application - self.client: openai.OpenAI = model.get_openai_client() + if use_openai_client: + self.client: openai.OpenAI = model.get_openai_client() + else: + self.client = None def run(self) -> list[Experience]: """Run the workflow and return a list of experiences with step-wise rewards.""" @@ -101,6 +111,9 @@ def run(self) -> list[Experience]: reward = self.reward(experiences) for exp in experiences: exp.reward = reward + if exp.metrics is None: + exp.metrics = {} + exp.metrics["actual_env_steps"] = step + 1 # +1 because step starts from 0 return experiences @abstractmethod diff --git a/trinity/common/workflows/workflow.py b/trinity/common/workflows/workflow.py index f1fcdf080f..20e03c9271 100644 --- a/trinity/common/workflows/workflow.py +++ b/trinity/common/workflows/workflow.py @@ -132,7 +132,7 @@ def run(self) -> List[Experience]: class MultiTurnWorkflow(Workflow): """ - The base workflow class for multi-turn tasks. + The base workflow class for concatenated multi-turn tasks. """ def __init__(