22 changes: 11 additions & 11 deletions README.md
Original file line number Diff line number Diff line change
@@ -5,7 +5,6 @@
</div>



<h2 align="center">Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models</h2>


@@ -20,22 +19,23 @@

## 💡 What is Trinity-RFT?

Trinity-RFT is a flexible, general-purpose framework for reinforcement fine-tuning (RFT) of large language models (LLMs). It provides three independent modules for users with different needs:
Trinity-RFT is a flexible, general-purpose framework for reinforcement fine-tuning (RFT) of large language models (LLMs). It decouples the RFT process into three key components: **Explorer**, **Trainer**, and **Buffer**, and provides functionality tailored to users with different backgrounds and objectives:


* 🤖 **Explorer**:For agent application developers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_workflow.html)
- Train an agent application to enhance its ability to complete tasks in a specified environment
* 🤖 For agent application developers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_workflow.html)
- Train agent applications to improve their ability to complete tasks in specific environments.
- Examples: [Multi-Turn Interaction](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html), [ReAct Agent](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_react.html)

* 🧠 **Trainer**: For RL algorithm researchers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html)
- Design and validate new RL algorithms in compact, plug-and-play classes
- Examples: [Mixture of RL Algorithms](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)
* 🧠 For RL algorithm researchers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html)
- Design and validate new reinforcement learning algorithms using compact, plug-and-play modules.
- Example: [Mixture of SFT and GRPO](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)

* 🗄️ **Buffer**: For data engineers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html)
- Design task-specific datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios
- Examples: [Data Functionalities](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html)
* 📊 For data engineers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html)
- Create task-specific datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios.
- Example: [Data Processing](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html)


Trinity-RFT unifies the above three modules and provides the following key features:
## 🌟 Key Features

* **Flexible RFT Modes:**
- Supports synchronous/asynchronous, on-policy/off-policy, and online/offline training. Rollout and training can run separately and scale independently across devices.
18 changes: 8 additions & 10 deletions README_zh.md
@@ -20,21 +20,21 @@

## 💡 什么是 Trinity-RFT ?

Trinity-RFT 是一个灵活、通用的大语言模型(LLM)强化微调(RFT)框架。其提供三个独立模块,满足不同用户的需求
Trinity-RFT 是一个灵活、通用的大语言模型(LLM)强化微调(RFT)框架。其将 RFT 流程解耦为三个关键模块:**Explorer**、**Trainer** 和 **Buffer**,并面向不同背景和目标的用户提供相应功能:

* 🤖 **Explorer**:面向智能体应用开发者。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_workflow.html)
* 🤖 面向智能体应用开发者。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_workflow.html)
- 训练智能体应用,以增强其在指定环境中完成任务的能力
- 示例:[多轮交互](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_multi_turn.html),[ReAct 智能体](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_react.html)

* 🧠 **Trainer**:面向 RL 算法研究者。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_algorithm.html)
* 🧠 面向 RL 算法研究者。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_algorithm.html)
- 在简洁、可插拔的类中设计和验证新的 RL 算法
- 示例:[混合 RL 算法](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html)
- 示例:[SFT/GRPO 混合算法](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html)

* 🗄️ **Buffer**:面向数据工程师。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_operator.html)
* 📊 面向数据工程师。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_operator.html)
- 设计任务定制数据集,构建数据流水线以支持清洗、增强和人类参与场景
- 示例:[数据功能](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_data_functionalities.html)
- 示例:[数据处理](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_data_functionalities.html)

Trinity-RFT 统一了上述三个模块,并提供以下核心特性:
## 🌟 核心特性

* **灵活的 RFT 模式:**
- 支持同步/异步、on-policy/off-policy 以及在线/离线训练。采样与训练可分离运行,并可在多设备上独立扩展。
@@ -186,9 +186,7 @@ docker run -it \
trinity-rft:latest
```

```{note}
如需使用 **Megatron-LM** 进行训练,请参考 {ref}`Megatron-LM Backend <Megatron-LM>`。
```
> 如需使用 **Megatron-LM** 进行训练,请参考 [Megatron-LM 支持](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_megatron.html)。


### 第二步:准备数据集和模型
21 changes: 11 additions & 10 deletions docs/sphinx_doc/source/main.md
@@ -1,20 +1,21 @@
## 💡 What is Trinity-RFT?

Trinity-RFT is a flexible, general-purpose framework for reinforcement fine-tuning (RFT) of large language models (LLMs). It provides three independent modules for users with different needs:
Trinity-RFT is a flexible, general-purpose framework for reinforcement fine-tuning (RFT) of large language models (LLMs). It decouples the RFT process into three key components: **Explorer**, **Trainer**, and **Buffer**, and provides functionality tailored to users with different backgrounds and objectives:

* 🤖 **Explorer**:For agent application developers. [[tutorial]](/tutorial/develop_workflow.md)
- Train an agent application to enhance its ability to complete tasks in a specified environment
* 🤖 For agent application developers. [[tutorial]](/tutorial/develop_workflow.md)
- Train agent applications to improve their ability to complete tasks in specific environments.
- Examples: [Multi-Turn Interaction](/tutorial/example_multi_turn.md), [ReAct Agent](/tutorial/example_react.md)

* 🧠 **Trainer**: For RL algorithm researchers. [[tutorial]](/tutorial/develop_algorithm.md)
- Design and validate new RL algorithms in compact, plug-and-play classes
- Examples: [Mixture of RL Algorithms](/tutorial/example_mix_algo.md)
* 🧠 For RL algorithm researchers. [[tutorial]](/tutorial/develop_algorithm.md)
- Design and validate new reinforcement learning algorithms using compact, plug-and-play modules.
- Example: [Mixture of SFT and GRPO](/tutorial/example_mix_algo.md)

* 🗃️ **Buffer**: For data engineers. [[tutorial]](/tutorial/develop_operator.md)
- Design task-specific datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios
- Examples: [Data Functionalities](/tutorial/example_data_functionalities.md)
* 📊 For data engineers. [[tutorial]](/tutorial/develop_operator.md)
- Create task-specific datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios.
- Example: [Data Processing](/tutorial/example_data_functionalities.md)

Trinity-RFT unifies the above three modules and provides the following key features:

## 🌟 Key Features

* **Flexible RFT Modes:**
- Supports synchronous/asynchronous, on-policy/off-policy, and online/offline training. Rollout and training can run separately and scale independently across devices.
27 changes: 10 additions & 17 deletions docs/sphinx_doc/source/tutorial/example_megatron.md
@@ -4,35 +4,29 @@
This guide walks you through training models with **Megatron-LM**.

```{note}
This guide assumes you have already set up your environment following {ref}`Installation <Installation>`. If you haven't done so, please refer to that guide first.
This guide assumes you have already set up your environment from source, following {ref}`Installation <Installation>`. If you haven't done so, please refer to that guide first.
```

---

## Step 1: Installation


## Step 1: Install Megatron-LM Support

Install the project in editable mode with Megatron support:

```bash
# For bash users
pip install -e .[megatron]
pip install -e ".[megatron]"

# For zsh users (escape the brackets)
pip install -e .\[megatron\]
# for uv
# uv sync --extra megatron
```


#### Install Apex (from NVIDIA)

Finally, install NVIDIA's Apex library for mixed-precision training:
Then, install NVIDIA's Apex library for mixed-precision training:

```bash
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
--config-settings "--build-option=--cpp_ext" \
--config-settings "--build-option=--cuda_ext" \
--resume-retries 999 git+https://github.com/NVIDIA/apex.git
--resume-retries 10 git+https://github.com/NVIDIA/apex.git
```
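
If both installs succeeded, the packages should now be importable. A minimal sanity check, assuming the top-level module names `megatron` and `apex` (the `check_module` helper below is a hypothetical convenience, not part of Trinity-RFT):

```shell
# Hypothetical helper: exits 0 if the given Python module is importable
# in the current environment, non-zero otherwise.
check_module() {
  python3 -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec('$1') else 1)"
}

check_module apex && echo "apex: importable" || echo "apex: not found"
check_module megatron && echo "megatron: importable" || echo "megatron: not found"
```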

---
@@ -43,11 +37,10 @@ We provide a Docker setup to simplify environment management.

#### Build the Docker Image

```bash
git clone https://github.com/modelscope/Trinity-RFT
cd Trinity-RFT

# Build the image
Trinity-RFT provides a dedicated Dockerfile for Megatron-LM, located at `scripts/docker_for_megatron/Dockerfile`. You can build the image with the following command:

```bash
docker build -f scripts/docker_for_megatron/Dockerfile -t trinity-rft-megatron:latest .
```

14 changes: 7 additions & 7 deletions docs/sphinx_doc/source_zh/main.md
@@ -1,20 +1,20 @@
## 💡 什么是 Trinity-RFT?

Trinity-RFT 是一个灵活、通用的大语言模型(LLM)强化微调(RFT)框架。其提供三个独立模块,满足不同用户的需求
Trinity-RFT 是一个灵活、通用的大语言模型(LLM)强化微调(RFT)框架。其将 RFT 流程解耦为三个关键模块:**Explorer**、**Trainer** 和 **Buffer**,并面向不同背景和目标的用户提供相应功能:

* 🤖 **Explorer**:面向智能体应用开发者。[[教程]](/tutorial/develop_workflow.md)
* 🤖 面向智能体应用开发者。[[教程]](/tutorial/develop_workflow.md)
- 训练智能体应用,以增强其在指定环境中完成任务的能力
- 示例:[多轮交互](/tutorial/example_multi_turn.md),[ReAct 智能体](/tutorial/example_react.md)

* 🧠 **Trainer**:面向 RL 算法研究者。[[教程]](/tutorial/develop_algorithm.md)
* 🧠 面向 RL 算法研究者。[[教程]](/tutorial/develop_algorithm.md)
- 在简洁、可插拔的类中设计和验证新的 RL 算法
- 示例:[混合 RL 算法](/tutorial/example_mix_algo.md)
- 示例:[SFT/GRPO 混合算法](/tutorial/example_mix_algo.md)

* 🗄️ **Buffer**:面向数据工程师。[[教程]](/tutorial/develop_operator.md)
* 📊 面向数据工程师。[[教程]](/tutorial/develop_operator.md)
- 设计任务定制数据集,构建数据流水线以支持清洗、增强和人类参与场景
- 示例:[数据功能](/tutorial/example_data_functionalities.md)
- 示例:[数据处理](/tutorial/example_data_functionalities.md)

Trinity-RFT 统一了上述三个模块,并提供以下核心特性:
## 🌟 核心特性

* **灵活的 RFT 模式:**
- 支持同步/异步、on-policy/off-policy 以及在线/离线训练。采样与训练可分离运行,并可在多设备上独立扩展。
84 changes: 17 additions & 67 deletions docs/sphinx_doc/source_zh/tutorial/example_megatron.md
@@ -4,84 +4,35 @@
本指南将引导你使用 **Megatron-LM** 训练模型。

```{note}
本指南假设你已经按照 {ref}`安装指南 <Installation>` 设置好了环境。如果还没有,请先参考该指南。
本指南假设你已经按照 {ref}`安装指南 <Installation>` 中的源码安装方式配置好了环境。如果还没有,请先参考该指南。
```

---

## 步骤 1:安装
## 步骤 1:安装 Megatron-LM 支持

### 最低要求

在开始之前,请确保你的系统满足以下要求:

- **GPU**:至少 2 块 GPU(用于分布式训练)
- **CUDA**:版本 12.4 或更高
- **Python**:版本 3.10 或更高

---

### 安装依赖项

首先克隆仓库并创建虚拟环境:

```bash
# 克隆仓库
git clone https://github.com/modelscope/Trinity-RFT
cd Trinity-RFT
```

#### 选项 A:使用 Conda

```bash
# 创建并激活新环境
conda create -n trinity python=3.10
conda activate trinity
```

#### 选项 B:使用 venv

```bash
# 创建并激活虚拟环境
python3.10 -m venv .venv
source .venv/bin/activate
```

#### 安装包

以可编辑模式安装项目,并启用 Megatron 支持:

```bash
# 针对 bash 用户
pip install -e .[megatron]

# 针对 zsh 用户(需转义括号)
pip install -e .\[megatron\]
```

#### 安装 Flash Attention

安装基础依赖后,安装 `flash-attn`。编译过程可能需要几分钟,请耐心等待。
安装 Megatron-LM 相关依赖:

```bash
pip install flash-attn==2.8.1 -v
```
pip install -e ".[megatron]"

如果遇到安装问题,可尝试以下替代命令:

```bash
pip install flash-attn -v --no-build-isolation
# for uv
# uv sync --extra megatron
```

#### 安装 Apex(来自 NVIDIA)

最后,安装 NVIDIA 的 Apex 库以支持混合精度训练:
另外还需要从源码安装 NVIDIA 的 Apex 库以支持混合精度训练:

```bash
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
--config-settings "--build-option=--cpp_ext" \
--config-settings "--build-option=--cuda_ext" \
--resume-retries 999 git+https://github.com/NVIDIA/apex.git
--resume-retries 10 git+https://github.com/NVIDIA/apex.git

# for uv
# uv pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
# --config-settings "--build-option=--cpp_ext" \
# --config-settings "--build-option=--cuda_ext" \
# --resume-retries 10 git+https://github.com/NVIDIA/apex.git
```

---
@@ -92,11 +43,10 @@ pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \

#### 构建 Docker 镜像

```bash
git clone https://github.com/modelscope/Trinity-RFT
cd Trinity-RFT
Trinity-RFT 提供了专门用于 Megatron-LM 的 Dockerfile,位于 `scripts/docker_for_megatron/Dockerfile`。
可以使用以下命令构建镜像:

# 构建镜像
```bash
docker build -f scripts/docker_for_megatron/Dockerfile -t trinity-rft-megatron:latest .
```

Expand Down