diff --git a/README.md b/README.md
index 83773a2977..b373b0c333 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,6 @@
-
Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models
@@ -20,22 +19,23 @@
## 💡 What is Trinity-RFT?
-Trinity-RFT is a flexible, general-purpose framework for reinforcement fine-tuning (RFT) of large language models (LLMs). It provides three independent modules for users with different needs:
+Trinity-RFT is a flexible, general-purpose framework for reinforcement fine-tuning (RFT) of large language models (LLMs). It decouples the RFT process into three key components (**Explorer**, **Trainer**, and **Buffer**) and provides functionality for users with different backgrounds and objectives:
+
-* 🤖 **Explorer**:For agent application developers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_workflow.html)
- - Train an agent application to enhance its ability to complete tasks in a specified environment
+* 🤖 For agent application developers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_workflow.html)
+ - Train agent applications to improve their ability to complete tasks in specific environments.
- Examples: [Multi-Turn Interaction](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html), [ReAct Agent](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_react.html)
-* 🧠 **Trainer**: For RL algorithm researchers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html)
- - Design and validate new RL algorithms in compact, plug-and-play classes
- - Examples: [Mixture of RL Algorithms](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)
+* 🧠 For RL algorithm researchers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html)
+ - Design and validate new reinforcement learning algorithms using compact, plug-and-play modules.
+ - Example: [Mixture of SFT and GRPO](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)
-* 🗄️ **Buffer**: For data engineers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html)
- - Design task-specific datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios
- - Examples: [Data Functionalities](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html)
+* 📊 For data engineers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html)
+ - Create task-specific datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios.
+ - Example: [Data Processing](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html)
-Trinity-RFT unifies the above three modules and provides the following key features:
+## 🌟 Key Features
* **Flexible RFT Modes:**
- Supports synchronous/asynchronous, on-policy/off-policy, and online/offline training. Rollout and training can run separately and scale independently across devices.
diff --git a/README_zh.md b/README_zh.md
index 4db8563886..3248bb1451 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -20,21 +20,21 @@
## 💡 什么是 Trinity-RFT ?
-Trinity-RFT 是一个灵活、通用的大语言模型(LLM)强化微调(RFT)框架。其提供三个独立模块,满足不同用户的需求:
+Trinity-RFT 是一个灵活、通用的大语言模型(LLM)强化微调(RFT)框架。其将 RFT 流程解耦为三个关键模块:**Explorer**、**Trainer** 和 **Buffer**,并面向不同背景和目标的用户提供相应功能:
-* 🤖 **Explorer**:面向智能体应用开发者。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_workflow.html)
+* 🤖 面向智能体应用开发者。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_workflow.html)
- 训练智能体应用,以增强其在指定环境中完成任务的能力
- 示例:[多轮交互](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_multi_turn.html),[ReAct 智能体](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_react.html)
-* 🧠 **Trainer**:面向 RL 算法研究者。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_algorithm.html)
+* 🧠 面向 RL 算法研究者。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_algorithm.html)
- 在简洁、可插拔的类中设计和验证新的 RL 算法
- - 示例:[混合 RL 算法](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html)
+ - 示例:[SFT/GRPO混合算法](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html)
-* 🗄️ **Buffer**:面向数据工程师。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_operator.html)
+* 📊 面向数据工程师。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_operator.html)
- 设计任务定制数据集,构建数据流水线以支持清洗、增强和人类参与场景
- - 示例:[数据功能](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_data_functionalities.html)
+ - 示例:[数据处理](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_data_functionalities.html)
-Trinity-RFT 统一了上述三个模块,并提供以下核心特性:
+## 🌟 核心特性
* **灵活的 RFT 模式:**
- 支持同步/异步、on-policy/off-policy 以及在线/离线训练。采样与训练可分离运行,并可在多设备上独立扩展。
@@ -186,9 +186,7 @@ docker run -it \
trinity-rft:latest
```
-```{note}
-如需使用 **Megatron-LM** 进行训练,请参考 {ref}`Megatron-LM Backend `。
-```
+> 如需使用 **Megatron-LM** 进行训练,请参考 [Megatron-LM 支持](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_megatron.html)。
### 第二步:准备数据集和模型
diff --git a/docs/sphinx_doc/source/main.md b/docs/sphinx_doc/source/main.md
index b813b74d18..a79d3f4548 100644
--- a/docs/sphinx_doc/source/main.md
+++ b/docs/sphinx_doc/source/main.md
@@ -1,20 +1,21 @@
## 💡 What is Trinity-RFT?
-Trinity-RFT is a flexible, general-purpose framework for reinforcement fine-tuning (RFT) of large language models (LLMs). It provides three independent modules for users with different needs:
+Trinity-RFT is a flexible, general-purpose framework for reinforcement fine-tuning (RFT) of large language models (LLMs). It decouples the RFT process into three key components (**Explorer**, **Trainer**, and **Buffer**) and provides functionality for users with different backgrounds and objectives:
-* 🤖 **Explorer**:For agent application developers. [[tutorial]](/tutorial/develop_workflow.md)
- - Train an agent application to enhance its ability to complete tasks in a specified environment
+* 🤖 For agent application developers. [[tutorial]](/tutorial/develop_workflow.md)
+ - Train agent applications to improve their ability to complete tasks in specific environments.
- Examples: [Multi-Turn Interaction](/tutorial/example_multi_turn.md), [ReAct Agent](/tutorial/example_react.md)
-* 🧠 **Trainer**: For RL algorithm researchers. [[tutorial]](/tutorial/develop_algorithm.md)
- - Design and validate new RL algorithms in compact, plug-and-play classes
- - Examples: [Mixture of RL Algorithms](/tutorial/example_mix_algo.md)
+* 🧠 For RL algorithm researchers. [[tutorial]](/tutorial/develop_algorithm.md)
+ - Design and validate new reinforcement learning algorithms using compact, plug-and-play modules.
+ - Example: [Mixture of SFT and GRPO](/tutorial/example_mix_algo.md)
-* 🗃️ **Buffer**: For data engineers. [[tutorial]](/tutorial/develop_operator.md)
- - Design task-specific datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios
- - Examples: [Data Functionalities](/tutorial/example_data_functionalities.md)
+* 📊 For data engineers. [[tutorial]](/tutorial/develop_operator.md)
+ - Create task-specific datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios.
+ - Example: [Data Processing](/tutorial/example_data_functionalities.md)
-Trinity-RFT unifies the above three modules and provides the following key features:
+
+## 🌟 Key Features
* **Flexible RFT Modes:**
- Supports synchronous/asynchronous, on-policy/off-policy, and online/offline training. Rollout and training can run separately and scale independently across devices.
diff --git a/docs/sphinx_doc/source/tutorial/example_megatron.md b/docs/sphinx_doc/source/tutorial/example_megatron.md
index fb5fd0a121..085dcf5d3d 100644
--- a/docs/sphinx_doc/source/tutorial/example_megatron.md
+++ b/docs/sphinx_doc/source/tutorial/example_megatron.md
@@ -4,35 +4,29 @@
This guide walks you through how to train models using **Megatron-LM** in a clear way.
```{note}
-This guide assumes you have already set up your environment following {ref}`Installation `. If you haven't done so, please refer to that guide first.
+This guide assumes you have already set up your environment by installing from source, following {ref}`Installation `. If you haven't done so, please refer to that guide first.
```
---
-## Step 1: Installation
-
-
+## Step 1: Install Megatron-LM Support
Install the project in editable mode with Megatron support:
```bash
-# For bash users
-pip install -e .[megatron]
+pip install -e ".[megatron]"
-# For zsh users (escape the brackets)
-pip install -e .\[megatron\]
+# for uv
+# uv sync --extra megatron
```
-
-#### Install Apex (from NVIDIA)
-
-Finally, install NVIDIA's Apex library for mixed-precision training:
+Then, install NVIDIA's Apex library for mixed-precision training:
```bash
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
--config-settings "--build-option=--cpp_ext" \
--config-settings "--build-option=--cuda_ext" \
- --resume-retries 999 git+https://github.com/NVIDIA/apex.git
+ --resume-retries 10 git+https://github.com/NVIDIA/apex.git
```
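Once the installs above finish, a quick stdlib-only sanity check (illustrative, not part of the official docs; the distribution names are assumptions) can confirm what actually landed in the environment:

```python
from importlib import metadata

def is_installed(dist_name: str) -> bool:
    """Return True if a distribution with the given name is installed."""
    try:
        metadata.version(dist_name)
        return True
    except metadata.PackageNotFoundError:
        return False

# "pip" exists in most Python environments; after running the installs above,
# you could swap in names such as "apex" (name assumed) to verify them.
print(is_installed("pip"))
```

If a check returns `False`, rerun the corresponding install command and watch its output for build errors.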
---
@@ -43,11 +37,10 @@ We provide a Docker setup to simplify environment management.
#### Build the Docker Image
-```bash
-git clone https://github.com/modelscope/Trinity-RFT
-cd Trinity-RFT
-# Build the image
+Trinity-RFT provides a dedicated Dockerfile for Megatron-LM located at `scripts/docker_for_megatron/Dockerfile`. You can build the image using the following command:
+
+```bash
docker build -f scripts/docker_for_megatron/Dockerfile -t trinity-rft-megatron:latest .
```
diff --git a/docs/sphinx_doc/source_zh/main.md b/docs/sphinx_doc/source_zh/main.md
index eb89d59d17..5acc3baf0e 100644
--- a/docs/sphinx_doc/source_zh/main.md
+++ b/docs/sphinx_doc/source_zh/main.md
@@ -1,20 +1,20 @@
## 💡 什么是 Trinity-RFT?
-Trinity-RFT 是一个灵活、通用的大语言模型(LLM)强化微调(RFT)框架。其提供三个独立模块,满足不同用户的需求:
+Trinity-RFT 是一个灵活、通用的大语言模型(LLM)强化微调(RFT)框架。其将 RFT 流程解耦为三个关键模块:**Explorer**、**Trainer** 和 **Buffer**,并面向不同背景和目标的用户提供相应功能:
-* 🤖 **Explorer**:面向智能体应用开发者。[[教程]](/tutorial/develop_workflow.md)
+* 🤖 面向智能体应用开发者。[[教程]](/tutorial/develop_workflow.md)
- 训练智能体应用,以增强其在指定环境中完成任务的能力
- 示例:[多轮交互](/tutorial/example_multi_turn.md),[ReAct 智能体](/tutorial/example_react.md)
-* 🧠 **Trainer**:面向 RL 算法研究者。[[教程]](/tutorial/develop_algorithm.md)
+* 🧠 面向 RL 算法研究者。[[教程]](/tutorial/develop_algorithm.md)
- 在简洁、可插拔的类中设计和验证新的 RL 算法
- - 示例:[混合 RL 算法](/tutorial/example_mix_algo.md)
+ - 示例:[SFT/GRPO混合算法](/tutorial/example_mix_algo.md)
-* 🗄️ **Buffer**:面向数据工程师。[[教程]](/tutorial/develop_operator.md)
+* 📊 面向数据工程师。[[教程]](/tutorial/develop_operator.md)
- 设计任务定制数据集,构建数据流水线以支持清洗、增强和人类参与场景
- - 示例:[数据功能](/tutorial/example_data_functionalities.md)
+ - 示例:[数据处理](/tutorial/example_data_functionalities.md)
-Trinity-RFT 统一了上述三个模块,并提供以下核心特性:
+## 🌟 核心特性
* **灵活的 RFT 模式:**
- 支持同步/异步、on-policy/off-policy 以及在线/离线训练。采样与训练可分离运行,并可在多设备上独立扩展。
diff --git a/docs/sphinx_doc/source_zh/tutorial/example_megatron.md b/docs/sphinx_doc/source_zh/tutorial/example_megatron.md
index 1c34b27cb4..a31db2d9f0 100644
--- a/docs/sphinx_doc/source_zh/tutorial/example_megatron.md
+++ b/docs/sphinx_doc/source_zh/tutorial/example_megatron.md
@@ -4,84 +4,35 @@
本指南将清晰地引导你如何使用 **Megatron-LM** 来训练模型。
```{note}
-本指南假设你已经按照 {ref}`安装指南 ` 设置好了环境。如果还没有,请先参考该指南。
+本指南假设你已经按照 {ref}`安装指南 ` 中的源码安装方式配置好了环境。如果还没有,请先参考该指南。
```
---
-## 步骤 1:安装
+## 步骤 1:安装 Megatron-LM 支持
-### 最低要求
-
-在开始之前,请确保你的系统满足以下要求:
-
-- **GPU**:至少 2 块 GPU(用于分布式训练)
-- **CUDA**:版本 12.4 或更高
-- **Python**:版本 3.10 或更高
-
----
-
-### 安装依赖项
-
-首先克隆仓库并创建虚拟环境:
-
-```bash
-# 克隆仓库
-git clone https://github.com/modelscope/Trinity-RFT
-cd Trinity-RFT
-```
-
-#### 选项 A:使用 Conda
-
-```bash
-# 创建并激活新环境
-conda create -n trinity python=3.10
-conda activate trinity
-```
-
-#### 选项 B:使用 venv
-
-```bash
-# 创建并激活虚拟环境
-python3.10 -m venv .venv
-source .venv/bin/activate
-```
-
-#### 安装包
-
-以可编辑模式安装项目,并启用 Megatron 支持:
-
-```bash
-# 针对 bash 用户
-pip install -e .[megatron]
-
-# 针对 zsh 用户(需转义括号)
-pip install -e .\[megatron\]
-```
-
-#### 安装 Flash Attention
-
-安装基础依赖后,安装 `flash-attn`。编译过程可能需要几分钟,请耐心等待。
+安装 Megatron-LM 相关依赖:
```bash
-pip install flash-attn==2.8.1 -v
-```
+pip install -e ".[megatron]"
-如果遇到安装问题,可尝试以下替代命令:
-
-```bash
-pip install flash-attn -v --no-build-isolation
+# for uv
+# uv sync --extra megatron
```
-#### 安装 Apex(来自 NVIDIA)
-
-最后,安装 NVIDIA 的 Apex 库以支持混合精度训练:
+另外还需要从源码安装 NVIDIA 的 Apex 库以支持混合精度训练:
```bash
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
--config-settings "--build-option=--cpp_ext" \
--config-settings "--build-option=--cuda_ext" \
- --resume-retries 999 git+https://github.com/NVIDIA/apex.git
+ --resume-retries 10 git+https://github.com/NVIDIA/apex.git
+
+# for uv
+# uv pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
+# --config-settings "--build-option=--cpp_ext" \
+# --config-settings "--build-option=--cuda_ext" \
+# --resume-retries 10 git+https://github.com/NVIDIA/apex.git
```
---
@@ -92,11 +43,10 @@ pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
#### 构建 Docker 镜像
-```bash
-git clone https://github.com/modelscope/Trinity-RFT
-cd Trinity-RFT
+Trinity-RFT 提供了专门用于 Megatron-LM 的 Dockerfile,位于 `scripts/docker_for_megatron/Dockerfile`。
+可以使用以下命令构建镜像:
-# 构建镜像
+```bash
docker build -f scripts/docker_for_megatron/Dockerfile -t trinity-rft-megatron:latest .
```