22 changes: 11 additions & 11 deletions README.md
Original file line number Diff line number Diff line change
@@ -5,7 +5,6 @@
</div>



<h2 align="center">Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models</h2>


@@ -20,22 +19,23 @@

## 💡 What is Trinity-RFT?

Trinity-RFT is a flexible, general-purpose framework for reinforcement fine-tuning (RFT) of large language models (LLMs). It provides three independent modules for users with different needs:
Trinity-RFT is a flexible, general-purpose framework for reinforcement fine-tuning (RFT) of large language models (LLMs). It decouples the RFT process into three key components: **Explorer**, **Trainer**, and **Buffer**, and provides functionality tailored to users with different backgrounds and objectives:


* 🤖 **Explorer**:For agent application developers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_workflow.html)
- Train an agent application to enhance its ability to complete tasks in a specified environment
* 🤖 For agent application developers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_workflow.html)
- Train agent applications to improve their ability to complete tasks in specific environments.
- Examples: [Multi-Turn Interaction](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html), [ReAct Agent](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_react.html)

* 🧠 **Trainer**: For RL algorithm researchers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html)
- Design and validate new RL algorithms in compact, plug-and-play classes
- Examples: [Mixture of RL Algorithms](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)
* 🧠 For RL algorithm researchers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html)
- Design and validate new reinforcement learning algorithms using compact, plug-and-play modules.
- Example: [Mixture of SFT and GRPO](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)

* 🗄️ **Buffer**: For data engineers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html)
- Design task-specific datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios
- Examples: [Data Functionalities](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html)
* 📊 For data engineers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html)
- Create task-specific datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios.
- Example: [Data Processing](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html)


Trinity-RFT unifies the above three modules and provides the following key features:
## 🌟 Key Features

* **Flexible RFT Modes:**
- Supports synchronous/asynchronous, on-policy/off-policy, and online/offline training. Rollout and training can run separately and scale independently across devices.
18 changes: 8 additions & 10 deletions README_zh.md
@@ -20,21 +20,21 @@

## 💡 什么是 Trinity-RFT ?

Trinity-RFT 是一个灵活、通用的大语言模型(LLM)强化微调(RFT)框架。其提供三个独立模块,满足不同用户的需求
Trinity-RFT 是一个灵活、通用的大语言模型(LLM)强化微调(RFT)框架。其将 RFT 流程解耦为三个关键模块:**Explorer**、**Trainer** 和 **Buffer**,并面向不同背景和目标的用户提供相应功能:

* 🤖 **Explorer**:面向智能体应用开发者。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_workflow.html)
* 🤖 面向智能体应用开发者。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_workflow.html)
- 训练智能体应用,以增强其在指定环境中完成任务的能力
- 示例:[多轮交互](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_multi_turn.html),[ReAct 智能体](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_react.html)

* 🧠 **Trainer**:面向 RL 算法研究者。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_algorithm.html)
* 🧠 面向 RL 算法研究者。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_algorithm.html)
- 在简洁、可插拔的类中设计和验证新的 RL 算法
- 示例:[混合 RL 算法](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html)
- 示例:[SFT/GRPO 混合算法](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html)

* 🗄️ **Buffer**:面向数据工程师。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_operator.html)
* 📊 面向数据工程师。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_operator.html)
- 设计任务定制数据集,构建数据流水线以支持清洗、增强和人类参与场景
- 示例:[数据功能](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_data_functionalities.html)
- 示例:[数据处理](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_data_functionalities.html)

Trinity-RFT 统一了上述三个模块,并提供以下核心特性:
## 🌟 核心特性

* **灵活的 RFT 模式:**
- 支持同步/异步、on-policy/off-policy 以及在线/离线训练。采样与训练可分离运行,并可在多设备上独立扩展。
@@ -186,9 +186,7 @@ docker run -it \
trinity-rft:latest
```

```{note}
如需使用 **Megatron-LM** 进行训练,请参考 {ref}`Megatron-LM Backend <Megatron-LM>`。
```
> 如需使用 **Megatron-LM** 进行训练,请参考 [Megatron-LM 支持](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_megatron.html)。


### 第二步:准备数据集和模型
21 changes: 11 additions & 10 deletions docs/sphinx_doc/source/main.md
@@ -1,20 +1,21 @@
## 💡 What is Trinity-RFT?

Trinity-RFT is a flexible, general-purpose framework for reinforcement fine-tuning (RFT) of large language models (LLMs). It provides three independent modules for users with different needs:
Trinity-RFT is a flexible, general-purpose framework for reinforcement fine-tuning (RFT) of large language models (LLMs). It decouples the RFT process into three key components: **Explorer**, **Trainer**, and **Buffer**, and provides functionality tailored to users with different backgrounds and objectives:

* 🤖 **Explorer**:For agent application developers. [[tutorial]](/tutorial/develop_workflow.md)
- Train an agent application to enhance its ability to complete tasks in a specified environment
* 🤖 For agent application developers. [[tutorial]](/tutorial/develop_workflow.md)
- Train agent applications to improve their ability to complete tasks in specific environments.
- Examples: [Multi-Turn Interaction](/tutorial/example_multi_turn.md), [ReAct Agent](/tutorial/example_react.md)

* 🧠 **Trainer**: For RL algorithm researchers. [[tutorial]](/tutorial/develop_algorithm.md)
- Design and validate new RL algorithms in compact, plug-and-play classes
- Examples: [Mixture of RL Algorithms](/tutorial/example_mix_algo.md)
* 🧠 For RL algorithm researchers. [[tutorial]](/tutorial/develop_algorithm.md)
- Design and validate new reinforcement learning algorithms using compact, plug-and-play modules.
- Example: [Mixture of SFT and GRPO](/tutorial/example_mix_algo.md)

* 🗃️ **Buffer**: For data engineers. [[tutorial]](/tutorial/develop_operator.md)
- Design task-specific datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios
- Examples: [Data Functionalities](/tutorial/example_data_functionalities.md)
* 📊 For data engineers. [[tutorial]](/tutorial/develop_operator.md)
- Create task-specific datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios.
- Example: [Data Processing](/tutorial/example_data_functionalities.md)

Trinity-RFT unifies the above three modules and provides the following key features:

## 🌟 Key Features

* **Flexible RFT Modes:**
- Supports synchronous/asynchronous, on-policy/off-policy, and online/offline training. Rollout and training can run separately and scale independently across devices.
27 changes: 10 additions & 17 deletions docs/sphinx_doc/source/tutorial/example_megatron.md
@@ -4,35 +4,29 @@
This guide walks you through training models with **Megatron-LM**.

```{note}
This guide assumes you have already set up your environment following {ref}`Installation <Installation>`. If you haven't done so, please refer to that guide first.
This guide assumes you have already set up your environment from source, following {ref}`Installation <Installation>`. If you haven't done so, please refer to that guide first.
```

---

## Step 1: Installation


## Step 1: Install Megatron-LM Support

Install the project in editable mode with Megatron support:

```bash
# For bash users
pip install -e .[megatron]
pip install -e ".[megatron]"

# For zsh users (escape the brackets)
pip install -e .\[megatron\]
# for uv
# uv sync --extra megatron
```


#### Install Apex (from NVIDIA)

Finally, install NVIDIA's Apex library for mixed-precision training:
Then, install NVIDIA's Apex library for mixed-precision training:

```bash
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
--config-settings "--build-option=--cpp_ext" \
--config-settings "--build-option=--cuda_ext" \
--resume-retries 999 git+https://github.com/NVIDIA/apex.git
--resume-retries 10 git+https://github.com/NVIDIA/apex.git
```
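
If both installs succeeded, the packages should now be importable. A minimal sanity check, assuming the top-level module names `megatron` and `apex` (the `check_module` helper below is a hypothetical convenience, not part of Trinity-RFT):

```shell
# Hypothetical helper: exits 0 if the given Python module is importable
# in the current environment, non-zero otherwise.
check_module() {
  python3 -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec('$1') else 1)"
}

check_module apex && echo "apex: importable" || echo "apex: not found"
check_module megatron && echo "megatron: importable" || echo "megatron: not found"
```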

---
@@ -43,11 +37,10 @@ We provide a Docker setup to simplify environment management.

#### Build the Docker Image

```bash
git clone https://github.com/modelscope/Trinity-RFT
cd Trinity-RFT

# Build the image
Trinity-RFT provides a dedicated Dockerfile for Megatron-LM, located at `scripts/docker_for_megatron/Dockerfile`. You can build the image with the following command:

```bash
docker build -f scripts/docker_for_megatron/Dockerfile -t trinity-rft-megatron:latest .
```

14 changes: 7 additions & 7 deletions docs/sphinx_doc/source_zh/main.md
@@ -1,20 +1,20 @@
## 💡 什么是 Trinity-RFT?

Trinity-RFT 是一个灵活、通用的大语言模型(LLM)强化微调(RFT)框架。其提供三个独立模块,满足不同用户的需求
Trinity-RFT 是一个灵活、通用的大语言模型(LLM)强化微调(RFT)框架。其将 RFT 流程解耦为三个关键模块:**Explorer**、**Trainer** 和 **Buffer**,并面向不同背景和目标的用户提供相应功能:

* 🤖 **Explorer**:面向智能体应用开发者。[[教程]](/tutorial/develop_workflow.md)
* 🤖 面向智能体应用开发者。[[教程]](/tutorial/develop_workflow.md)
- 训练智能体应用,以增强其在指定环境中完成任务的能力
- 示例:[多轮交互](/tutorial/example_multi_turn.md),[ReAct 智能体](/tutorial/example_react.md)

* 🧠 **Trainer**:面向 RL 算法研究者。[[教程]](/tutorial/develop_algorithm.md)
* 🧠 面向 RL 算法研究者。[[教程]](/tutorial/develop_algorithm.md)
- 在简洁、可插拔的类中设计和验证新的 RL 算法
- 示例:[混合 RL 算法](/tutorial/example_mix_algo.md)
- 示例:[SFT/GRPO 混合算法](/tutorial/example_mix_algo.md)

* 🗄️ **Buffer**:面向数据工程师。[[教程]](/tutorial/develop_operator.md)
* 📊 面向数据工程师。[[教程]](/tutorial/develop_operator.md)
- 设计任务定制数据集,构建数据流水线以支持清洗、增强和人类参与场景
- 示例:[数据功能](/tutorial/example_data_functionalities.md)
- 示例:[数据处理](/tutorial/example_data_functionalities.md)

Trinity-RFT 统一了上述三个模块,并提供以下核心特性:
## 🌟 核心特性

* **灵活的 RFT 模式:**
- 支持同步/异步、on-policy/off-policy 以及在线/离线训练。采样与训练可分离运行,并可在多设备上独立扩展。
84 changes: 17 additions & 67 deletions docs/sphinx_doc/source_zh/tutorial/example_megatron.md
@@ -4,84 +4,35 @@
本指南将引导你使用 **Megatron-LM** 训练模型。

```{note}
本指南假设你已经按照 {ref}`安装指南 <Installation>` 设置好了环境。如果还没有,请先参考该指南。
本指南假设你已经按照 {ref}`安装指南 <Installation>` 中的源码安装方式配置好了环境。如果还没有,请先参考该指南。
```

---

## 步骤 1:安装
## 步骤 1:安装 Megatron-LM 支持

### 最低要求

在开始之前,请确保你的系统满足以下要求:

- **GPU**:至少 2 块 GPU(用于分布式训练)
- **CUDA**:版本 12.4 或更高
- **Python**:版本 3.10 或更高

---

### 安装依赖项

首先克隆仓库并创建虚拟环境:

```bash
# 克隆仓库
git clone https://github.com/modelscope/Trinity-RFT
cd Trinity-RFT
```

#### 选项 A:使用 Conda

```bash
# 创建并激活新环境
conda create -n trinity python=3.10
conda activate trinity
```

#### 选项 B:使用 venv

```bash
# 创建并激活虚拟环境
python3.10 -m venv .venv
source .venv/bin/activate
```

#### 安装包

以可编辑模式安装项目,并启用 Megatron 支持:

```bash
# 针对 bash 用户
pip install -e .[megatron]

# 针对 zsh 用户(需转义括号)
pip install -e .\[megatron\]
```

#### 安装 Flash Attention

安装基础依赖后,安装 `flash-attn`。编译过程可能需要几分钟,请耐心等待。
安装 Megatron-LM 相关依赖:

```bash
pip install flash-attn==2.8.1 -v
```
pip install -e ".[megatron]"

如果遇到安装问题,可尝试以下替代命令:

```bash
pip install flash-attn -v --no-build-isolation
# for uv
# uv sync --extra megatron
```

#### 安装 Apex(来自 NVIDIA)

最后,安装 NVIDIA 的 Apex 库以支持混合精度训练:
另外还需要从源码安装 NVIDIA 的 Apex 库以支持混合精度训练:

```bash
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
--config-settings "--build-option=--cpp_ext" \
--config-settings "--build-option=--cuda_ext" \
--resume-retries 999 git+https://github.com/NVIDIA/apex.git
--resume-retries 10 git+https://github.com/NVIDIA/apex.git

# for uv
# uv pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
# --config-settings "--build-option=--cpp_ext" \
# --config-settings "--build-option=--cuda_ext" \
# --resume-retries 10 git+https://github.com/NVIDIA/apex.git
```

---
@@ -92,11 +43,10 @@ pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \

#### 构建 Docker 镜像

```bash
git clone https://github.com/modelscope/Trinity-RFT
cd Trinity-RFT
Trinity-RFT 提供了专门用于 Megatron-LM 的 Dockerfile,位于 `scripts/docker_for_megatron/Dockerfile`。
可以使用以下命令构建镜像:

# 构建镜像
```bash
docker build -f scripts/docker_for_megatron/Dockerfile -t trinity-rft-megatron:latest .
```

Expand Down