diff --git a/README.md b/README.md
index 83773a2977..b373b0c333 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,6 @@
-
 
 Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models
 
@@ -20,22 +19,23 @@ ## 💡 What is Trinity-RFT?
-Trinity-RFT is a flexible, general-purpose framework for reinforcement fine-tuning (RFT) of large language models (LLMs). It provides three independent modules for users with different needs:
+Trinity-RFT is a flexible, general-purpose framework for reinforcement fine-tuning (RFT) of large language models (LLMs). It decouples the RFT process into three key components: **Explorer**, **Trainer**, and **Buffer**, and provides functionalities for users with different backgrounds and objectives:
+
-* 🤖 **Explorer**:For agent application developers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_workflow.html)
-  - Train an agent application to enhance its ability to complete tasks in a specified environment
+* 🤖 For agent application developers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_workflow.html)
+  - Train agent applications to improve their ability to complete tasks in specific environments.
   - Examples: [Multi-Turn Interaction](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html), [ReAct Agent](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_react.html)
 
-* 🧠 **Trainer**: For RL algorithm researchers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html)
-  - Design and validate new RL algorithms in compact, plug-and-play classes
-  - Examples: [Mixture of RL Algorithms](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)
+* 🧠 For RL algorithm researchers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html)
+  - Design and validate new reinforcement learning algorithms using compact, plug-and-play modules.
+  - Example: [Mixture of SFT and GRPO](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)
 
-* 🗄️ **Buffer**: For data engineers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html)
-  - Design task-specific datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios
-  - Examples: [Data Functionalities](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html)
+* 📊 For data engineers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html)
+  - Create task-specific datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios.
+  - Example: [Data Processing](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html)
 
-Trinity-RFT unifies the above three modules and provides the following key features:
+## 🌟 Key Features
 
 * **Flexible RFT Modes:**
   - Supports synchronous/asynchronous, on-policy/off-policy, and online/offline training. Rollout and training can run separately and scale independently across devices.
diff --git a/README_zh.md b/README_zh.md
index 4db8563886..3248bb1451 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -20,21 +20,21 @@ ## 💡 什么是 Trinity-RFT ?
-Trinity-RFT 是一个灵活、通用的大语言模型(LLM)强化微调(RFT)框架。其提供三个独立模块,满足不同用户的需求:
+Trinity-RFT 是一个灵活、通用的大语言模型(LLM)强化微调(RFT)框架。其将 RFT 流程解耦为三个关键模块:**Explorer**、**Trainer** 和 **Buffer**,并面向不同背景和目标的用户提供相应功能:
 
-* 🤖 **Explorer**:面向智能体应用开发者。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_workflow.html)
+* 🤖 面向智能体应用开发者。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_workflow.html)
   - 训练智能体应用,以增强其在指定环境中完成任务的能力
   - 示例:[多轮交互](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_multi_turn.html),[ReAct 智能体](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_react.html)
 
-* 🧠 **Trainer**:面向 RL 算法研究者。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_algorithm.html)
+* 🧠 面向 RL 算法研究者。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_algorithm.html)
   - 在简洁、可插拔的类中设计和验证新的 RL 算法
-  - 示例:[混合 RL 算法](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html)
+  - 示例:[SFT/GRPO混合算法](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html)
 
-* 🗄️ **Buffer**:面向数据工程师。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_operator.html)
+* 📊 面向数据工程师。[[教程]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_operator.html)
   - 设计任务定制数据集,构建数据流水线以支持清洗、增强和人类参与场景
-  - 示例:[数据功能](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_data_functionalities.html)
+  - 示例:[数据处理](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_data_functionalities.html)
 
-Trinity-RFT 统一了上述三个模块,并提供以下核心特性:
+## 🌟 核心特性
 
 * **灵活的 RFT 模式:**
   - 支持同步/异步、on-policy/off-policy 以及在线/离线训练。采样与训练可分离运行,并可在多设备上独立扩展。
@@ -186,9 +186,7 @@ docker run -it \
   trinity-rft:latest
 ```
 
-```{note}
-如需使用 **Megatron-LM** 进行训练,请参考 {ref}`Megatron-LM Backend `。
-```
+> 如需使用 **Megatron-LM** 进行训练,请参考 [Megatron-LM 支持](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_megatron.html)
 
 ### 第二步:准备数据集和模型
diff --git a/docs/sphinx_doc/source/main.md b/docs/sphinx_doc/source/main.md
index b813b74d18..a79d3f4548 100644
--- a/docs/sphinx_doc/source/main.md
+++ b/docs/sphinx_doc/source/main.md
@@ -1,20 +1,21 @@
 ## 💡 What is Trinity-RFT?
 
-Trinity-RFT is a flexible, general-purpose framework for reinforcement fine-tuning (RFT) of large language models (LLMs). It provides three independent modules for users with different needs:
+Trinity-RFT is a flexible, general-purpose framework for reinforcement fine-tuning (RFT) of large language models (LLMs). It decouples the RFT process into three key components: **Explorer**, **Trainer**, and **Buffer**, and provides functionalities for users with different backgrounds and objectives:
 
-* 🤖 **Explorer**:For agent application developers. [[tutorial]](/tutorial/develop_workflow.md)
-  - Train an agent application to enhance its ability to complete tasks in a specified environment
+* 🤖 For agent application developers. [[tutorial]](/tutorial/develop_workflow.md)
+  - Train agent applications to improve their ability to complete tasks in specific environments.
   - Examples: [Multi-Turn Interaction](/tutorial/example_multi_turn.md), [ReAct Agent](/tutorial/example_react.md)
 
-* 🧠 **Trainer**: For RL algorithm researchers. [[tutorial]](/tutorial/develop_algorithm.md)
-  - Design and validate new RL algorithms in compact, plug-and-play classes
-  - Examples: [Mixture of RL Algorithms](/tutorial/example_mix_algo.md)
+* 🧠 For RL algorithm researchers. [[tutorial]](/tutorial/develop_algorithm.md)
+  - Design and validate new reinforcement learning algorithms using compact, plug-and-play modules.
+  - Example: [Mixture of SFT and GRPO](/tutorial/example_mix_algo.md)
 
-* 🗃️ **Buffer**: For data engineers. [[tutorial]](/tutorial/develop_operator.md)
-  - Design task-specific datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios
-  - Examples: [Data Functionalities](/tutorial/example_data_functionalities.md)
+* 📊 For data engineers. [[tutorial]](/tutorial/develop_operator.md)
+  - Create task-specific datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios.
+  - Example: [Data Processing](/tutorial/example_data_functionalities.md)
 
-Trinity-RFT unifies the above three modules and provides the following key features:
+
+## 🌟 Key Features
 
 * **Flexible RFT Modes:**
   - Supports synchronous/asynchronous, on-policy/off-policy, and online/offline training. Rollout and training can run separately and scale independently across devices.
diff --git a/docs/sphinx_doc/source/tutorial/example_megatron.md b/docs/sphinx_doc/source/tutorial/example_megatron.md
index fb5fd0a121..085dcf5d3d 100644
--- a/docs/sphinx_doc/source/tutorial/example_megatron.md
+++ b/docs/sphinx_doc/source/tutorial/example_megatron.md
@@ -4,35 +4,29 @@
 This guide walks you through how to train models using **Megatron-LM** in a clear way.
 
 ```{note}
-This guide assumes you have already set up your environment following {ref}`Installation `. If you haven't done so, please refer to that guide first.
+This guide assumes you have already set up your environment from source code following {ref}`Installation `. If you haven't done so, please refer to that guide first.
 ```
 
 ---
 
-## Step 1: Installation
-
-
+## Step 1: Install Megatron-LM Support
 
 Install the project in editable mode with Megatron support:
 
 ```bash
-# For bash users
-pip install -e .[megatron]
+pip install -e ".[megatron]"
 
-# For zsh users (escape the brackets)
-pip install -e .\[megatron\]
+# for uv
+# uv sync --extra megatron
 ```
-
-#### Install Apex (from NVIDIA)
-
-Finally, install NVIDIA's Apex library for mixed-precision training:
+Then, install NVIDIA's Apex library for mixed-precision training:
 
 ```bash
 pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
   --config-settings "--build-option=--cpp_ext" \
   --config-settings "--build-option=--cuda_ext" \
-  --resume-retries 999 git+https://github.com/NVIDIA/apex.git
+  --resume-retries 10 git+https://github.com/NVIDIA/apex.git
 ```
 
 ---
@@ -43,11 +37,10 @@
 We provide a Docker setup to simplify environment management.
 
 #### Build the Docker Image
 
-```bash
-git clone https://github.com/modelscope/Trinity-RFT
-cd Trinity-RFT
-# Build the image
+Trinity-RFT provides a dedicated Dockerfile for Megatron-LM located at `scripts/docker_for_megatron/Dockerfile`. You can build the image using the following command:
+
+```bash
 docker build -f scripts/docker_for_megatron/Dockerfile -t trinity-rft-megatron:latest .
 ```
diff --git a/docs/sphinx_doc/source_zh/main.md b/docs/sphinx_doc/source_zh/main.md
index eb89d59d17..5acc3baf0e 100644
--- a/docs/sphinx_doc/source_zh/main.md
+++ b/docs/sphinx_doc/source_zh/main.md
@@ -1,20 +1,20 @@
 ## 💡 什么是 Trinity-RFT?
-Trinity-RFT 是一个灵活、通用的大语言模型(LLM)强化微调(RFT)框架。其提供三个独立模块,满足不同用户的需求:
+Trinity-RFT 是一个灵活、通用的大语言模型(LLM)强化微调(RFT)框架。其将 RFT 流程解耦为三个关键模块:**Explorer**、**Trainer** 和 **Buffer**,并面向不同背景和目标的用户提供相应功能:
 
-* 🤖 **Explorer**:面向智能体应用开发者。[[教程]](/tutorial/develop_workflow.md)
+* 🤖 面向智能体应用开发者。[[教程]](/tutorial/develop_workflow.md)
   - 训练智能体应用,以增强其在指定环境中完成任务的能力
   - 示例:[多轮交互](/tutorial/example_multi_turn.md),[ReAct 智能体](/tutorial/example_react.md)
 
-* 🧠 **Trainer**:面向 RL 算法研究者。[[教程]](/tutorial/develop_algorithm.md)
+* 🧠 面向 RL 算法研究者。[[教程]](/tutorial/develop_algorithm.md)
   - 在简洁、可插拔的类中设计和验证新的 RL 算法
-  - 示例:[混合 RL 算法](/tutorial/example_mix_algo.md)
+  - 示例:[SFT/GRPO混合算法](/tutorial/example_mix_algo.md)
 
-* 🗄️ **Buffer**:面向数据工程师。[[教程]](/tutorial/develop_operator.md)
+* 📊 面向数据工程师。[[教程]](/tutorial/develop_operator.md)
   - 设计任务定制数据集,构建数据流水线以支持清洗、增强和人类参与场景
-  - 示例:[数据功能](/tutorial/example_data_functionalities.md)
+  - 示例:[数据处理](/tutorial/example_data_functionalities.md)
 
-Trinity-RFT 统一了上述三个模块,并提供以下核心特性:
+## 🌟 核心特性
 
 * **灵活的 RFT 模式:**
   - 支持同步/异步、on-policy/off-policy 以及在线/离线训练。采样与训练可分离运行,并可在多设备上独立扩展。
diff --git a/docs/sphinx_doc/source_zh/tutorial/example_megatron.md b/docs/sphinx_doc/source_zh/tutorial/example_megatron.md
index 1c34b27cb4..a31db2d9f0 100644
--- a/docs/sphinx_doc/source_zh/tutorial/example_megatron.md
+++ b/docs/sphinx_doc/source_zh/tutorial/example_megatron.md
@@ -4,84 +4,35 @@
 本指南将清晰地引导你如何使用 **Megatron-LM** 来训练模型。
 
 ```{note}
-本指南假设你已经按照 {ref}`安装指南 ` 设置好了环境。如果还没有,请先参考该指南。
+本指南假设你已经按照 {ref}`安装指南 ` 中的源码安装方式配置好了环境。如果还没有,请先参考该指南。
 ```
 
 ---
 
-## 步骤 1:安装
+## 步骤 1:安装 Megatron-LM 支持
 
-### 最低要求
-
-在开始之前,请确保你的系统满足以下要求:
-
-- **GPU**:至少 2 块 GPU(用于分布式训练)
-- **CUDA**:版本 12.4 或更高
-- **Python**:版本 3.10 或更高
-
----
-
-### 安装依赖项
-
-首先克隆仓库并创建虚拟环境:
-
-```bash
-# 克隆仓库
-git clone https://github.com/modelscope/Trinity-RFT
-cd Trinity-RFT
-```
-
-#### 选项 A:使用 Conda
-
-```bash
-# 创建并激活新环境
-conda create -n trinity python=3.10
-conda activate trinity
-```
-
-#### 选项 B:使用 venv
-
-```bash
-# 创建并激活虚拟环境
-python3.10 -m venv .venv
-source .venv/bin/activate
-```
-
-#### 安装包
-
-以可编辑模式安装项目,并启用 Megatron 支持:
-
-```bash
-# 针对 bash 用户
-pip install -e .[megatron]
-
-# 针对 zsh 用户(需转义括号)
-pip install -e .\[megatron\]
-```
-
-#### 安装 Flash Attention
-
-安装基础依赖后,安装 `flash-attn`。编译过程可能需要几分钟,请耐心等待。
+安装 Megatron-LM 相关依赖:
 
 ```bash
-pip install flash-attn==2.8.1 -v
-```
+pip install -e ".[megatron]"
 
-如果遇到安装问题,可尝试以下替代命令:
-
-```bash
-pip install flash-attn -v --no-build-isolation
+# for uv
+# uv sync --extra megatron
 ```
 
-#### 安装 Apex(来自 NVIDIA)
-
-最后,安装 NVIDIA 的 Apex 库以支持混合精度训练:
+另外还需要从源码安装 NVIDIA 的 Apex 库以支持混合精度训练:
 
 ```bash
 pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
   --config-settings "--build-option=--cpp_ext" \
   --config-settings "--build-option=--cuda_ext" \
-  --resume-retries 999 git+https://github.com/NVIDIA/apex.git
+  --resume-retries 10 git+https://github.com/NVIDIA/apex.git
+
+# for uv
+# uv pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
+#   --config-settings "--build-option=--cpp_ext" \
+#   --config-settings "--build-option=--cuda_ext" \
+#   --resume-retries 10 git+https://github.com/NVIDIA/apex.git
 ```
 
 ---
@@ -92,11 +43,10 @@ pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
 
 #### 构建 Docker 镜像
 
-```bash
-git clone https://github.com/modelscope/Trinity-RFT
-cd Trinity-RFT
+Trinity-RFT 提供了专门用于 Megatron-LM 的 Dockerfile,位于 `scripts/docker_for_megatron/Dockerfile`。
+可以使用以下命令构建镜像:
 
-# 构建镜像
+```bash
 docker build -f scripts/docker_for_megatron/Dockerfile -t trinity-rft-megatron:latest .
 ```
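
A reviewer-facing note on the install-command change in this diff: replacing `pip install -e .[megatron]` (plus the zsh-escaped variant) with the quoted form `pip install -e ".[megatron]"` makes the extras specifier literal in every POSIX-style shell, since unquoted square brackets are glob characters in zsh (which errors with "no matches found" when nothing matches) and can match files in bash. A minimal sketch of why the quoted form is the one string pip actually receives:

```shell
# Quoting makes the extras spec a literal string, immune to filename
# globbing, so one command works in bash and zsh alike.
spec=".[megatron]"
printf '%s\n' "$spec"   # → .[megatron]
# An unquoted .[megatron] would instead be treated as a glob pattern,
# which is why the diff standardizes on the quoted form rather than
# keeping separate per-shell instructions.
```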