diff --git a/README.md b/README.md
index 5a7255ebb5..ac87102781 100644
--- a/README.md
+++ b/README.md
@@ -4,10 +4,8 @@ Trinity-RFT
-

Trinity-RFT: A General-Purpose and Unified Framework for
Reinforcement Fine-Tuning of Large Language Models

-
[![paper](http://img.shields.io/badge/cs.LG-2505.17826-B31B1B?logo=arxiv&logoColor=red)](https://arxiv.org/abs/2505.17826) @@ -19,27 +17,19 @@ ## 💡 What is Trinity-RFT? - -Trinity-RFT is a general-purpose, flexible and user-friendly framework for LLM reinforcement fine-tuning (RFT). +Trinity-RFT is a general-purpose, flexible and user-friendly framework for LLM reinforcement fine-tuning (RFT). It decouples RFT into three components that work in coordination: * **Explorer** generates experience data via agent-environment interaction; - * **Trainer** updates model weights by minimizing losses on the data; - * **Buffer** pipelines data processing throughout the RFT lifecycle. - Trinity-RFT provides functionalities for users with different backgrounds and objectives: * 🤖 **Agent application developers:** Train LLM-powered agents and improve their capabilities in specific domains [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_workflow.html) - * 🧠 **Reinforcement learning researchers:** Design, implement and validate new RL algorithms using compact, plug-and-play modules that allow non-invasive customization [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html) - * 📊 **Data engineers:** Create RFT datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html) - - ## 🚀 News * [2025-12] [[Release Notes]](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.4.0) Trinity-RFT v0.4.0 released: added [Tinker](https://thinkingmachines.ai/tinker/) backend for users **without GPUs**, added more benchmarks, enhanced online RL, and more. @@ -64,82 +54,68 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob - ## 🔨 Tutorials and Guidelines - -| Category | Tutorial / Guideline | -| --- | ----| -| *Run diverse RFT modes* | + [Quick start: GRPO on GSM8k](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)
+ [Off-policy RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_advanced.html)
+ [Fully asynchronous RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_async_mode.html)
+ [Offline learning by DPO or SFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo.html)
+ [RFT without local GPU (Tinker Backend)](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_tinker_backend.html) | -| *Multi-step agentic RL* | + [Concatenated multi-turn workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html)
+ [General multi-step workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_step_wise.html)
+ [ReAct workflow with an agent framework](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_react.html)
+ [Example: train a web-search agent](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) | -| *Full-lifecycle data pipelines* | + [Rollout task mixing and selection](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_selector.html)
+ [Online task curriculum](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374))
+ [Research project: learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441))
+ [Experience replay with prioritization](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)
+ [Advanced data processing & human-in-the-loop](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html) | -| *Algorithm development* | + [RL algorithm development with Trinity-RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html) (📝 [paper](https://arxiv.org/pdf/2508.11408))
+ [Research project: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203))
+ Non-verifiable domains: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) | -| *Benchmarks* | + [Benchmark toolkit (quick verification & experimentation)](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)
+ [Guru-Math benchmark & comparison with veRL](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)
+ [FrozenLake benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)
+ [Alfworld benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) | -| *Going deeper into Trinity-RFT* | + [Full configurations](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)
+ [GPU resource and training configuration guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)
+ [Understand the coordination between explorer and trainer](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)
+ [How to align configuration with veRL](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) | - +| Category | Tutorial / Guideline | +|-----------------------------------|------------------------------------------------------------------------------------------------------------------| +| *Run diverse RFT modes* | • [Quick start: GRPO on GSM8k](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)
• [Off-policy RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_advanced.html)
• [Fully asynchronous RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_async_mode.html)
• [Offline learning by DPO or SFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo.html)
• [RFT without local GPU (Tinker Backend)](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_tinker_backend.html) | +| *Multi-step agentic RL* | • [Concatenated multi-turn workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html)
• [General multi-step workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_step_wise.html)
• [ReAct workflow with an agent framework](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_react.html)
• [Example: train a web-search agent](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) | +| *Full-lifecycle data pipelines* | • [Rollout task mixing and selection](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_selector.html)
• [Online task curriculum](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374))
• [Research project: learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441))
• [Experience replay with prioritization](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)
• [Advanced data processing & human-in-the-loop](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html) | +| *Algorithm development* | • [RL algorithm development with Trinity-RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html) (📝 [paper](https://arxiv.org/pdf/2508.11408))
• [Research project: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203))
• Non-verifiable domains: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) | +| *Benchmarks* | • [Benchmark toolkit (quick verification & experimentation)](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)
• [Guru-Math benchmark & comparison with veRL](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)
• [FrozenLake benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)
• [ALFWorld benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) | +| *Going deeper into Trinity-RFT* | • [Full configurations](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)
• [GPU resource and training configuration guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)
• [Understand the coordination between explorer and trainer](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)
• [How to align configuration with veRL](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) | > [!NOTE] > For more tutorials, please refer to the [Trinity-RFT documentation](https://modelscope.github.io/Trinity-RFT/). - - ## 🌟 Key Features * **Flexible RFT Modes:** - Supports synchronous/asynchronous, on-policy/off-policy, and online/offline RL. - Rollout and training can run separately and scale independently across devices. - Boost sample and time efficiency by experience replay. - RFT modes supported by Trinity-RFT * **Agentic RL Support:** - Supports both concatenated and general multi-step agentic workflows. - Able to directly train agent applications developed using agent frameworks like [AgentScope](https://github.com/agentscope-ai/agentscope). - Agentic workflows * **Full-Lifecycle Data Pipelines:** - Enables pipeline processing of rollout tasks and experience samples. - Active data management (prioritization, cleaning, augmentation, etc.) throughout the RFT lifecycle. - Native support for multi-task joint learning and online task curriculum construction. - Data pipeline design * **User-Friendly Design:** - Plug-and-play modules and decoupled architecture, facilitating easy adoption and development. - Rich graphical user interfaces enable low-code usage. - System architecture - - ## 🔧 Supported Algorithms -We list some algorithms supported by Trinity-RFT in the following table. For more details, the concrete configurations are shown in the [Algorithm module](https://github.com/modelscope/Trinity-RFT/blob/main/trinity/algorithm/algorithm.py). You can also set up new algorithms by customizing different components, see [tutorial](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html). - -| Algorithm | Doc / Example | Source Code | Key Configurations | -|:-----------|:-----------|:---------------|:-----------| -| PPO [[Paper](https://arxiv.org/pdf/1707.06347)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[Countdown Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/ppo_policy_loss.py)] | `algorithm_type: ppo` | -| GRPO [[Paper](https://arxiv.org/pdf/2402.03300)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k)]| [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/grpo_advantage.py)] | `algorithm_type: grpo` | -| CHORD 💡 [[Paper](https://arxiv.org/pdf/2508.11408)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)] [[ToolACE Example](https://github.com/modelscope/Trinity-RFT/blob/main/examples/mix_chord/mix_chord_toolace.yaml)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/chord_policy_loss.py)] | `algorithm_type: mix_chord` | -| REC Series 💡 [[Paper](https://arxiv.org/pdf/2509.24203)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/rec_policy_loss.py)] | `algorithm_type: rec` | -| RLOO [[Paper](https://arxiv.org/pdf/2402.14740)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/rloo_advantage.py)] | 
`algorithm_type: rloo` | -| REINFORCE++ [[Paper](https://arxiv.org/pdf/2501.03262)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/reinforce_advantage.py)] | `algorithm_type: reinforceplusplus` | -| GSPO [[Paper](https://arxiv.org/pdf/2507.18071)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/gspo_policy_loss.py)] | `algorithm_type: gspo` | -| TOPR [[Paper](https://arxiv.org/pdf/2503.14286)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/topr_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/topr_policy_loss.py)] | `algorithm_type: topr` | -| sPPO [[Paper](https://arxiv.org/pdf/2108.05828)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/sppo_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sppo_loss_fn.py)] | `algorithm_type: sppo` | -| AsymRE [[Paper](https://arxiv.org/pdf/2506.20520)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/asymre_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/asymre_advantage.py)] | `algorithm_type: asymre` | -| CISPO [[Paper](https://arxiv.org/pdf/2506.13585)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/cispo_policy_loss.py)] | `algorithm_type: cispo` | -| SAPO [[Paper](https://arxiv.org/pdf/2511.20347)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sapo_policy_loss.py)] | `algorithm_type: sapo` | +| Algorithm | Doc / Example | Source Code | Key Configurations | +|------------------------|-------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------|--------------------------------| +| PPO [[Paper](https://arxiv.org/pdf/1707.06347)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[Countdown Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/ppo_policy_loss.py)] | `algorithm_type: ppo` | +| GRPO [[Paper](https://arxiv.org/pdf/2402.03300)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/grpo_advantage.py)] | `algorithm_type: grpo` | +| CHORD 💡 [[Paper](https://arxiv.org/pdf/2508.11408)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)] [[ToolACE Example](https://github.com/modelscope/Trinity-RFT/blob/main/examples/mix_chord/mix_chord_toolace.yaml)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/chord_policy_loss.py)] | `algorithm_type: mix_chord` | +| REC Series 💡 [[Paper](https://arxiv.org/pdf/2509.24203)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/rec_policy_loss.py)] | `algorithm_type: rec` | +| RLOO 
[[Paper](https://arxiv.org/pdf/2402.14740)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/rloo_advantage.py)] | `algorithm_type: rloo` | +| REINFORCE++ [[Paper](https://arxiv.org/pdf/2501.03262)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/reinforce_advantage.py)] | `algorithm_type: reinforceplusplus` | +| GSPO [[Paper](https://arxiv.org/pdf/2507.18071)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/gspo_policy_loss.py)] | `algorithm_type: gspo` | +| TOPR [[Paper](https://arxiv.org/pdf/2503.14286)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/topr_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/topr_policy_loss.py)] | `algorithm_type: topr` | +| sPPO [[Paper](https://arxiv.org/pdf/2108.05828)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/sppo_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sppo_loss_fn.py)] | `algorithm_type: sppo` | +| AsymRE [[Paper](https://arxiv.org/pdf/2506.20520)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/asymre_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/asymre_advantage.py)] | `algorithm_type: asymre` | +| CISPO [[Paper](https://arxiv.org/pdf/2506.13585)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/cispo_policy_loss.py)] | `algorithm_type: cispo` | +| SAPO [[Paper](https://arxiv.org/pdf/2511.20347)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sapo_policy_loss.py)] | `algorithm_type: sapo` | | On-Policy Distillation [[Blog](https://thinkingmachines.ai/blog/on-policy-distillation/)] [[Paper](https://arxiv.org/pdf/2306.13649)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/on_policy_distill)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/common/workflows/on_policy_distill_workflow.py)] | `algorithm_type: on_policy_distill` | - - --- ## Table of Contents - [Quick Start](#quick-start) + - [Minimal CPU-Only Quick Start](#minimal-cpu-only-quick-start) - [Step 1: installation](#step-1-installation) - [Step 2: prepare dataset and model](#step-2-prepare-dataset-and-model) - [Step 3: configurations](#step-3-configurations) @@ -148,51 +124,72 @@ We list some algorithms supported by Trinity-RFT in the following table. For mor - [Acknowledgements](#acknowledgements) - [Citation](#citation) - +--- ## Quick Start - > [!NOTE] > This project is currently under active development. Comments and suggestions are welcome! -> -> **No GPU? No problem!** You can still try it out: -> 1. Follow the installation steps (feel free to skip GPU-specific packages like `flash-attn`) -> 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is specifically designed to work on CPU-only systems. +> **No GPU? No problem!** You can still try it out using the Tinker backend: +> 1. Follow the installation steps (skip GPU-specific packages like `flash-attn`) +> 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is designed for CPU-only systems. 
+ +### Minimal CPU-Only Quick Start + +If you do not have access to a GPU, you can still try Trinity-RFT using the Tinker backend. + +```bash +# Create and activate environment +python3 -m venv .venv +source .venv/bin/activate + +# Install Trinity-RFT with CPU-only backend +pip install -e ".[tinker]" +``` + +Run a simple example: + +```bash +trinity run --config examples/tinker/tinker_example.yaml +``` + +This example is designed to run on CPU-only machines and is recommended for first-time users. ### Step 1: installation Before installing, make sure your system meets the following requirements: -- **Python**: version 3.10 to 3.12 (inclusive) -- **CUDA**: version >= 12.8 -- **GPUs**: at least 2 GPUs +* Python: version 3.10 to 3.12 (inclusive) +* CUDA: version >= 12.8 (required for GPU training) +* GPUs: at least 2 GPUs (for standard distributed training) +* CPU-only: Supported via the Tinker backend (see Minimal CPU-Only Quick Start) + +**Recommended for first-time users:** +* If you have no GPU → Use Tinker backend +* If you want simple setup → Use Docker +* If you want development & contribution → Use Conda / venv #### From Source (Recommended) If you plan to customize or contribute to Trinity-RFT, this is the best option. -##### 1. Clone the Repository +1. Clone the Repository ```bash git clone https://github.com/modelscope/Trinity-RFT cd Trinity-RFT ``` -##### 2. Set Up Environment +2. Set Up Environment + Choose one of the following options: -Choose one of the following options: - -##### Using Pre-built Docker Image (Recommended for Beginners) - -We provide a pre-built Docker image with GPU-related dependencies installed. +**Using Pre-built Docker Image (Recommended for Beginners)** ```bash docker pull ghcr.io/modelscope/trinity-rft:latest -# Run the container, replacing with your actual path docker run -it \ --gpus all \ --shm-size="64g" \ @@ -201,10 +198,8 @@ docker run -it \ -v :/data \ ghcr.io/modelscope/trinity-rft:latest ``` - > This image has used `uv` to install all GPU-related dependencies of Trinity-RFT. The virtual environment will be automatically activated upon entering the container (you can also manually activate it via `source /opt/venv/bin/activate` if needed). You can use `uv pip install` to add extra packages as necessary. - -###### Using Conda +**Using Conda** ```bash conda create -n trinity python=3.12 @@ -212,16 +207,15 @@ conda activate trinity pip install -e ".[vllm,flash_attn]" -# If you have no GPU, comment out the line above and uncomment this instead: +# If you have no GPU: # pip install -e ".[tinker]" # If you encounter issues when installing flash-attn, try: # pip install flash-attn==2.8.1 --no-build-isolation - pip install -e ".[dev]" # for development like linting and debugging ``` -###### Using venv +**Using venv** ```bash python3.10 -m venv .venv @@ -229,134 +223,82 @@ source .venv/bin/activate pip install -e ".[vllm,flash_attn]" -# If you have no GPU, comment out the line above and uncomment this instead: +# If you have no GPU: # pip install -e ".[tinker]" -# If you encounter issues when installing flash-attn, try: -# pip install flash-attn==2.8.1 --no-build-isolation - -pip install -e ".[dev]" # for development like linting and debugging +pip install -e ".[dev]" ``` -###### Using `uv` - -[`uv`](https://github.com/astral-sh/uv) is a modern Python package installer. 
+**Using uv** ```bash uv sync --extra vllm --extra dev --extra flash_attn -# If you have no GPU, try to use Tinker instead: +# If you have no GPU: # uv sync --extra tinker --extra dev ``` #### Via PyPI -If you just want to use the package without modifying the code: - ```bash pip install trinity-rft pip install flash-attn==2.8.1 ``` - -Or with `uv`: - -```bash -uv pip install trinity-rft -uv pip install flash-attn==2.8.1 -``` - > For training with **Megatron-LM**, please refer to [Megatron-LM Backend](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_megatron.html). - ### Step 2: prepare dataset and model - Trinity-RFT supports most datasets and models from Huggingface and ModelScope. - **Prepare the model** in the local directory `$MODEL_PATH/{model_name}`: ```bash # Using Huggingface huggingface-cli download {model_name} --local-dir $MODEL_PATH/{model_name} - # Using Modelscope modelscope download {model_name} --local_dir $MODEL_PATH/{model_name} ``` -For more details about model downloading, see [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli) or [ModelScope](https://modelscope.cn/docs/models/download). - - - **Prepare the dataset** in the local directory `$DATASET_PATH/{dataset_name}`: ```bash # Using Huggingface huggingface-cli download {dataset_name} --repo-type dataset --local-dir $DATASET_PATH/{dataset_name} - # Using Modelscope modelscope download --dataset {dataset_name} --local_dir $DATASET_PATH/{dataset_name} ``` -For more details about dataset downloading, see [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli#download-a-dataset-or-a-space) or [ModelScope](https://modelscope.cn/docs/datasets/download). - - - ### Step 3: configurations - Trinity-RFT provides a web interface for configuring your RFT process. > [!NOTE] > This is an experimental feature, and we will continue to improve it. - To launch the web interface for minimal configurations, you can run ```bash trinity studio --port 8080 ``` -Then you can configure your RFT process in the web page and generate a config file. You can save the config file for later use or run it directly as described in the following section. +Then you can configure your RFT process in the web page and generate a config file. -Advanced users can also edit the config file directly. +Advanced users can also edit the config file directly. We provide example config files in [`examples`](examples/). For complete GUI features, please refer to the monorepo for [Trinity-Studio](https://github.com/modelscope/Trinity-Studio). - -
- - Example: config manager GUI - -![config-manager](https://img.alicdn.com/imgextra/i1/O1CN01yhYrV01lGKchtywSH_!!6000000004791-2-tps-1480-844.png) - - -
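As a rough orientation for readers who edit config files by hand, the sketch below shows the kind of fields such a file typically pins down. Only `algorithm_type` (listed in the Supported Algorithms table above) and the example file `examples/grpo_gsm8k/gsm8k.yaml` come from this README; the key names and nesting are illustrative assumptions rather than the exact Trinity-RFT schema, so consult the [full configurations guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html) for the authoritative fields.

```yaml
# Illustrative sketch only: key names and nesting are assumptions, not the exact
# Trinity-RFT schema. See examples/grpo_gsm8k/gsm8k.yaml for a working config.
algorithm:
  algorithm_type: grpo                              # documented in the Supported Algorithms table
model:
  model_path: ${MODEL_PATH}/Qwen2.5-1.5B-Instruct   # model prepared in Step 2
buffer:
  dataset_path: ${DATASET_PATH}/gsm8k               # dataset prepared in Step 2
```

A file along these lines is then passed to `trinity run --config <path>`, as shown in Step 4 below.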
- - - - ### Step 4: run the RFT process - Start a ray cluster: ```shell # On master node ray start --head - # On worker nodes ray start --address= ``` -(Optional) You may use [Wandb](https://docs.wandb.ai/quickstart/) / [TensorBoard](https://www.tensorflow.org/tensorboard) / [MLFlow](https://mlflow.org) for better monitoring. -Please refer to [this documentation](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html#monitor-configuration) for the corresponding configurations. -For example, to log in to Wandb: - -```shell -export WANDB_API_KEY= -wandb login -``` +(Optional) You may use [Wandb](https://docs.wandb.ai/quickstart/) / [TensorBoard](https://www.tensorflow.org/tensorboard) / [MLFlow](https://mlflow.org) for better monitoring. Please refer to [this documentation](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html#monitor-configuration) for the corresponding configurations. For command-line users, run the RFT process: @@ -364,35 +306,38 @@ For command-line users, run the RFT process: trinity run --config ``` -For example, below is the command for fine-tuning Qwen2.5-1.5B-Instruct on GSM8k with GRPO: +Example — fine-tuning Qwen2.5-1.5B-Instruct on GSM8k with GRPO: -```shell +```bash trinity run --config examples/grpo_gsm8k/gsm8k.yaml ``` -For studio users, click "Run" in the web interface. - - +--- ## Contribution Guide This project is currently under active development, and we welcome contributions from the community! -See [CONTRIBUTING.md](./CONTRIBUTING.md) for detailed contribution guidelines. +We welcome contributions of all kinds, including: + +* Documentation improvements +* Example workflows +* Bug fixes and performance optimizations +If you're new to the project, documentation and example updates are a great place to start. + +See [CONTRIBUTING.md](./CONTRIBUTING.md) for detailed contribution guidelines. ## Acknowledgements This project is built upon many excellent open-source projects, including: -+ [verl](https://github.com/volcengine/verl), [FSDP](https://pytorch.org/docs/stable/fsdp.html) and [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) for LLM training; -+ [vLLM](https://github.com/vllm-project/vllm) for LLM inference; -+ [Data-Juicer](https://github.com/modelscope/data-juicer?tab=readme-ov-file) for data processing pipelines; -+ [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflow; -+ [Ray](https://github.com/ray-project/ray) for distributed systems; -+ we have also drawn inspirations from RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), [ChatLearn](https://github.com/alibaba/ChatLearn) and [rLLM](https://github.com/rllm-org/rllm); -+ ...... - +* [verl](https://github.com/volcengine/verl), [FSDP](https://pytorch.org/docs/stable/fsdp.html) and [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) for LLM training +* [vLLM](https://github.com/vllm-project/vllm) for LLM inference +* [Data-Juicer](https://github.com/modelscope/data-juicer?tab=readme-ov-file) for data processing pipelines +* [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflow +* [Ray](https://github.com/ray-project/ray) for distributed systems +* RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), [ChatLearn](https://github.com/alibaba/ChatLearn) and [rLLM](https://github.com/rllm-org/rllm) ## Citation