diff --git a/README.md b/README.md
index 5a7255ebb5..ac87102781 100644
--- a/README.md
+++ b/README.md
@@ -4,10 +4,8 @@
-
[](https://arxiv.org/abs/2505.17826)
@@ -19,27 +17,19 @@
## 💡 What is Trinity-RFT?
-
-Trinity-RFT is a general-purpose, flexible and user-friendly framework for LLM reinforcement fine-tuning (RFT).
+Trinity-RFT is a general-purpose, flexible and user-friendly framework for LLM reinforcement fine-tuning (RFT).
It decouples RFT into three components that work in coordination:
* **Explorer** generates experience data via agent-environment interaction;
-
* **Trainer** updates model weights by minimizing losses on the data;
-
* **Buffer** pipelines data processing throughout the RFT lifecycle.
-
Trinity-RFT provides functionalities for users with different backgrounds and objectives:
* 🤖 **Agent application developers:** Train LLM-powered agents and improve their capabilities in specific domains [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_workflow.html)
-
* 🧠 **Reinforcement learning researchers:** Design, implement and validate new RL algorithms using compact, plug-and-play modules that allow non-invasive customization [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html)
-
* 📊 **Data engineers:** Create RFT datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html)
-
-
## 🚀 News
* [2025-12] [[Release Notes]](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.4.0) Trinity-RFT v0.4.0 released: adds a [Tinker](https://thinkingmachines.ai/tinker/) backend for users **without GPUs**, more benchmarks, enhanced online RL, and more.
@@ -64,82 +54,68 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
-
## 🔨 Tutorials and Guidelines
-
-| Category | Tutorial / Guideline |
-| --- | ----|
-| *Run diverse RFT modes* | + [Quick start: GRPO on GSM8k](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html) <br> + [Off-policy RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_advanced.html) <br> + [Fully asynchronous RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_async_mode.html) <br> + [Offline learning by DPO or SFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo.html) <br> + [RFT without local GPU (Tinker Backend)](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_tinker_backend.html) |
-| *Multi-step agentic RL* | + [Concatenated multi-turn workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html) <br> + [General multi-step workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_step_wise.html) <br> + [ReAct workflow with an agent framework](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_react.html) <br> + [Example: train a web-search agent](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) |
-| *Full-lifecycle data pipelines* | + [Rollout task mixing and selection](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_selector.html) <br> + [Online task curriculum](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374)) <br> + [Research project: learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441)) <br> + [Experience replay with prioritization](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay) <br> + [Advanced data processing & human-in-the-loop](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html) |
-| *Algorithm development* | + [RL algorithm development with Trinity-RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html) (📝 [paper](https://arxiv.org/pdf/2508.11408)) <br> + [Research project: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203)) <br> + Non-verifiable domains: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) |
-| *Benchmarks* | + [Benchmark toolkit (quick verification & experimentation)](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md) <br> + [Guru-Math benchmark & comparison with veRL](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/guru_math.md) <br> + [FrozenLake benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md) <br> + [Alfworld benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) |
-| *Going deeper into Trinity-RFT* | + [Full configurations](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html) <br> + [GPU resource and training configuration guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html) <br> + [Understand the coordination between explorer and trainer](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html) <br> + [How to align configuration with veRL](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) |
-
+| Category | Tutorial / Guideline |
+|-----------------------------------|------------------------------------------------------------------------------------------------------------------|
+| *Run diverse RFT modes* | • [Quick start: GRPO on GSM8k](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html) <br> • [Off-policy RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_advanced.html) <br> • [Fully asynchronous RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_async_mode.html) <br> • [Offline learning by DPO or SFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo.html) <br> • [RFT without local GPU (Tinker Backend)](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_tinker_backend.html) |
+| *Multi-step agentic RL* | • [Concatenated multi-turn workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html) <br> • [General multi-step workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_step_wise.html) <br> • [ReAct workflow with an agent framework](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_react.html) <br> • [Example: train a web-search agent](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) |
+| *Full-lifecycle data pipelines* | • [Rollout task mixing and selection](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_selector.html) <br> • [Online task curriculum](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374)) <br> • [Research project: learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441)) <br> • [Experience replay with prioritization](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay) <br> • [Advanced data processing & human-in-the-loop](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html) |
+| *Algorithm development* | • [RL algorithm development with Trinity-RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html) (📝 [paper](https://arxiv.org/pdf/2508.11408)) <br> • [Research project: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203)) <br> • Non-verifiable domains: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) |
+| *Benchmarks* | • [Benchmark toolkit (quick verification & experimentation)](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md) <br> • [Guru-Math benchmark & comparison with veRL](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/guru_math.md) <br> • [FrozenLake benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md) <br> • [Alfworld benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) |
+| *Going deeper into Trinity-RFT* | • [Full configurations](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html) <br> • [GPU resource and training configuration guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html) <br> • [Understand the coordination between explorer and trainer](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html) <br> • [How to align configuration with veRL](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) |
> [!NOTE]
> For more tutorials, please refer to the [Trinity-RFT documentation](https://modelscope.github.io/Trinity-RFT/).
-
-
## 🌟 Key Features
* **Flexible RFT Modes:**
- Supports synchronous/asynchronous, on-policy/off-policy, and online/offline RL.
- Rollout and training can run separately and scale independently across devices.
  - Boosts sample and time efficiency via experience replay.
-
* **Agentic RL Support:**
- Supports both concatenated and general multi-step agentic workflows.
  - Directly trains agent applications built with agent frameworks such as [AgentScope](https://github.com/agentscope-ai/agentscope).
-
* **Full-Lifecycle Data Pipelines:**
- Enables pipeline processing of rollout tasks and experience samples.
- Active data management (prioritization, cleaning, augmentation, etc.) throughout the RFT lifecycle.
- Native support for multi-task joint learning and online task curriculum construction.
-
* **User-Friendly Design:**
- Plug-and-play modules and decoupled architecture, facilitating easy adoption and development.
- Rich graphical user interfaces enable low-code usage.
-
-
-
## 🔧 Supported Algorithms
-We list some algorithms supported by Trinity-RFT in the following table. For more details, the concrete configurations are shown in the [Algorithm module](https://github.com/modelscope/Trinity-RFT/blob/main/trinity/algorithm/algorithm.py). You can also set up new algorithms by customizing different components, see [tutorial](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html).
-
-| Algorithm | Doc / Example | Source Code | Key Configurations |
-|:-----------|:-----------|:---------------|:-----------|
-| PPO [[Paper](https://arxiv.org/pdf/1707.06347)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[Countdown Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/ppo_policy_loss.py)] | `algorithm_type: ppo` |
-| GRPO [[Paper](https://arxiv.org/pdf/2402.03300)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k)]| [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/grpo_advantage.py)] | `algorithm_type: grpo` |
-| CHORD 💡 [[Paper](https://arxiv.org/pdf/2508.11408)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)] [[ToolACE Example](https://github.com/modelscope/Trinity-RFT/blob/main/examples/mix_chord/mix_chord_toolace.yaml)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/chord_policy_loss.py)] | `algorithm_type: mix_chord` |
-| REC Series 💡 [[Paper](https://arxiv.org/pdf/2509.24203)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/rec_policy_loss.py)] | `algorithm_type: rec` |
-| RLOO [[Paper](https://arxiv.org/pdf/2402.14740)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/rloo_advantage.py)] | `algorithm_type: rloo` |
-| REINFORCE++ [[Paper](https://arxiv.org/pdf/2501.03262)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/reinforce_advantage.py)] | `algorithm_type: reinforceplusplus` |
-| GSPO [[Paper](https://arxiv.org/pdf/2507.18071)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/gspo_policy_loss.py)] | `algorithm_type: gspo` |
-| TOPR [[Paper](https://arxiv.org/pdf/2503.14286)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/topr_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/topr_policy_loss.py)] | `algorithm_type: topr` |
-| sPPO [[Paper](https://arxiv.org/pdf/2108.05828)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/sppo_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sppo_loss_fn.py)] | `algorithm_type: sppo` |
-| AsymRE [[Paper](https://arxiv.org/pdf/2506.20520)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/asymre_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/asymre_advantage.py)] | `algorithm_type: asymre` |
-| CISPO [[Paper](https://arxiv.org/pdf/2506.13585)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/cispo_policy_loss.py)] | `algorithm_type: cispo` |
-| SAPO [[Paper](https://arxiv.org/pdf/2511.20347)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sapo_policy_loss.py)] | `algorithm_type: sapo` |
+| Algorithm | Doc / Example | Source Code | Key Configurations |
+|------------------------|-------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------|--------------------------------|
+| PPO [[Paper](https://arxiv.org/pdf/1707.06347)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[Countdown Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/ppo_policy_loss.py)] | `algorithm_type: ppo` |
+| GRPO [[Paper](https://arxiv.org/pdf/2402.03300)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/grpo_advantage.py)] | `algorithm_type: grpo` |
+| CHORD 💡 [[Paper](https://arxiv.org/pdf/2508.11408)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)] [[ToolACE Example](https://github.com/modelscope/Trinity-RFT/blob/main/examples/mix_chord/mix_chord_toolace.yaml)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/chord_policy_loss.py)] | `algorithm_type: mix_chord` |
+| REC Series 💡 [[Paper](https://arxiv.org/pdf/2509.24203)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/rec_policy_loss.py)] | `algorithm_type: rec` |
+| RLOO [[Paper](https://arxiv.org/pdf/2402.14740)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/rloo_advantage.py)] | `algorithm_type: rloo` |
+| REINFORCE++ [[Paper](https://arxiv.org/pdf/2501.03262)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/reinforce_advantage.py)] | `algorithm_type: reinforceplusplus` |
+| GSPO [[Paper](https://arxiv.org/pdf/2507.18071)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/gspo_policy_loss.py)] | `algorithm_type: gspo` |
+| TOPR [[Paper](https://arxiv.org/pdf/2503.14286)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/topr_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/topr_policy_loss.py)] | `algorithm_type: topr` |
+| sPPO [[Paper](https://arxiv.org/pdf/2108.05828)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/sppo_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sppo_loss_fn.py)] | `algorithm_type: sppo` |
+| AsymRE [[Paper](https://arxiv.org/pdf/2506.20520)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/asymre_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/asymre_advantage.py)] | `algorithm_type: asymre` |
+| CISPO [[Paper](https://arxiv.org/pdf/2506.13585)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/cispo_policy_loss.py)] | `algorithm_type: cispo` |
+| SAPO [[Paper](https://arxiv.org/pdf/2511.20347)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sapo_policy_loss.py)] | `algorithm_type: sapo` |
| On-Policy Distillation [[Blog](https://thinkingmachines.ai/blog/on-policy-distillation/)] [[Paper](https://arxiv.org/pdf/2306.13649)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/on_policy_distill)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/common/workflows/on_policy_distill_workflow.py)] | `algorithm_type: on_policy_distill` |
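+
+Each algorithm is selected via the `algorithm_type` value listed in the "Key Configurations" column. As a rough sketch (the exact config layout may differ, so check the example configs and the [full configurations guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)), switching the GSM8k example to another algorithm looks like this:
+
+```bash
+# Sketch: start from the GSM8k example config and swap the algorithm.
+# The file name gsm8k_rloo.yaml is illustrative; the algorithm_type values come from the table above.
+cp examples/grpo_gsm8k/gsm8k.yaml gsm8k_rloo.yaml
+# In gsm8k_rloo.yaml, change the algorithm setting, e.g. `algorithm_type: grpo` -> `algorithm_type: rloo`,
+# then launch the run:
+trinity run --config gsm8k_rloo.yaml
+```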
-
-
---
## Table of Contents
- [Quick Start](#quick-start)
+ - [Minimal CPU-Only Quick Start](#minimal-cpu-only-quick-start)
- [Step 1: installation](#step-1-installation)
- [Step 2: prepare dataset and model](#step-2-prepare-dataset-and-model)
- [Step 3: configurations](#step-3-configurations)
@@ -148,51 +124,72 @@ We list some algorithms supported by Trinity-RFT in the following table. For mor
- [Acknowledgements](#acknowledgements)
- [Citation](#citation)
-
+---
## Quick Start
-
> [!NOTE]
> This project is currently under active development. Comments and suggestions are welcome!
->
-> **No GPU? No problem!** You can still try it out:
-> 1. Follow the installation steps (feel free to skip GPU-specific packages like `flash-attn`)
-> 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is specifically designed to work on CPU-only systems.
+> **No GPU? No problem!** You can still try it out using the Tinker backend:
+> 1. Follow the installation steps (skip GPU-specific packages like `flash-attn`)
+> 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is designed for CPU-only systems.
+
+### Minimal CPU-Only Quick Start
+
+If you do not have access to a GPU, you can still try Trinity-RFT using the Tinker backend.
+
+```bash
+# Clone the repository first: the editable install below needs a local checkout
+git clone https://github.com/modelscope/Trinity-RFT
+cd Trinity-RFT
+
+# Create and activate a virtual environment
+python3 -m venv .venv
+source .venv/bin/activate
+
+# Install Trinity-RFT with CPU-only backend
+pip install -e ".[tinker]"
+```
+
+Run a simple example:
+
+```bash
+trinity run --config examples/tinker/tinker_example.yaml
+```
+
+This example is designed to run on CPU-only machines and is recommended for first-time users.
### Step 1: installation
Before installing, make sure your system meets the following requirements:
-- **Python**: version 3.10 to 3.12 (inclusive)
-- **CUDA**: version >= 12.8
-- **GPUs**: at least 2 GPUs
+* Python: version 3.10 to 3.12 (inclusive)
+* CUDA: version >= 12.8 (required for GPU training)
+* GPUs: at least 2 GPUs (for standard distributed training)
+* CPU-only: Supported via the Tinker backend (see Minimal CPU-Only Quick Start)
+
+**Recommended for first-time users:**
+* If you have no GPU → Use Tinker backend
+* If you want simple setup → Use Docker
+* If you plan to develop or contribute → Use Conda / venv (install from source)
#### From Source (Recommended)
If you plan to customize or contribute to Trinity-RFT, this is the best option.
-##### 1. Clone the Repository
+1. Clone the Repository
```bash
git clone https://github.com/modelscope/Trinity-RFT
cd Trinity-RFT
```
-##### 2. Set Up Environment
+2. Set Up Environment
+ Choose one of the following options:
-Choose one of the following options:
-
-##### Using Pre-built Docker Image (Recommended for Beginners)
-
-We provide a pre-built Docker image with GPU-related dependencies installed.
+**Using Pre-built Docker Image (Recommended for Beginners)**
```bash
docker pull ghcr.io/modelscope/trinity-rft:latest
-# Run the container, replacing <path_to_your_data> with your actual path
docker run -it \
--gpus all \
--shm-size="64g" \
@@ -201,10 +198,8 @@ docker run -it \
-v <path_to_your_data>:/data \
ghcr.io/modelscope/trinity-rft:latest
```
-
> This image uses `uv` to install all GPU-related dependencies of Trinity-RFT. The virtual environment is activated automatically when you enter the container (you can also activate it manually via `source /opt/venv/bin/activate`). Use `uv pip install` to add extra packages as needed.
-
-###### Using Conda
+**Using Conda**
```bash
conda create -n trinity python=3.12
@@ -212,16 +207,15 @@ conda activate trinity
pip install -e ".[vllm,flash_attn]"
-# If you have no GPU, comment out the line above and uncomment this instead:
+# If you have no GPU, skip the line above and install the Tinker backend instead:
# pip install -e ".[tinker]"
# If you encounter issues when installing flash-attn, try:
# pip install flash-attn==2.8.1 --no-build-isolation
-
pip install -e ".[dev]" # for development like linting and debugging
```
-###### Using venv
+**Using venv**
```bash
python3.10 -m venv .venv
@@ -229,134 +223,82 @@ source .venv/bin/activate
pip install -e ".[vllm,flash_attn]"
-# If you have no GPU, comment out the line above and uncomment this instead:
+# If you have no GPU, skip the line above and install the Tinker backend instead:
# pip install -e ".[tinker]"
-# If you encounter issues when installing flash-attn, try:
-# pip install flash-attn==2.8.1 --no-build-isolation
-
-pip install -e ".[dev]" # for development like linting and debugging
+pip install -e ".[dev]"
```
-###### Using `uv`
-
-[`uv`](https://github.com/astral-sh/uv) is a modern Python package installer.
+**Using uv**
```bash
uv sync --extra vllm --extra dev --extra flash_attn
-# If you have no GPU, try to use Tinker instead:
+# If you have no GPU, run this instead:
# uv sync --extra tinker --extra dev
```
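+
+Whichever option you choose, you can sanity-check the installation afterwards (assuming the CLI exposes the usual `--help` flag):
+
+```bash
+# The `trinity` command should now be available in the active environment
+trinity --help
+```
+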
#### Via PyPI
-If you just want to use the package without modifying the code:
-
```bash
pip install trinity-rft
pip install flash-attn==2.8.1
```
-
-Or with `uv`:
-
-```bash
-uv pip install trinity-rft
-uv pip install flash-attn==2.8.1
-```
-
> For training with **Megatron-LM**, please refer to [Megatron-LM Backend](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_megatron.html).
-
### Step 2: prepare dataset and model
-
Trinity-RFT supports most datasets and models from Huggingface and ModelScope.
-
**Prepare the model** in the local directory `$MODEL_PATH/{model_name}`:
```bash
# Using Huggingface
huggingface-cli download {model_name} --local-dir $MODEL_PATH/{model_name}
-
# Using Modelscope
modelscope download {model_name} --local_dir $MODEL_PATH/{model_name}
```
-For more details about model downloading, see [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli) or [ModelScope](https://modelscope.cn/docs/models/download).
-
-
-
**Prepare the dataset** in the local directory `$DATASET_PATH/{dataset_name}`:
```bash
# Using Huggingface
huggingface-cli download {dataset_name} --repo-type dataset --local-dir $DATASET_PATH/{dataset_name}
-
# Using Modelscope
modelscope download --dataset {dataset_name} --local_dir $DATASET_PATH/{dataset_name}
```
-For more details about dataset downloading, see [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli#download-a-dataset-or-a-space) or [ModelScope](https://modelscope.cn/docs/datasets/download).
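+
+For example, to fetch the model and dataset used in the GSM8k quick-start below (the Hugging Face IDs shown here are the commonly used ones; double-check the names referenced in the example config):
+
+```bash
+# Model used in the GSM8k GRPO example (examples/grpo_gsm8k)
+huggingface-cli download Qwen/Qwen2.5-1.5B-Instruct --local-dir $MODEL_PATH/Qwen2.5-1.5B-Instruct
+
+# GSM8k dataset (hosted on Hugging Face as openai/gsm8k)
+huggingface-cli download openai/gsm8k --repo-type dataset --local-dir $DATASET_PATH/gsm8k
+```
+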
-
-
-
### Step 3: configurations
-
Trinity-RFT provides a web interface for configuring your RFT process.
> [!NOTE]
> This is an experimental feature, and we will continue to improve it.
-
To launch the web interface for minimal configurations, you can run
```bash
trinity studio --port 8080
```
-Then you can configure your RFT process in the web page and generate a config file. You can save the config file for later use or run it directly as described in the following section.
+Then you can configure your RFT process on the web page and generate a config file, which you can save for later use or run directly as described in Step 4.
-Advanced users can also edit the config file directly.
+Advanced users can also edit the config file directly.
We provide example config files in [`examples`](examples/).
For complete GUI features, please refer to the monorepo for [Trinity-Studio](https://github.com/modelscope/Trinity-Studio).
-
-
-
- Example: config manager GUI
-
-
-
-
-
-
-
-
-
### Step 4: run the RFT process
-
Start a ray cluster:
```shell
# On master node
ray start --head
-
# On worker nodes
ray start --address=<master_node_address>
```
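+
+(Optional) You can confirm that the cluster is up and see its available resources with Ray's built-in status command:
+
+```bash
+# Shows connected nodes and available CPUs/GPUs in the Ray cluster
+ray status
+```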
-(Optional) You may use [Wandb](https://docs.wandb.ai/quickstart/) / [TensorBoard](https://www.tensorflow.org/tensorboard) / [MLFlow](https://mlflow.org) for better monitoring.
-Please refer to [this documentation](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html#monitor-configuration) for the corresponding configurations.
-For example, to log in to Wandb:
-
-```shell
-export WANDB_API_KEY=
-wandb login
-```
+(Optional) You may use [Wandb](https://docs.wandb.ai/quickstart/) / [TensorBoard](https://www.tensorflow.org/tensorboard) / [MLFlow](https://mlflow.org) for better monitoring. Please refer to [this documentation](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html#monitor-configuration) for the corresponding configurations.
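+
+For example, to log runs to Wandb, authenticate before starting training:
+
+```bash
+export WANDB_API_KEY=<your_wandb_api_key>
+wandb login
+```
+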
For command-line users, run the RFT process:
@@ -364,35 +306,38 @@ For command-line users, run the RFT process:
```bash
trinity run --config <config_path>
```
-For example, below is the command for fine-tuning Qwen2.5-1.5B-Instruct on GSM8k with GRPO:
+For example, to fine-tune Qwen2.5-1.5B-Instruct on GSM8k with GRPO:
-```shell
+```bash
trinity run --config examples/grpo_gsm8k/gsm8k.yaml
```
-For studio users, click "Run" in the web interface.
-
-
+---
## Contribution Guide
This project is currently under active development, and we welcome contributions from the community!
-See [CONTRIBUTING.md](./CONTRIBUTING.md) for detailed contribution guidelines.
+We welcome contributions of all kinds, including:
+
+* Documentation improvements
+* Example workflows
+* Bug fixes and performance optimizations
+If you're new to the project, documentation and example updates are a great place to start.
+
+See [CONTRIBUTING.md](./CONTRIBUTING.md) for detailed contribution guidelines.
## Acknowledgements
This project is built upon many excellent open-source projects, including:
-+ [verl](https://github.com/volcengine/verl), [FSDP](https://pytorch.org/docs/stable/fsdp.html) and [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) for LLM training;
-+ [vLLM](https://github.com/vllm-project/vllm) for LLM inference;
-+ [Data-Juicer](https://github.com/modelscope/data-juicer?tab=readme-ov-file) for data processing pipelines;
-+ [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflow;
-+ [Ray](https://github.com/ray-project/ray) for distributed systems;
-+ we have also drawn inspirations from RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), [ChatLearn](https://github.com/alibaba/ChatLearn) and [rLLM](https://github.com/rllm-org/rllm);
-+ ......
-
+* [verl](https://github.com/volcengine/verl), [FSDP](https://pytorch.org/docs/stable/fsdp.html) and [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) for LLM training
+* [vLLM](https://github.com/vllm-project/vllm) for LLM inference
+* [Data-Juicer](https://github.com/modelscope/data-juicer?tab=readme-ov-file) for data processing pipelines
+* [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflow
+* [Ray](https://github.com/ray-project/ray) for distributed systems
+* RL frameworks such as [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), [ChatLearn](https://github.com/alibaba/ChatLearn) and [rLLM](https://github.com/rllm-org/rllm), from which we have drawn inspiration
## Citation