Merged
6 changes: 3 additions & 3 deletions README.md
@@ -26,7 +26,7 @@
* [2025-08] 🎵 We introduce [CHORD](https://github.com/modelscope/Trinity-RFT/tree/main/examples/mix_chord), a dynamic integration of SFT and RL for enhanced LLM fine-tuning ([paper](https://arxiv.org/pdf/2508.11408)).
* [2025-08] ✨ Trinity-RFT v0.2.1 is released! Enhanced features include:
* Agentic RL: support training with general multi-step agentic workflows; check out the [ALFWorld](./docs/sphinx_doc/source/tutorial/example_step_wise.md) and [ReAct](./docs/sphinx_doc/source/tutorial/example_react.md) examples.
- * Rollout-Training scheduling: introduce Scheduler, [Synchronizer](./docs/sphinx_doc/source/tutorial/synchronizer.html) and priority queue buffer, which facilitates more efficient and dynamic scheduling of the RFT process.
+ * Rollout-Training scheduling: introduce Scheduler, [Synchronizer](./docs/sphinx_doc/source/tutorial/synchronizer.md) and priority queue buffer, which facilitates more efficient and dynamic scheduling of the RFT process.
* [A benchmark tool](./benchmark) for quick verification and experimentation.
* RL algorithms: implement [GSPO](https://github.com/modelscope/Trinity-RFT/pull/154), [AsymRE](https://github.com/modelscope/Trinity-RFT/pull/187), [TOPR, CISPO](https://github.com/modelscope/Trinity-RFT/pull/185), [RAFT](https://github.com/modelscope/Trinity-RFT/pull/174).
* [2025-07] Trinity-RFT v0.2.0 is released.
@@ -362,8 +362,8 @@ Guidelines for developers and researchers:

+ [Build new RL scenarios](./docs/sphinx_doc/source/tutorial/trinity_programming_guide.md#workflows-for-rl-environment-developers)
+ [Implement new RL algorithms](./docs/sphinx_doc/source/tutorial/trinity_programming_guide.md#algorithms-for-rl-algorithm-developers)
- + [Develop new data operators](./docs/sphinx_doc/source/tutorial/trinity_programming_guide.html#operators-for-data-developers)
- + [Understand the coordination between explorer and trainer](./docs/sphinx_doc/source/tutorial/synchronizer.html)
+ + [Develop new data operators](./docs/sphinx_doc/source/tutorial/trinity_programming_guide.md#operators-for-data-developers)
+ + [Understand the coordination between explorer and trainer](./docs/sphinx_doc/source/tutorial/synchronizer.md)



6 changes: 3 additions & 3 deletions README_zh.md
@@ -25,7 +25,7 @@
* [2025-08] 🎵 We introduce [CHORD](https://github.com/modelscope/Trinity-RFT/tree/main/examples/mix_chord), a method that dynamically integrates SFT and RL to fine-tune LLMs ([paper](https://arxiv.org/pdf/2508.11408)).
* [2025-08] ✨ Trinity-RFT v0.2.1 is released! New features include:
* Agentic RL: support training with general multi-turn workflows; see the [ALFWorld](./docs/sphinx_doc/source/tutorial/example_step_wise.md) and [ReAct](./docs/sphinx_doc/source/tutorial/example_react.md) examples.
- * Rollout-Training scheduling: introduce Scheduler, [Synchronizer](./docs/sphinx_doc/source/tutorial/synchronizer.html), and a priority-queue Buffer, supporting more efficient and flexible scheduling in the RFT process.
+ * Rollout-Training scheduling: introduce Scheduler, [Synchronizer](./docs/sphinx_doc/source/tutorial/synchronizer.md), and a priority-queue Buffer, supporting more efficient and flexible scheduling in the RFT process.
* [Benchmark tool](./benchmark) for quick verification and experimentation.
* RL algorithms: implement [GSPO](https://github.com/modelscope/Trinity-RFT/pull/154), [AsymRE](https://github.com/modelscope/Trinity-RFT/pull/187), [TOPR, CISPO](https://github.com/modelscope/Trinity-RFT/pull/185), [RAFT](https://github.com/modelscope/Trinity-RFT/pull/174), and other algorithms.
* [2025-07] Trinity-RFT v0.2.0 is released, adding a number of feature improvements.
@@ -361,8 +361,8 @@ trinity run --config examples/grpo_gsm8k/gsm8k.yaml

+ [Build new RL scenarios](./docs/sphinx_doc/source/tutorial/trinity_programming_guide.md#workflows-for-rl-environment-developers)
+ [Implement new RL algorithms](./docs/sphinx_doc/source/tutorial/trinity_programming_guide.md#algorithms-for-rl-algorithm-developers)
- + [Develop new data processing operators](./docs/sphinx_doc/source/tutorial/trinity_programming_guide.html#operators-for-data-developers)
- + [Understand the explorer-trainer scheduling logic](./docs/sphinx_doc/source/tutorial/synchronizer.html)
+ + [Develop new data processing operators](./docs/sphinx_doc/source/tutorial/trinity_programming_guide.md#operators-for-data-developers)
+ + [Understand the explorer-trainer scheduling logic](./docs/sphinx_doc/source/tutorial/synchronizer.md)



10 changes: 5 additions & 5 deletions docs/sphinx_doc/source/main.md
@@ -10,9 +10,9 @@

* [2025-08] 🎵 We introduce [CHORD](https://github.com/modelscope/Trinity-RFT/tree/main/examples/mix_chord), a dynamic integration of SFT and RL for enhanced LLM fine-tuning ([paper](https://arxiv.org/pdf/2508.11408)).
* [2025-08] ✨ Trinity-RFT v0.2.1 is released! Enhanced features include:
- * Agentic RL: support training with general multi-step agentic workflows; check out the [ALFWorld](./docs/sphinx_doc/source/tutorial/example_step_wise.md) and [ReAct](./docs/sphinx_doc/source/tutorial/example_react.md) examples.
- * Rollout-Training scheduling: introduce Scheduler, [Synchronizer](./docs/sphinx_doc/source/tutorial/synchronizer.html) and priority queue buffer, which facilitates more efficient and dynamic scheduling of the RFT process.
- * [A benchmark tool](./benchmark) for quick verification and experimentation.
+ * Agentic RL: support training with general multi-step agentic workflows; check out the [ALFWorld](/tutorial/example_step_wise.md) and [ReAct](/tutorial/example_react.md) examples.
+ * Rollout-Training scheduling: introduce Scheduler, [Synchronizer](/tutorial/synchronizer.md) and priority queue buffer, which facilitates more efficient and dynamic scheduling of the RFT process.
+ * [A benchmark tool](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark) for quick verification and experimentation.
* RL algorithms: implement [GSPO](https://github.com/modelscope/Trinity-RFT/pull/154), [AsymRE](https://github.com/modelscope/Trinity-RFT/pull/187), [TOPR, CISPO](https://github.com/modelscope/Trinity-RFT/pull/185), [RAFT](https://github.com/modelscope/Trinity-RFT/pull/174).
* [2025-07] Trinity-RFT v0.2.0 is released.
* [2025-07] We update the [technical report](https://arxiv.org/abs/2505.17826) (arXiv v2) with new features, examples, and experiments.
@@ -341,8 +341,8 @@ Guidelines for developers and researchers:

+ {ref}`Build new RL scenarios <Workflows>`
+ {ref}`Implement new RL algorithms <Algorithms>`
- + [Develop new data operators](/tutorial/trinity_programming_guide.html#operators-for-data-developers)
- + [Understand the coordination between explorer and trainer](/tutorial/synchronizer.html)
+ + [Develop new data operators](/tutorial/trinity_programming_guide.md#operators-for-data-developers)
+ + [Understand the coordination between explorer and trainer](/tutorial/synchronizer.md)


For some frequently asked questions, see [FAQ](/tutorial/faq.md).
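
Most of the link changes in these three files follow one pattern: a local documentation link ending in `.html` is pointed at the `.md` source instead, with any `#fragment` kept. The sketch below shows how such a rewrite could be scripted. It is not part of this PR; the function name `html_links_to_md` and the regular expression are hypothetical, and it assumes plain inline Markdown links of the form `[text](target)`.

```python
import re

# Hypothetical helper, not taken from the PR.
# Matches an inline Markdown link target that is NOT an http(s) URL,
# ends in ".html", and may carry a "#fragment" before the closing ")".
_HTML_LINK = re.compile(r"(\]\((?!https?://)[^)\s#]*?)\.html(#[^)\s]*)?\)")


def html_links_to_md(text: str) -> str:
    """Rewrite local `.html` link targets to `.md`, keeping any #fragment."""
    return _HTML_LINK.sub(
        lambda m: f"{m.group(1)}.md{m.group(2) or ''})",
        text,
    )


if __name__ == "__main__":
    line = "+ [Understand the coordination](/tutorial/synchronizer.html)"
    print(html_links_to_md(line))
```

External links such as the arXiv and GitHub URLs are skipped by the negative lookahead, so only repository-local targets are touched. Changes that alter the path itself, like the benchmark link in `main.md`, would still need to be made by hand.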