Commit 6f3a452

Reorder documentation TOC to surface key trainer sections (#4565)
1 parent 46af266 commit 6f3a452

3 files changed, +33 -31 lines changed

docs/source/_toctree.yml

Lines changed: 17 additions & 15 deletions
@@ -12,6 +12,22 @@
   - local: paper_index
     title: Paper Index
   title: Conceptual Guides
+- sections: # Sorted alphabetically
+  - local: dpo_trainer
+    title: DPO
+  - local: online_dpo_trainer
+    title: Online DPO
+  - local: grpo_trainer
+    title: GRPO
+  - local: kto_trainer
+    title: KTO
+  - local: reward_trainer
+    title: Reward
+  - local: rloo_trainer
+    title: RLOO
+  - local: sft_trainer
+    title: SFT
+  title: Trainers
 - sections:
   - local: clis
     title: Command Line Interface (CLI)
@@ -55,20 +71,6 @@
     title: LoRA Without Regret
   title: Examples
 - sections:
-  - sections: # Sorted alphabetically
-    - local: dpo_trainer
-      title: DPO
-    - local: grpo_trainer
-      title: GRPO
-    - local: kto_trainer
-      title: KTO
-    - local: reward_trainer
-      title: Reward
-    - local: rloo_trainer
-      title: RLOO
-    - local: sft_trainer
-      title: SFT
-    title: Trainers
   - local: models
     title: Model Classes
   - local: model_utils
@@ -105,7 +107,7 @@
     title: GSPO-token
   - local: judges
     title: Judges
-  - local: minillm
+  - local: minillm_trainer
     title: MiniLLM
   - local: nash_md_trainer
     title: Nash-MD
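
The relocated `Trainers` section keeps its `# Sorted alphabetically` comment, so a small check can help confirm whether the new ordering still satisfies it. The following is a minimal sketch, not part of the commit: the `trainers` dict is a hand-copied, hypothetical in-memory version of the added section, and `out_of_order_titles` and `duplicate_locals` are illustrative helper names.

```python
# Hypothetical in-memory copy of the "Trainers" section added in this diff.
trainers = {
    "title": "Trainers",
    "sections": [
        {"local": "dpo_trainer", "title": "DPO"},
        {"local": "online_dpo_trainer", "title": "Online DPO"},
        {"local": "grpo_trainer", "title": "GRPO"},
        {"local": "kto_trainer", "title": "KTO"},
        {"local": "reward_trainer", "title": "Reward"},
        {"local": "rloo_trainer", "title": "RLOO"},
        {"local": "sft_trainer", "title": "SFT"},
    ],
}


def out_of_order_titles(section):
    """Return titles that sort before their predecessor (case-insensitive)."""
    titles = [entry["title"] for entry in section["sections"]]
    return [
        cur
        for prev, cur in zip(titles, titles[1:])
        if cur.casefold() < prev.casefold()
    ]


def duplicate_locals(section):
    """Return `local` values that appear more than once in the section."""
    seen, dupes = set(), []
    for entry in section["sections"]:
        if entry["local"] in seen:
            dupes.append(entry["local"])
        seen.add(entry["local"])
    return dupes


print(out_of_order_titles(trainers))  # entries that break alphabetical order
print(duplicate_locals(trainers))     # repeated page slugs, if any
```

With the entries as listed in the diff, the order check reports `GRPO` because it follows `Online DPO`. That may be intentional if `Online DPO` is meant to sit next to `DPO`, in which case the check would need to sort on a different key.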

docs/source/index.md

Lines changed: 16 additions & 16 deletions
@@ -22,34 +22,34 @@ Below is the current list of TRL trainers, organized by method type (⚡️ = vL
 
 ### Online methods
 
-- [`GRPOTrainer`] ⚡️
-- [`RLOOTrainer`] ⚡️
-- [`experimental.nash_md.NashMDTrainer`] 🧪 ⚡️
-- [`experimental.online_dpo.OnlineDPOTrainer`] 🧪 ⚡️
-- [`experimental.ppo.PPOTrainer`] 🧪
-- [`experimental.xpo.XPOTrainer`] 🧪 ⚡️
+- [`GRPOTrainer`](grpo_trainer) ⚡️
+- [`RLOOTrainer`](rloo_trainer) ⚡️
+- [`OnlineDPOTrainer`](online_dpo_trainer) 🧪 ⚡️
+- [`NashMDTrainer`](nash_md_trainer) 🧪 ⚡️
+- [`PPOTrainer`](ppo_trainer) 🧪
+- [`XPOTrainer`](xpo_trainer) 🧪 ⚡️
 
 ### Reward modeling
 
-- [`RewardTrainer`]
-- [`experimental.prm.PRMTrainer`] 🧪
+- [`RewardTrainer`](reward_trainer)
+- [`PRMTrainer`](prm_trainer) 🧪
 
 </div>
 <div style="flex: 1; min-width: 0;">
 
 ### Offline methods
 
-- [`SFTTrainer`]
-- [`DPOTrainer`]
-- [`KTOTrainer`]
-- [`experimental.bco.BCOTrainer`] 🧪
-- [`experimental.cpo.CPOTrainer`] 🧪
-- [`experimental.orpo.ORPOTrainer`] 🧪
+- [`SFTTrainer`](sft_trainer)
+- [`DPOTrainer`](dpo_trainer)
+- [`KTOTrainer`](kto_trainer)
+- [`BCOTrainer`](bco_trainer) 🧪
+- [`CPOTrainer`](cpo_trainer) 🧪
+- [`ORPOTrainer`](orpo_trainer) 🧪
 
 ### Knowledge distillation
 
-- [`experimental.gkd.GKDTrainer`] 🧪
-- [`experimental.minillm.MiniLLMTrainer`] 🧪
+- [`GKDTrainer`](gkd_trainer) 🧪
+- [`MiniLLMTrainer`](minillm_trainer) 🧪
 
 </div>
 </div>
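
The `index.md` changes follow one pattern: each bare reference such as ``[`experimental.nash_md.NashMDTrainer`]`` becomes an explicit link to the trainer's page, ``[`NashMDTrainer`](nash_md_trainer)``. The sketch below illustrates that rewrite rule only. It is an assumption about the pattern, not how the commit was produced; `OLD_REF` and `to_explicit_link` are hypothetical names, and it does not reproduce the reordering of `OnlineDPOTrainer` and `NashMDTrainer`.

```python
import re

# Matches the old bare reference style, e.g.
# [`experimental.nash_md.NashMDTrainer`] or [`GRPOTrainer`],
# but skips references that are already followed by "(target)".
OLD_REF = re.compile(
    r"\[`(?:experimental\.(?P<module>\w+)\.)?(?P<cls>\w+Trainer)`\](?!\()"
)


def to_explicit_link(line: str) -> str:
    """Rewrite bare trainer references as explicit links to their doc pages."""

    def repl(match: re.Match) -> str:
        cls = match.group("cls")
        # Assumption: the page slug is the experimental module name when one
        # is present, otherwise the lowercased class name without "Trainer".
        stem = match.group("module") or cls[: -len("Trainer")].lower()
        return f"[`{cls}`]({stem}_trainer)"

    return OLD_REF.sub(repl, line)


print(to_explicit_link("- [`experimental.nash_md.NashMDTrainer`] 🧪 ⚡️"))
print(to_explicit_link("- [`GRPOTrainer`] ⚡️"))
```

The slug rule happens to match every entry in this diff, but it is a heuristic; a class whose page name differs from its module or class name would need an explicit mapping.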
File renamed without changes.