Commit b6815e3
Squashed commit of the following:
commit 52ed4df
Author: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Date: Thu Nov 20 21:41:23 2025 +0000
Fix style OpenEnv example
commit a263946
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date: Thu Nov 20 14:44:15 2025 +0100
Update OpenEnv guide with latest details (#4552)
Co-authored-by: burtenshaw <ben.burtenshaw@gmail.com>
commit 1a9ff52
Author: Kashif Rasul <kashif.rasul@gmail.com>
Date: Wed Nov 19 15:34:25 2025 +0100
[OpenEnv] browsergym example script (#4539)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit 6cbcd94
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date: Wed Nov 19 14:39:44 2025 +0100
Update OpenEnv example scripts (#4547)
commit 8510589
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date: Wed Nov 19 14:39:20 2025 +0100
Add OpenEnv Script examples to docs (#4533)
commit e622196
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Mon Nov 17 03:12:30 2025 -0700
[Doc] Drop dummy reward and dataset for DeepMath-103K and accuracy reward (#4524)
commit 1b1242c
Author: Kashif Rasul <kashif.rasul@gmail.com>
Date: Fri Nov 14 20:51:41 2025 +0100
[OpenEnv] add vllm colocate mode to openenv scripts (#4510)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit f39d18a
Author: Fabio Milentiansen Sim <sim.fabio.fms@gmail.com>
Date: Fri Nov 14 23:39:02 2025 +0700
fix(GOLDTrainer): Resolve incorrect attribute access and VLLMClient.generate() output type (#4526)
commit d45eaab
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date: Fri Nov 14 12:12:09 2025 +0100
Add vLLM quantization option for colocate (#4496)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
commit a91d4b3
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date: Fri Nov 14 02:19:08 2025 +0100
Prevent upcasting norm layers in `prepare_model_for_kbit_training` (#4457)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
commit 121318e
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Thu Nov 13 17:13:16 2025 -0800
docs: Extend CLI basic usage examples to all supported CLIs (#4425)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit 7918320
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Thu Nov 13 13:20:52 2025 -0700
Remove test trainer args (#4517)
commit 102dc41
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Thu Nov 13 12:36:43 2025 -0700
Rename `flash-attn` to `flash-attn2` (#4514)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit 5de62b0
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Thu Nov 13 12:05:48 2025 -0700
Add step time metric to GRPO Trainer for performance tracking (#4516)
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
commit f1e6377
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Thu Nov 13 11:01:19 2025 -0800
Move PPOTrainer to trl.experimental.ppo (#4482)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit 01f497e
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Thu Nov 13 10:14:58 2025 -0800
Move NashMDTrainer to experimental module (#4477)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit b6c838a
Author: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Date: Thu Nov 13 16:53:26 2025 +0000
`aws-general-8-plus` runner for Docker build
commit ed5c7bb
Author: YangKai0616 <kai.yang@intel.com>
Date: Fri Nov 14 00:42:48 2025 +0800
[Bug Fix] OnlineDPOTrainer with vLLM Server Mode (#4500)
commit ded9bc6
Author: lewtun <lewis.c.tunstall@gmail.com>
Date: Thu Nov 13 17:33:59 2025 +0100
Fix Docker images for Liger (#4522)
commit fd04760
Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com>
Date: Thu Nov 13 11:31:10 2025 +0000
Paper Index: Change `num_completions` to `num_generations` (#4515)
commit b7918c0
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Wed Nov 12 20:35:44 2025 -0800
Move GKDTrainer to experimental module (#4474)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit 07b5011
Author: Tamoghno Kandar <55907205+tamoghnokandar@users.noreply.github.com>
Date: Wed Nov 12 20:07:33 2025 -0800
Replace flash attention2 with kernels-community/flash-attn2 (#4426)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
commit 7a57fd4
Author: Yuxian Gu <guyx21@mails.tsinghua.edu.cn>
Date: Thu Nov 13 11:16:20 2025 +0800
MiniLLM: Fix arguments in config & add to documentation index (#4518)
commit a145eaf
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Wed Nov 12 16:35:46 2025 -0800
refactor: Move CPOTrainer to experimental module (#4470)
commit d2dc717
Author: Taha Yassine <40228615+taha-yassine@users.noreply.github.com>
Date: Thu Nov 13 00:56:47 2025 +0100
Replace `wandb_log_unique_prompts` with `log_unique_prompts` (#4508)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit 799b39b
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Nov 12 16:21:05 2025 -0700
`device_map` and `dtype` to `"auto"` by default (#4509)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit a6a2beb
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Nov 12 09:42:31 2025 -0700
Add temporary workaround for `lr_scheduler_kwargs` dtype issue in Transformers 4.57.0 (#4513)
commit 346701a
Author: lewtun <lewis.c.tunstall@gmail.com>
Date: Wed Nov 12 17:42:18 2025 +0100
Replace accelerate logging with stdlib in CLI (#4512)
commit 4db63af
Author: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Date: Wed Nov 12 02:19:51 2025 +0000
Fix GRPO unsqueeze advantages
commit ecb2811
Author: Yuxian Gu <guyx21@mails.tsinghua.edu.cn>
Date: Wed Nov 12 10:17:22 2025 +0800
Add MiniLLM Trainer (#4504)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit 89e4688
Author: Taha Yassine <40228615+taha-yassine@users.noreply.github.com>
Date: Tue Nov 11 20:36:23 2025 +0100
Add support for images inside tables with Trackio completions logging (#4505)
commit 2d3279c
Author: lewtun <lewis.c.tunstall@gmail.com>
Date: Tue Nov 11 19:22:25 2025 +0100
Tweak description for vLLM sleep mode (#4506)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit 02a3477
Author: Luke Hinds <lukehinds@gmail.com>
Date: Mon Nov 10 16:41:51 2025 +0000
Fix link to OpenEnv docs (#4502)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
commit aaed6c1
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Sat Nov 8 08:20:48 2025 -0700
Consistency regarding relative imports (#4498)
commit 20760ba
Author: burtenshaw <ben.burtenshaw@gmail.com>
Date: Fri Nov 7 10:50:50 2025 +0100
[DOCS] update and fix openenv (#4490)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit 64cfca4
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Thu Nov 6 22:47:04 2025 -0800
Move judges to experimental submodule (#4439)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit 97ca1a2
Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com>
Date: Fri Nov 7 00:20:15 2025 +0000
Fix bugs in CISPO conditions (#4499)
commit ffb3dd5
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Thu Nov 6 16:03:00 2025 -0800
docs: Add PEFT subsection to reducing memory usage guide (#4430)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit 43b6541
Author: SolarWindRider <31797478+SolarWindRider@users.noreply.github.com>
Date: Fri Nov 7 06:55:34 2025 +0800
Support completion bootstrap for VLM in GRPO/RLOO (#4452)
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit 642b721
Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com>
Date: Thu Nov 6 22:33:00 2025 +0000
ScaleRL: Add CISPO Loss (#4495)
commit 32e9c9f
Author: Ishita Bhattacharyya <139248026+ishitab02@users.noreply.github.com>
Date: Fri Nov 7 03:37:43 2025 +0530
⛴️ Add kernels to Docker images (#4445)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit 1bcfc50
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Thu Nov 6 13:40:12 2025 -0800
Move XPOTrainer to trl.experimental.xpo (#4485)
Co-authored-by: Invidia19 <54266187+Invidia19@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit 37942bc
Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com>
Date: Thu Nov 6 21:32:03 2025 +0000
Buffer samples based on group level stds. (#4492)
commit 66cd02a
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Nov 6 20:58:25 2025 +0100
Add tiny model Qwen3VLForConditionalGeneration to CI (#4494)
commit 32febb4
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date: Thu Nov 6 18:21:56 2025 +0100
Add LFM2 to SFT notebook examples (#4455)
1 parent c2db596 commit b6815e3
File tree
126 files changed (+8028, -5771 lines)
- .github/workflows
- docker
  - trl-dev
  - trl
- docs/source
- examples
  - datasets
  - notebooks
  - scripts
    - evals
    - openenv
    - ppo
- scripts
- tests
  - experimental
- trl
  - experimental
    - bco
    - cpo
    - gfpo
    - gkd
    - gold
    - grpo_with_replay_buffer
    - gspo_token
    - judges
    - minillm
    - nash_md
    - openenv
    - ppo
    - xpo
  - extras
  - models
  - rewards
  - scripts
  - trainer