Commit ddb65e8

Add experimental imports to docs (#4616)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
1 parent 5fab472 commit ddb65e8
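
Note for code that must run against TRL releases both before and after this move: the PAPO hunk in this commit replaces `from trl import ...` with `from trl.experimental.papo import ...`. A minimal sketch of a fallback import follows. The helper name `import_first` is hypothetical, not part of TRL, and whether a given release exposes a class at either path is an assumption to verify against the installed version.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first importable module that defines it."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            # This module (or one of its parent packages) is unavailable; try the next path.
            continue
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"{name!r} not found in any of: {', '.join(module_paths)}")
```

Possible usage, assuming `trl` is installed: `PAPOTrainer = import_first("PAPOTrainer", ["trl.experimental.papo", "trl"])`. The experimental path is listed first so that the location documented by this commit wins when both exist.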

3 files changed: +7 -1 lines changed

docs/source/bco_trainer.md

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,8 @@ For a detailed example have a look at the `examples/scripts/bco.py` script. At a
 The `beta` refers to the hyperparameter of the implicit reward, and the dataset contains the 3 entries listed above. Note that the `model` and `ref_model` need to have the same architecture (ie decoder only or encoder-decoder).
 
 ```python
+from trl.experimental.bco import BCOConfig, BCOTrainer
+
 training_args = BCOConfig(
     beta=0.1,
 )

trl/experimental/papo/papo_trainer.py

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ class PAPOTrainer(GRPOTrainer):
 
 ```python
 from datasets import load_dataset
-from trl import PAPOTrainer, PAPOConfig
+from trl.experimental.papo import PAPOTrainer, PAPOConfig
 
 dataset = load_dataset("your-vlm-dataset", split="train")
 

trl/experimental/winrate_callback.py

Lines changed: 4 additions & 0 deletions
@@ -100,6 +100,10 @@ class WinRateCallback(TrainerCallback):
 
 Usage:
 ```python
+from trl import DPOTrainer
+from trl.experimental.judges import PairRMJudge
+from trl.experimental.winrate_callback import WinRateCallback
+
 trainer = DPOTrainer(...)
 judge = PairRMJudge()
 win_rate_callback = WinRateCallback(judge=judge, trainer=trainer)
