KTO derives an alignment objective from prospect theory and learns directly from **binary** human feedback (liked/disliked), matching or surpassing DPO-style methods while handling imbalanced/noisy signals well.
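
To make the binary-feedback objective concrete, here is a minimal per-example sketch of a KTO-style loss in plain Python. It assumes the general shape described in the KTO paper: an implied reward (the policy/reference log-ratio) is compared against a reference point and passed through a sigmoid, with separate weights for desirable and undesirable examples. The function and parameter names (`kto_example_loss`, `kl_ref`, `lambda_d`, `lambda_u`) are hypothetical, and this is not TRL's implementation, which works on batched tensors and estimates the reference point from the batch.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def kto_example_loss(
    policy_logp: float,
    ref_logp: float,
    desirable: bool,
    kl_ref: float = 0.0,    # assumed reference point (a KL estimate in the paper)
    beta: float = 0.1,      # assumed scale on the implied reward
    lambda_d: float = 1.0,  # assumed weight for desirable (liked) examples
    lambda_u: float = 1.0,  # assumed weight for undesirable (disliked) examples
) -> float:
    """Loss for one completion labelled liked (True) or disliked (False)."""
    # Implied reward: log-probability ratio between policy and reference model.
    reward = policy_logp - ref_logp
    if desirable:
        # Liked completions are pushed above the reference point.
        return lambda_d * (1.0 - sigmoid(beta * (reward - kl_ref)))
    # Disliked completions are pushed below the reference point.
    return lambda_u * (1.0 - sigmoid(beta * (kl_ref - reward)))
```

Note that each example carries only a single boolean label, so no chosen/rejected pair is needed. In this sketch, raising the policy's log-probability of a liked completion lowers its loss, while doing the same for a disliked completion raises it; the two weights are one place where an imbalance between liked and disliked examples could be compensated.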

To reproduce the paper's setting, you can use the default configuration of [`KTOTrainer`]:

```python
from trl import KTOConfig, KTOTrainer
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(model_id)
```