
Commit
Added doc about beta in the Trainer's docstring
kashif committed Nov 23, 2023
1 parent 8410b2d commit de476f0
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion trl/trainer/dpo_trainer.py
@@ -63,7 +63,7 @@ class DPOTrainer(Trainer):
Hugging Face transformer model with a causal language modelling head. Used for implicit reward computation and loss. If no
reference model is provided, the trainer will create a reference model with the same architecture as the model to be optimized.
beta (`float`, defaults to 0.1):
The beta factor in DPO loss. Higher beta means less divergence from the initial policy.
The beta factor in DPO loss. Higher beta means less divergence from the initial policy. For the IPO loss, beta is the regularization parameter denoted by tau in the paper.
loss_type (`str`, defaults to `"sigmoid"`):
The type of DPO loss to use. Either `"sigmoid"` (the default DPO loss), `"hinge"` loss from the SLiC paper, or `"ipo"` from the IPO paper.
args (`transformers.TrainingArguments`):
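The role of `beta` in the three `loss_type` variants mentioned in the docstring can be sketched as follows. This is a minimal scalar sketch, not the trainer's actual implementation (the real `DPOTrainer` operates on batched tensors); the function name and signature are illustrative.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp,
             beta=0.1, loss_type="sigmoid"):
    """Per-example preference loss for a (chosen, rejected) pair of log-probs."""
    # Log-ratio of policy vs. reference, chosen minus rejected.
    logits = ((policy_chosen_logp - policy_rejected_logp)
              - (ref_chosen_logp - ref_rejected_logp))

    if loss_type == "sigmoid":
        # Standard DPO loss: -log(sigmoid(beta * logits)).
        # Larger beta penalizes divergence from the reference policy more.
        return math.log1p(math.exp(-beta * logits))
    if loss_type == "hinge":
        # SLiC-style hinge loss with margin 1/beta on the log-ratio.
        return max(0.0, 1.0 - beta * logits)
    if loss_type == "ipo":
        # IPO loss: here beta plays the role of the regularization
        # parameter tau from the IPO paper; the target gap is 1/(2*beta).
        return (logits - 1.0 / (2.0 * beta)) ** 2
    raise ValueError(f"unknown loss_type: {loss_type!r}")
```

For the sigmoid loss, increasing `beta` sharpens the preference signal around the reference policy; for IPO, smaller `beta` (tau) pushes the optimal log-ratio gap `1/(2*beta)` further out, i.e. weaker regularization toward the reference.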
