From f2484c32221d88661bedbff98c860b48e0b0f9bb Mon Sep 17 00:00:00 2001
From: Xuechen Li <12689993+lxuechen@users.noreply.github.com>
Date: Sat, 2 Dec 2023 14:40:33 -0800
Subject: [PATCH] chore: finalize dpo

---
 README.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/README.md b/README.md
index a711e6a0..8b23b90a 100644
--- a/README.md
+++ b/README.md
@@ -294,6 +294,17 @@ bash examples/scripts/rlhf_quark.sh \
 
 ```
 
+### [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290)
+
+To replicate our DPO results for the AlpacaFarm evaluation suite, run
+
+```bash
+bash examples/scripts/rlhf_quark.sh \
+  \
+  \
+
+```
+
 ### OpenAI models
 
 To run the OpenAI reference models with our prompts and decoding hyperparameters, run
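
The README section added by this patch links to the DPO paper without summarizing the objective. For orientation, below is a minimal sketch of the DPO loss from Rafailov et al. (2023). It illustrates the method the link describes, not alpaca_farm's internal implementation; the function name, argument names, and the `beta` default are assumptions made for this example.

```python
import torch
import torch.nn.functional as F


def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log p_theta(y_w | x), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log p_theta(y_l | x), shape (batch,)
    ref_chosen_logps: torch.Tensor,       # log p_ref(y_w | x), shape (batch,)
    ref_rejected_logps: torch.Tensor,     # log p_ref(y_l | x), shape (batch,)
    beta: float = 0.1,                    # illustrative default; tune per setup
) -> torch.Tensor:
    """DPO loss: -log sigmoid of the margin between implicit rewards."""
    # Implicit reward of each response: beta * log(p_policy / p_ref).
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the chosen response's implicit reward above the rejected one's.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()


if __name__ == "__main__":
    # Toy usage with fake per-sequence log-probabilities for a batch of 4 pairs.
    torch.manual_seed(0)
    fake = lambda: torch.randn(4)
    print(dpo_loss(fake(), fake(), fake(), fake()).item())
```

Each log-probability is the sum over the response tokens given the instruction, and `beta` controls how strongly the policy is kept close to the SFT reference model.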