No updating on the ref_model? #40

hdvvip · 2022-07-18T00:44:16Z

Hi, thanks for the great repo,

I noticed that in the original PPO paper, the authors update the ref_model to the fine-tuned model after every iteration (theta_old = theta). I attached the image below for your convenience.
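For reference, this is the clipped surrogate objective from the PPO paper. theta_old enters only through the probability ratio r_t; Â_t is the advantage estimate and ε is the clip range:

```latex
L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}
```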

So shouldn't there be a line of code like `ref_model = model`?
Why isn't ref_model updated as in the original PPO paper?
Thank you.

hdvvip · 2022-07-18T04:39:23Z

OK, I understand now: you use the log-probs of the current network as theta_old:

```python
# logprobs[idx] holds log-probs from the current network; they act as theta_old
train_stats = self.train_minibatch(logprobs[idx].unsqueeze(0), values[idx].unsqueeze(0),
                                   rewards[idx].unsqueeze(0), queries[idx].unsqueeze(0),
                                   responses[idx].unsqueeze(0),
                                   torch.cat([queries[idx], responses[idx]]).unsqueeze(0))
```

This is effectively the same as updating theta_old after every iteration.
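A minimal sketch of why this is equivalent. It uses plain Python rather than the repo's PyTorch code, and it assumes, as the snippet above suggests, that `logprobs` is computed once with the current weights before the optimization epochs run. That snapshot plays the role of π_θold. It stays fixed while the policy is optimized for several epochs, and it is recomputed from the updated weights at the next rollout, which is the implicit `theta_old = theta` step. The one-parameter policy, the fixed advantage, and the names `log_sigmoid`, `ppo_clip_objective`, and `train_toy_policy` are hypothetical illustrations, not code from the repo.

```python
import math


def log_sigmoid(x: float) -> float:
    """log(sigmoid(x)); this simple form is fine for the small toy values used here."""
    return -math.log1p(math.exp(-x))


def ppo_clip_objective(logprob_new: float, logprob_old: float, advantage: float,
                       clip_eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate (to be maximized)."""
    # r_t = pi_theta(a|s) / pi_theta_old(a|s), computed in log space.
    ratio = math.exp(logprob_new - logprob_old)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


def train_toy_policy(num_iterations: int = 3, ppo_epochs: int = 4, lr: float = 1.0,
                     advantage: float = 1.0, clip_eps: float = 0.2):
    """Hypothetical toy policy: P(action=1) = sigmoid(theta), and action 1 is always sampled."""
    theta = 0.0
    old_logprobs = []
    for _ in range(num_iterations):
        # "Rollout": record log-probs with the current weights. This snapshot acts as theta_old.
        logprob_old = log_sigmoid(theta)
        old_logprobs.append(logprob_old)
        for _ in range(ppo_epochs):
            logprob_new = log_sigmoid(theta)
            ratio = math.exp(logprob_new - logprob_old)
            # The clipped objective has zero gradient once the ratio leaves
            # [1 - eps, 1 + eps] in the direction the advantage pushes it.
            clipped = ((advantage > 0 and ratio > 1.0 + clip_eps)
                       or (advantage < 0 and ratio < 1.0 - clip_eps))
            # d/dtheta log(sigmoid(theta)) = 1 - sigmoid(theta); d(ratio)/dtheta = ratio * that.
            grad = 0.0 if clipped else ratio * advantage * (1.0 - math.exp(logprob_new))
            theta += lr * grad  # gradient ascent on the surrogate
        # No explicit `theta_old = theta`: the next rollout recomputes logprob_old from the new theta.
    return theta, old_logprobs
```

Running `train_toy_policy()` shows that each stored `logprob_old` is larger than the previous one. The snapshot moves with the policy between iterations, while within an iteration it stays fixed and the clip halts the update once the ratio exceeds 1 + clip_eps.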

hdvvip closed this as completed Jul 18, 2022

natolambert mentioned this issue on Mar 3, 2023: Exploring RM loss potential bug. #192 (Closed)

August-murr mentioned this issue on Jan 6, 2025: onlinedpo error when use deepspeed zero3 August-murr/trl#7 (Open)
