No updating on the ref_model? #40

Closed
hdvvip opened this issue Jul 18, 2022 · 1 comment

Comments

hdvvip commented Jul 18, 2022

Hi, thanks for the great repo,

As I understand it, in the original PPO paper the authors set the ref_model to the fine-tuned model after every iteration (theta_old = theta). I have attached the relevant image below for your convenience.
[Image: Selection_1430]

So shouldn't there be a line of code where ref_model = model?
Why isn't ref_model updated as shown in the original PPO paper?
Thank you.

hdvvip commented Jul 18, 2022

OK, I understand now: you use the logprobs of the current network as theta_old.

    train_stats = self.train_minibatch(logprobs[idx].unsqueeze(0), values[idx].unsqueeze(0),
                                       rewards[idx].unsqueeze(0), queries[idx].unsqueeze(0),
                                       responses[idx].unsqueeze(0),
                                       torch.cat([queries[idx],responses[idx]]).unsqueeze(0))

This has the same effect as updating theta_old after every iteration, since the logprobs passed in come from the current network before the minibatch update.
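
For anyone else reading this, here is a minimal sketch of why stored logprobs can stand in for theta_old. It is a toy in plain Python, not the repo's code: the two-action policy, the `clipped_surrogate` helper, the advantage value, and the 0.2 clip range are all assumptions made up for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def clipped_surrogate(new_logprob, old_logprob, advantage, clip_range=0.2):
    """PPO clipped objective for one sample; returns (objective, ratio)."""
    # ratio = pi_theta(a|s) / pi_theta_old(a|s), computed in log space
    ratio = math.exp(new_logprob - old_logprob)
    clipped = max(min(ratio, 1.0 + clip_range), 1.0 - clip_range)
    return min(ratio * advantage, clipped * advantage), ratio

# Hypothetical toy policy: one logit per action (assumption for illustration).
logits = [0.0, 0.0]
action, advantage = 1, 1.0

# Rollout phase: record the log-prob under the current parameters.
# This stored number plays the role of theta_old; no model copy is needed.
old_logprob = log_softmax(logits)[action]

# Before any parameter update, the current policy equals the old one.
_, first_ratio = clipped_surrogate(log_softmax(logits)[action], old_logprob, advantage)

# Simulate an optimizer step that raises the chosen action's logit.
logits[action] += 0.5
objective, ratio = clipped_surrogate(log_softmax(logits)[action], old_logprob, advantage)

print(first_ratio, ratio, objective)
```

The ratio starts at 1 and moves away from it only as the parameters change within the iteration. Recording fresh logprobs at the start of the next iteration is then equivalent to setting theta_old = theta.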
