Hi, thanks for the great repo,
In the original PPO paper, the authors update the `ref_model` (θ_old) to the current fine-tuned model after every iteration (θ_old ← θ). I've attached the relevant excerpt from the paper below for convenience.
So shouldn't there be a line somewhere like `ref_model = model`?
Why isn't `ref_model` updated this way, as the original PPO paper describes?
Thank you.
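For reference, here is a minimal sketch of where θ_old ← θ sits in the PPO loop from the paper. It is a toy 3-armed bandit with a softmax policy, not this repo's code. The function `ppo_bandit`, the reward setup (only arm 2 pays 1), and all hyperparameters are hypothetical. The snapshot `theta_old = list(theta)` at the top of each iteration is the step in question. θ_old then stays fixed while θ is updated for several epochs on the same batch. Many implementations get the same effect without an explicit copy by storing the rollout's log-probabilities when the data is collected.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ppo_bandit(n_iters=30, batch_size=32, n_epochs=4, lr=0.5, eps=0.2, seed=0):
    """Hypothetical toy PPO on a 3-armed bandit where only arm 2 pays reward 1.

    The policy is a softmax over the logits theta.
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    for _ in range(n_iters):
        # theta_old <- theta: snapshot the current policy before collecting data.
        theta_old = list(theta)
        pi_old = softmax(theta_old)
        actions = rng.choices(range(3), weights=pi_old, k=batch_size)
        rewards = [1.0 if a == 2 else 0.0 for a in actions]
        baseline = sum(rewards) / batch_size
        advantages = [r - baseline for r in rewards]

        # Several epochs of updates to theta on the same batch; theta_old stays fixed.
        for _ in range(n_epochs):
            pi = softmax(theta)
            grad = [0.0, 0.0, 0.0]
            for a, adv in zip(actions, advantages):
                ratio = pi[a] / pi_old[a]
                clipped = min(max(ratio, 1 - eps), 1 + eps)
                # The gradient of min(r*A, clip(r)*A) is zero when the clipped term wins.
                if ratio * adv <= clipped * adv:
                    for j in range(3):
                        # d log softmax(theta)[a] / d theta_j = 1[j == a] - pi_j
                        dlogp = (1.0 if j == a else 0.0) - pi[j]
                        grad[j] += adv * ratio * dlogp / batch_size
            # Gradient ascent on the clipped surrogate objective.
            theta = [t + lr * g for t, g in zip(theta, grad)]
    return softmax(theta)


probs = ppo_bandit()
print(probs)
```

Without the per-iteration snapshot, the ratio would always be measured against the initial policy, and clipping would eventually freeze all updates once the policy drifted more than ε away from it.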