Question about toy example on README #269
Comments
rmill040 added a commit to rmill040/trl that referenced this issue on Apr 3, 2023
Hi @younesbelkada, happy to help! Here's the PR: #271
Thanks a lot! Just reviewed :D
younesbelkada pushed a commit that referenced this issue on Apr 3, 2023
In the toy example in the README, the rollout step generates the `response_tensor` using `model_ref` in its last line. However, the graphic earlier in the README shows the active model (red rectangle) as the model that should be generating the response. Is this a typo? In other words, I would have expected that line to use `model` rather than `model_ref`.
I could see the optimization still working either way, using `model` or `model_ref`, depending on how the reward model scores the query + response pairs. If this isn't a typo, then maybe some clarifying documentation, even a one-liner, could help explain the discrepancy. Or maybe I'm just the only confused one 😊