Question about toy example on README #269

Closed
rmill040 opened this issue Apr 1, 2023 · 3 comments · Fixed by #271

rmill040 commented Apr 1, 2023

In the toy example on the README, the rollout step generates response_tensor using model_ref, as shown in the last line below:

...

# get models
model = AutoModelForCausalLMWithValueHead.from_pretrained('gpt2')
model_ref = create_reference_model(model)

tokenizer = AutoTokenizer.from_pretrained('gpt2')

# initialize trainer
ppo_config = PPOConfig(
    batch_size=1,
)

# encode a query
query_txt = "This morning I went to the "
query_tensor = tokenizer.encode(query_txt, return_tensors="pt")

# get model response - RESPONSE FROM ROLLOUT STEP
response_tensor = respond_to_batch(model_ref, query_tensor)

However, the graphic earlier in the README shows the active model (red rectangle) as the one that should be generating the response. Is this a typo? In other words, I would have expected the line to read:

# get model response - RESPONSE FROM ROLLOUT STEP
response_tensor = respond_to_batch(model, query_tensor)

I could see the optimization still working either way, using model or model_ref, depending on how the reward model scores the query + response pairs.
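
For reference, here is roughly how the rest of the toy example continues after the rollout (a rough sketch based on the PPOTrainer API at the time, trimmed, and assuming torch and PPOTrainer are imported; exact names and signatures may differ across TRL versions):

# create a ppo trainer that holds both the active model and the frozen reference model
ppo_trainer = PPOTrainer(ppo_config, model, model_ref, tokenizer)

# define a reward for the response
# (in practice this would come from a reward model or human feedback)
reward = [torch.tensor(1.0)]

# train the active model for one step with ppo
train_stats = ppo_trainer.step([query_tensor[0]], [response_tensor[0]], reward)

In that flow, model is the policy that ppo_trainer.step actually updates, while model_ref only serves as the frozen baseline for the KL penalty, so having the active model generate the rollout is what lines up with the graphic.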

If this isn't a typo, then maybe some clarifying documentation, even a one-liner, can help explain the discrepancy. Or maybe I'm just the only confused one 😊

@younesbelkada

Hi @rmill040
Thanks for raising the issue! I think you are right: it should be the active model that is used here, not model_ref. Would you mind contributing by opening a Pull Request to fix this? 🤗 Otherwise, happy to do it!
cc @lvwerra

rmill040 commented Apr 3, 2023

Hi @younesbelkada, happy to help!

Here's the PR: #271

@younesbelkada

Thanks a lot! Just reviewed :D

younesbelkada pushed a commit that referenced this issue Apr 3, 2023