Question about toy example on README #269
Comments
rmill040 added a commit to rmill040/trl that referenced this issue on Apr 3, 2023
Hi @younesbelkada, happy to help! Here's the PR: #271
Thanks a lot! Just reviewed :D
younesbelkada pushed a commit that referenced this issue on Apr 3, 2023
In the toy example in the README, the rollout step generates the `response_tensor` using `model_ref` in its last line. However, the graphic earlier in the README shows the active model (red rectangle) as the model that should be generating the response. Is this a typo? In other words, I would have expected that line to use `model` rather than `model_ref`.
I could see the optimization still working either way, using `model` or `model_ref`, depending on how the reward model scores the query + response pairs. If this isn't a typo, then maybe some clarifying documentation, even a one-liner, could help explain the discrepancy. Or maybe I'm just the only confused one 😊