
EOS token processing for multi-turn DPO #741

Merged 6 commits from dpo_token_fix into main on Sep 12, 2023
Conversation

@natolambert (Contributor) commented Sep 5, 2023

Instead of asserts, mask out EOS tokens in the attention mask (see the sketch below).
My dev setup for TRL is out of date; will fix the pre-commit stuff.
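For context, a minimal sketch of what the masking amounts to; the names (prompt_tokens, tokenizer) are illustrative and this is not the exact code in the PR:

# Illustrative sketch only: zero out the attention mask wherever the prompt
# already contains an EOS token, so multi-turn prompts with </s> separators
# no longer trip an assert.
eos_token_id = tokenizer.eos_token_id

prompt_tokens["attention_mask"] = [
    0 if token_id == eos_token_id else mask
    for token_id, mask in zip(prompt_tokens["input_ids"], prompt_tokens["attention_mask"])
]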

CC @kashif

Nathan Lambert added 2 commits September 5, 2023 14:51
@HuggingFaceDocBuilderDev commented Sep 5, 2023

The documentation is not available anymore as the PR was closed or merged.

@natolambert requested a review from lvwerra on September 7, 2023 at 14:50
@kashif (Collaborator) commented Sep 7, 2023

@natolambert the recent seq-2-seq PR might have caused some merge conflicts

@natolambert (Contributor, Author) commented

kk @kashif and @lvwerra, the merge conflict should be fixed now. Will double-check via my testing with H4 this afternoon!

@younesbelkada (Contributor) left a comment

Thanks a lot for adding the EOS token support for DPO!
My tiny suggestion would be to replace the indices logic with something like:

attention_mask = torch.Tensor(prompt_tokens["input_ids"]).ne(eos_token_id)
prompt_tokens["attention_mask"] = attention_mask.long().tolist()

This already looks great though! Feel free to merge as is if you prefer your approach (mine requires first converting to a torch tensor and then converting the attention mask back to a list).
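As a quick, hedged sanity check of that one-liner on a made-up example (the token ids and eos_token_id below are purely illustrative):

import torch

# Toy example: pretend 2 is the EOS token id.
prompt_tokens = {"input_ids": [5, 9, 2, 7, 2]}
eos_token_id = 2

attention_mask = torch.Tensor(prompt_tokens["input_ids"]).ne(eos_token_id)
print(attention_mask.long().tolist())  # [1, 1, 0, 1, 0] -> EOS positions masked out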

@natolambert (Contributor, Author) commented Sep 12, 2023

Merging this as a starting point; expect more DPO improvements and PRs soon!
I forget if I'm supposed to do that with TRL 😅 let me know.

@natolambert merged commit 9141aa4 into main on Sep 12, 2023
@natolambert deleted the dpo_token_fix branch on September 12, 2023 at 16:49
kushal-tri pushed a commit to kushalarora/trl that referenced this pull request on Sep 19, 2023 (commit message: init; fix; add doc; style; clarify example)
@robertgshaw2-neuralmagic commented Nov 12, 2023

@natolambert

I am just curious: what is the reason for setting the attention mask to 0 at all positions with eos_token_id?

My thinking is that once these models have been aligned with DPO, they will typically be used with chat templates (as in the Zephyr model card: https://huggingface.co/HuggingFaceH4/zephyr-7b-beta#intended-uses--limitations).

In that model card, the eos_token is present in the prompt for both single- and multi-turn generation:

import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
# <|system|>
# You are a friendly chatbot who always responds in the style of a pirate.</s>
# <|user|>
# How many helicopters can a human eat in one sitting?</s>
# <|assistant|>
# Ah, me hearty matey! But yer question be a puzzler! A human cannot eat a helicopter in one sitting, as helicopters are not edible. They be made of metal, plastic, and other materials, not food!

So during inference, the eos_token_ids will have attention_mask=1. Shouldn't we mirror this during training?
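For reference, a small hedged sketch of the point being made here, reusing the messages list from the snippet above: by default the tokenizer assigns attention_mask=1 to every token it keeps, including the </s> separators in the chat-templated prompt.

from transformers import AutoTokenizer

# Sketch: inspect the attention mask the tokenizer produces by default for a
# chat-templated prompt (reuses `messages` from the example above).
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
encoded = tokenizer(prompt)

eos_positions = [i for i, t in enumerate(encoded["input_ids"]) if t == tokenizer.eos_token_id]
print([encoded["attention_mask"][i] for i in eos_positions])  # all 1s by default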

@natolambert (Contributor, Author) commented Nov 12, 2023

Hey @rsnm2 - it was really a hack to make things work at all; there was some technical issue during training that made the default masking not work. It would be nice to revisit it.

@robertgshaw2-neuralmagic commented

Hey @natolambert - makes sense

I think the solution would just be to remove all of this logic (instead leaving attention_mask=1 for the eos_token_ids, which is the default in tokenizers), unless there is a good reason to ignore the eos_tokens during training (I don't think there is).

@natolambert (Contributor, Author) commented

You should try reverting it and playing with it. IIRC the Transformers Trainer errors out with no processing / truncation.
@rsnm2

lapp0 pushed a commit to lapp0/trl that referenced this pull request on May 10, 2024 (commit message: init; fix; add doc; style; clarify example)