-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Question about apply_chat_template in examples #1752
Comments
@EganGu Hi! There is a same question in #1541 . I think the example is wrong , the chosen and rejected reponse should only include the final turn response, and the prompt also should be applied the chat template |
https://github.com/huggingface/alignment-handbook/blob/606d2e954fd17999af40e6fb4f712055ca11b2f0/src/alignment/data.py#L42-L108 |
@EganGu 没问题的,无论是aligmenthandbook,firefly还是llama-factory都是按刚才说的那样处理的 |
感谢解答! |
I have similar problem And Second Question: |
@muzhi1991 Hi! You make a good question.
|
Thank you for your reply! it answered my question, I’m in agreement with you about whether to add_generation_token or not, but I found an inconsistency inside the trl library. I don’t know if I misunderstood the code or not. In the implementation of Here: Lines 195 to 198 in 94d53e6
And here Line 211 in 94d53e6
|
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. |
When I looked at the examples I found that the example script for DPO uses
apply_chat_template
forchosen
andrejected
but not forprompt
.trl/examples/scripts/dpo.py
Lines 150 to 152 in d1ed730
And it seems that
chosen
is a complete conversation.I think that using chat_template for the input prompt and only remaining the
assistant
output aschosen
/rejected
will be consistent with the inference phase.The text was updated successfully, but these errors were encountered: