PEFT support for Online DPO #2041
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Can you kindly add the above PEFT/LoRA usage command to the script?
Particular attention is needed when reviewing this one. I didn't use the code from DPO, mostly because I don't understand it, so I'm afraid I may have missed something.
Overall, I've chosen to go with code that I understand, even if it means reintroducing bugs that have already been fixed in other trainers.
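For context, here is a minimal sketch of the pattern TRL trainers commonly use for PEFT support; the helper and argument names below are illustrative and not necessarily what this PR implements:

```python
from peft import LoraConfig, get_peft_model

def wrap_policy_with_peft(model, peft_config=None):
    """Illustrative helper: wrap the policy in a LoRA adapter when a config is given.

    In DPO-style trainers, the PEFT-wrapped policy can also stand in for the
    reference model, since disabling the adapter recovers the frozen base weights.
    """
    if peft_config is not None:
        model = get_peft_model(model, peft_config)
    return model

# Example LoRA configuration one might pass to the trainer
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
```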
Thanks for adding PEFT support! Overall LGTM - have you run an experiment on e.g. TLDR to see if it looks OK?
Edit: sorry, just saw your comment!
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
This is needed if we have e.g. an SFT LoRA that needs to be merged into the base model, before finally inserting an adapter for DPO. See here for some considerations.
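As a rough sketch of that flow (the adapter repo id is a placeholder and the exact integration point in the trainer may differ):

```python
from peft import AutoPeftModelForCausalLM, LoraConfig, get_peft_model

# Load a model that already has an SFT LoRA attached (placeholder repo id).
sft_model = AutoPeftModelForCausalLM.from_pretrained("my-org/sft-lora-adapter")

# Fold the SFT adapter into the base weights; the merged model becomes the
# starting point (and implicit reference) for Online DPO.
base_model = sft_model.merge_and_unload()

# Attach a fresh adapter that will actually be trained during DPO.
dpo_peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")
model = get_peft_model(base_model, dpo_peft_config)
```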
Sure, let's leave this as a follow-up PR.
Fine with me. We can add it later if people open an issue.
Good question. This was added by Younes, but maybe @BenjaminBossan can help clarify why PEFT models need upcasting in 4-bit?
The original addition of this function was from #1110 (which in turn references this repo). I don't know the exact context, but I think the reason is to avoid indiscriminately casting all layers to bf16 -- specifically, layer norm stays in float32. This is probably based on some empirical finding that this is better for training, but after skimming the QLoRA paper, I could not find any mention of that, so I'm unsure.
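For reference, the kind of selective upcasting being discussed looks roughly like this; this is a sketch of the pattern rather than the exact helper from #1110:

```python
import torch

def upcast_layernorms_to_fp32(model):
    """Sketch: keep normalization layers in float32 while the rest of the
    4-bit PEFT model stays in lower precision."""
    for name, module in model.named_modules():
        # Heuristic: LayerNorm modules, plus RMSNorm-style modules whose name
        # contains "norm" (e.g. "input_layernorm" in Llama-like models).
        if isinstance(module, torch.nn.LayerNorm) or "norm" in name.lower():
            module.to(torch.float32)
    return model
```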
Thanks a lot @BenjaminBossan and @lewtun. I'll further investigate in a dedicated branch.
What does this PR do?
Fixes # (issue)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.