
Questionable best-practice recommendation in DPO docs #1324

Closed
R-seny opened this issue Feb 11, 2024 · 1 comment

R-seny commented Feb 11, 2024

This DPO documentation page suggests that the best way to merge adapters is to merge them into a quantized model and then dequantize, citing this tweet, which offers only anecdotal evidence (no measurements).

This goes against common intuition: QLoRA trains high-precision adapters to compensate for quantization precision losses, so merging them back into a heavily quantized model is counterintuitive. It also goes against some more in-depth investigations and does not mention principled research approaches, e.g. QALoRA.

Moreover, the script linked as an example does the opposite of what the tweet recommends: it first DEquantizes the model and then merges the adapter. Perhaps there was simply a typo in the referenced tweet.
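
For reference, here is a minimal sketch of that dequantize-first approach using the standard PEFT merging API (the model name and adapter paths below are placeholders, not the ones from the linked script):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model in higher precision (fp16) instead of the 4-bit
# quantized form that was used during QLoRA training.
base_model = AutoModelForCausalLM.from_pretrained(
    "base-model-name",            # placeholder model id
    torch_dtype=torch.float16,
)

# Attach the trained LoRA adapter and merge its weights into the
# full-precision base weights.
model = PeftModel.from_pretrained(base_model, "path/to/adapter")  # placeholder path
merged_model = model.merge_and_unload()

merged_model.save_pretrained("path/to/merged-model")              # placeholder path
```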

I also don't fully understand where the 1-2% performance loss figure in the next sentence comes from. It may be a coincidence, but it is similar to the reported advantage of QALoRA over regular merging; however, QALoRA is not the same thing as what the tweet suggests.

I submitted a pull request with a slightly revised version of this paragraph and the next one (#1325).

@younesbelkada

Thanks for submitting the PR to address this! Closing the issue, as #1325 has been merged.
