This DPO documentation page suggests that the best way to merge adapters is to merge them into a quantized model and then dequantize it, citing this tweet, which offers only anecdotal evidence (no measurements).
This goes against common intuition: QLoRA trains high-precision adapters to compensate for quantization losses, so merging them back into a heavily quantized model is counterintuitive. It also goes against some more in-depth investigations and does not mention principled research approaches such as QALoRA.
Moreover, the script linked as an example does the opposite of what the tweet recommends: it first DEquantizes the model and only then merges (roughly the order sketched below). Maybe there was simply a typo in the referenced tweet.
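For reference, a minimal sketch of that "dequantize first, then merge" order, using standard transformers/PEFT calls. The model and adapter identifiers here are placeholders, not the ones from the linked script:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_id = "base-model-id"             # hypothetical base model checkpoint
adapter_id = "path/to/qlora-adapter"  # hypothetical QLoRA adapter

# Load the base model in high precision (no 4-bit quantization config),
# so the merge happens against full-precision weights.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Attach the LoRA adapter and fold it into the base weights.
model = PeftModel.from_pretrained(base, adapter_id)
merged = model.merge_and_unload()

merged.save_pretrained("merged-model")
```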
I also don't fully understand where the 1-2% performance-loss figure in the next sentence comes from. It may be a coincidence, but it is similar to the reported advantage of QALoRA over regular merging, and QALoRA is not the same thing as what the tweet suggests.
I submitted a pull request with a slightly revised version of this and the next paragraphs (#1325).