DPO training with LoRA cannot save fine-tuned weights #742
Comments
I have the same question~
Maybe @younesbelkada and @kashif could have a look.
@LuJunru did you try to save the model via:
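(The exact snippet from this comment was not preserved here; below is a minimal sketch of the usual save calls for a PEFT-wrapped DPO trainer, where `dpo_trainer` and `output_dir` are placeholder names, not taken from the original.)

```python
# Placeholder names: dpo_trainer is an already-constructed trl.DPOTrainer,
# output_dir is a directory of your choice.
dpo_trainer.save_model(output_dir)              # Trainer-level save of the (PEFT-wrapped) model
# or save only the adapter weights directly:
dpo_trainer.model.save_pretrained(output_dir)   # writes adapter_model.bin / adapter_config.json
```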
@kashif In fact, I had the same problem when training DPO (without accelerate). I believe the code here is what you described, but it doesn't work.
@kashif @Moyhub @lvwerra Hi guys, thank you for the feedback. When I directly checked the weights in the adapter bin, I found the weights were saved. However, the weight keys there were base_model.model.base_model.model.xxxx, which may be related to the DeepSpeed wrapper. After I renamed the keys to base_model.model.xxxx, the merge succeeded. You may want to check this as well.
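For those asking for code: a minimal sketch of the key-renaming workaround described above, assuming the adapter was saved as `adapter_model.bin` (the path is hypothetical):

```python
import torch

# Hypothetical path; point it at your saved adapter checkpoint.
adapter_path = "output/adapter_model.bin"

state_dict = torch.load(adapter_path, map_location="cpu")

# Drop the duplicated "base_model.model." prefix, e.g.
# "base_model.model.base_model.model.xxx" -> "base_model.model.xxx"
prefix = "base_model.model."
fixed = {
    (key[len(prefix):] if key.startswith(prefix + prefix) else key): value
    for key, value in state_dict.items()
}

torch.save(fixed, adapter_path)
```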
@younesbelkada something we could also test with #724.
It seems like when you feed a …
Can you provide a code example for your solution?
Seeing the same issue. Because of this, the trained model performs the same as the original model even after DPO training. Any conclusion on this bug?
Same issue here.
Could you share the example code? Thanks.
I have a PR that fixes this issue by merging the initial PEFT adapter if the trainer gets an additional peft_config. Can you kindly try it out?
Thanks for the great discussion. Just wish to mention there's a subtle bug/nuance with the most recent PR. Since the DPO trainer has merged the original base_model and LoRA with merge_and_unload(), the final saved LoRA adapter weights are based on the merged model, NOT the original base_model. This means that at inference time, one should first merge the base_model with the original LoRA weights before loading the RLHF weights. Personally I find this to be an important detail, and it took me a while to figure out. Might be a good idea to highlight it in future DPO tutorials. Many thanks.
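To make the inference-time ordering concrete, here is a rough sketch under assumed model/adapter paths (all identifiers below are placeholders): merge the base model with the original (SFT) LoRA first, then load the DPO adapter on top.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model_name = "your-base-model"          # placeholder
sft_adapter_path = "path/to/original_lora"   # adapter the DPO run started from
dpo_adapter_path = "path/to/dpo_lora"        # adapter saved by the DPO trainer

base = AutoModelForCausalLM.from_pretrained(base_model_name)

# 1) Reconstruct the model the DPO adapter was actually trained on top of.
model = PeftModel.from_pretrained(base, sft_adapter_path)
model = model.merge_and_unload()

# 2) Only then load the DPO (RLHF) adapter.
model = PeftModel.from_pretrained(model, dpo_adapter_path)
```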
Issue
The following script manages to train and save. However, the saved weights are incorrect.
Training script
Training output
Weight check when doing merge
Weight check outputs
I think even if the LoRA fine-tuning has not converged, the saved adapter values should not be zero?
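One way to inspect the saved adapter directly (assuming it was written as `adapter_model.bin`; the path is a placeholder):

```python
import torch

state_dict = torch.load("output/adapter_model.bin", map_location="cpu")

# Print each LoRA tensor and whether it is entirely zero.
for name, tensor in state_dict.items():
    if "lora" in name:
        print(name, tuple(tensor.shape), "all-zero:", bool((tensor == 0).all()))
```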