
[DPO] Merge initial peft model if trainer has a peft_config #956

Merged: 2 commits into huggingface:main from issue-742 on Nov 6, 2023

Conversation

@kashif (Collaborator) commented Nov 5, 2023

fixes #742

We merge the initial peft adapter if we are training with a peft_config.

Co-authored-by: Shoaib Burq <saburq@gmail.com>
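
Conceptually, the change does something like the following. This is a minimal sketch rather than the exact trl code, and `prepare_model` is an illustrative name:

```python
# Sketch of the behavior this PR adds: if the incoming model already
# carries a PEFT adapter and the trainer also receives a peft_config,
# merge the existing adapter into the base weights before attaching
# the new one, so the initial (e.g. SFT) weights are not dropped.
from peft import PeftModel, get_peft_model

def prepare_model(model, peft_config=None):
    if peft_config is not None:
        if isinstance(model, PeftModel):
            # Fold the initial adapter into the base model.
            model = model.merge_and_unload()
        # Attach the fresh adapter defined by peft_config.
        model = get_peft_model(model, peft_config)
    return model
```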
@HuggingFaceDocBuilderDev commented Nov 5, 2023

The documentation is not available anymore as the PR was closed or merged.

@kashif (Collaborator, Author) commented Nov 5, 2023

@Elfsong can you check whether I am testing the failure case you hit in the test?

@Elfsong commented Nov 5, 2023

> @Elfsong can you check whether I am testing the failure case you hit in the test?

@kashif Sure, I'm testing it now. It doesn't work on my side so far: the LoRA weights were not added on top of the base model. Not sure if the problem is in my code.


I’m currently working on this demo FYI:
https://github.com/huggingface/trl/tree/main/examples/research_projects/stack_llama_2/scripts
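
For what it's worth, one rough way to check whether the LoRA weights actually landed in the base model is to compare a base weight tensor before and after merging. This is a sketch with placeholder names: `path/to/sft_lora_adapter` is hypothetical, and the check assumes the adapter targets `c_attn`:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base checkpoint
layer = "transformer.h.0.attn.c_attn"
before = base.get_submodule(layer).weight.detach().clone()

# "path/to/sft_lora_adapter" is a placeholder for a trained adapter.
peft_model = PeftModel.from_pretrained(base, "path/to/sft_lora_adapter")
merged = peft_model.merge_and_unload()
after = merged.get_submodule(layer).weight

# A trained adapter (non-zero lora_B) must perturb the base weights;
# a freshly initialized one leaves them unchanged, since lora_B starts at zero.
print("weights changed after merge:", not torch.equal(before, after))
```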

@Elfsong left a comment

LGTM! Thank you!

@younesbelkada (Contributor) left a comment

LGTM - given that this is a common scenario (people can SFT their model with peft and then use that model for DPO)! Thanks @kashif!
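
As a sketch of that scenario (the checkpoint names, toy dataset, and hyperparameters below are placeholders, not from this PR): the SFT adapter-carrying model is handed straight to `DPOTrainer` together with a fresh `peft_config`, and with this PR the trainer merges the SFT adapter first so the DPO adapter trains on top of the SFT-tuned weights.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, PeftModel
from trl import DPOTrainer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

base = AutoModelForCausalLM.from_pretrained("gpt2")
# "path/to/sft_lora_adapter" is a placeholder for the adapter produced by SFT.
model = PeftModel.from_pretrained(base, "path/to/sft_lora_adapter")

# A toy preference dataset with the columns DPOTrainer expects.
train_dataset = Dataset.from_dict({
    "prompt": ["What is the capital of France?"],
    "chosen": [" Paris."],
    "rejected": [" London."],
})

trainer = DPOTrainer(
    model,  # still carries the SFT adapter; the trainer merges it first
    args=TrainingArguments(
        output_dir="dpo-out",
        per_device_train_batch_size=1,
        remove_unused_columns=False,
    ),
    beta=0.1,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=512,
    max_prompt_length=128,
    peft_config=LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"),
)
trainer.train()
```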

@younesbelkada younesbelkada merged commit 6c6ff24 into huggingface:main Nov 6, 2023
8 checks passed
@kashif kashif deleted the issue-742 branch November 6, 2023 08:52
lapp0 pushed a commit to lapp0/trl that referenced this pull request on May 10, 2024:

[DPO] Merge initial peft model if trainer has a peft_config (huggingface#956)

* failing test

Co-authored-by: Shoaib Burq <saburq@gmail.com>

* merge initial peft model
Successfully merging this pull request may close these issues.

dpo training with Lora can not save fine-tuned weights (#742)