
[Model] LoRA with lm_head and embed_tokens fully trained #8082

Open · wants to merge 25 commits into main

Conversation


@sergeykochetkov sergeykochetkov commented Sep 2, 2024

FIX #4186 #2816

Support fully trained lm_head and embed_tokens in LoRA adapters.

We found that the quality of our adapters drops significantly without a fully trained lm_head (or an lm_head trained in LoRA style). This corresponds to PEFT's modules_to_save=["lm_head", "embed_tokens"] functionality: https://huggingface.co/docs/peft/v0.12.0/en/package_reference/#peft.LoraConfig.modules_to_save

The idea is to replace the base model's VocabParallelEmbedding and ParallelLMHead with the layers loaded from modules_to_save when running inference with the LoRA adapter.
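
For context, the training-side PEFT configuration referenced above looks roughly like the sketch below. This is not code from this PR: the base model name, LoRA rank, and target modules are illustrative assumptions; only modules_to_save=["lm_head", "embed_tokens"] is the part relevant here.

```python
# Minimal training-side sketch (illustrative model name and hyperparameters).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],          # ordinary LoRA targets (assumed)
    modules_to_save=["lm_head", "embed_tokens"],  # train these layers fully; PEFT saves full copies with the adapter
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
# After fine-tuning, model.save_pretrained(...) writes both the LoRA deltas and
# the fully trained lm_head / embed_tokens weights into the adapter directory.
```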

  • dirty implementation
  • tests for the new functionality
  • check that existing functionality still works
  • performance measurement for inference with a fully trained lm_head (see the serving sketch after this list)
  • implement fully trained embed_tokens as well
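
Serving such an adapter would follow vLLM's existing LoRA path. The sketch below uses the current public API (LLM with enable_lora=True plus a LoRARequest) with placeholder model and adapter paths; whether any additional flags are needed for the fully trained lm_head / embed_tokens depends on this PR and is not shown.

```python
# Minimal serving-side sketch using vLLM's existing LoRA API; paths are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.0, max_tokens=32),
    lora_request=LoRARequest("my_adapter", 1, "/path/to/adapter"),
)
print(outputs[0].outputs[0].text)
```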


github-actions bot commented Sep 2, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI is run, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI, as it is required for merging (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

@sergeykochetkov sergeykochetkov changed the title LoRA with lm_head fully trained [Model] LoRA with lm_head fully trained Sep 2, 2024
@sergeykochetkov sergeykochetkov marked this pull request as ready for review September 11, 2024 13:49
@sergeykochetkov (Author)

/ready

@sergeykochetkov sergeykochetkov marked this pull request as draft September 11, 2024 13:57

AlongWY commented Sep 18, 2024

Should it be unmarked as Draft?


mergify bot commented Oct 30, 2024

This pull request has merge conflicts that must be resolved before it can be merged. @sergeykochetkov, please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 30, 2024
@mergify mergify bot removed the needs-rebase label Nov 1, 2024
@sergeykochetkov sergeykochetkov changed the title [Model] LoRA with lm_head fully trained [Model] LoRA with lm_head and embed_tokens fully trained Nov 1, 2024
@sergeykochetkov sergeykochetkov marked this pull request as ready for review November 1, 2024 11:33
@sergeykochetkov (Author)

/ready

@sergeykochetkov (Author)

> Should it be unmarked as Draft?

Yes, I am waiting for review.

Successfully merging this pull request may close these issues.

[Bug]: lora base_model.model.lm_head.base_layer.weight is not supported