[Model] LoRA with lm_head and embed_tokens fully trained #8082

sergeykochetkov · 2024-09-02T11:56:07Z

Support fully trained lm_head and embed_tokens in LoRA.

We found that the quality of our adapters drops significantly unless lm_head is either fully trained or trained in LoRA style.
This corresponds to PEFT's modules_to_save=[lm_head, embed_tokens] option: https://huggingface.co/docs/peft/v0.12.0/en/package_reference/#peft.LoraConfig.modules_to_save
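For illustration, an adapter trained with PEFT's `modules_to_save` records that list in its `adapter_config.json`. The sketch below uses only the standard library to parse a made-up config and report which modules were saved in full. The helper name `fully_trained_modules` and the sample values are assumptions for this example, not code from this PR.

```python
import json

# Hypothetical adapter_config.json contents; the values are made up for illustration.
SAMPLE_ADAPTER_CONFIG = """
{
  "peft_type": "LORA",
  "r": 16,
  "lora_alpha": 32,
  "target_modules": ["q_proj", "v_proj"],
  "modules_to_save": ["lm_head", "embed_tokens"]
}
"""


def fully_trained_modules(config_text: str) -> list[str]:
    """Return the modules the adapter stores as full weights (may be empty)."""
    config = json.loads(config_text)
    # The field may be missing or null when no module is saved in full,
    # so normalize both cases to an empty list.
    return list(config.get("modules_to_save") or [])


modules = fully_trained_modules(SAMPLE_ADAPTER_CONFIG)
print(modules)
```

A loader could use such a check to decide whether an adapter carries only low-rank matrices or also full replacement weights for lm_head and embed_tokens.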

The idea is to replace the base model's VocabParallelEmbedding and ParallelLMHead with the layers loaded from modules_to_save when running inference with a LoRA adapter.
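The replacement idea can be sketched in plain Python. This is a toy illustration under stated assumptions, not the PR's implementation: `ToyLMHead`, `select_lm_head`, and the adapter dictionary layout are hypothetical, and the real vLLM layers are tensor-parallel PyTorch modules rather than nested lists.

```python
from dataclasses import dataclass
from typing import Optional

Matrix = list[list[float]]  # rows = vocabulary entries, columns = hidden size


@dataclass
class ToyLMHead:
    """Hypothetical stand-in for an output projection (vocab_size x hidden_size)."""

    weight: Matrix

    def logits(self, hidden: list[float]) -> list[float]:
        # One logit per vocabulary row: dot(row, hidden).
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.weight]


def select_lm_head(
    base: ToyLMHead, adapter_modules: Optional[dict[str, Matrix]]
) -> ToyLMHead:
    """Use the adapter's fully trained lm_head when present, else the base one."""
    if adapter_modules and "lm_head" in adapter_modules:
        return ToyLMHead(adapter_modules["lm_head"])
    return base


base_head = ToyLMHead([[1.0, 0.0], [0.0, 1.0]])
adapter = {"lm_head": [[2.0, 0.0], [0.0, 3.0]]}  # made-up fully trained weights

hidden_state = [1.0, 1.0]
base_logits = select_lm_head(base_head, None).logits(hidden_state)      # request without LoRA
lora_logits = select_lm_head(base_head, adapter).logits(hidden_state)   # request with the adapter
print(base_logits, lora_logits)
```

The same selection would apply to embed_tokens on the input side; requests without an adapter keep using the base layers unchanged.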

- dirty implementation
- tests for new functionality
- checking old functionality is working
- inference with fully trained lm_head performance measurement
- implement embed_tokens fully trained as well

github-actions · 2024-09-02T11:56:19Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

- Comment /ready on the PR
- Add ready label to the PR
- Enable auto-merge.

🚀

sergeykochetkov · 2024-09-11T13:50:28Z

/ready

AlongWY · 2024-09-18T14:09:01Z

Should it be unmarked as Draft?

mergify · 2024-10-30T12:04:34Z

This pull request has merge conflicts that must be resolved before it can be
merged. @sergeykochetkov please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

sergeykochetkov · 2024-11-01T11:33:37Z

/ready

sergeykochetkov · 2024-11-01T11:35:06Z

> Should it be unmarked as Draft?

Yes, I am waiting for review.

s.m.kochetkov added 5 commits August 31, 2024 09:02

last

44539e0

test

f11dc75

almost_works

d1b7e97

loading_weights

69f80fe

it_works

ebfffbe

sergeykochetkov changed the title ~~LoRA with lm_head fully trained~~ [Model] LoRA with lm_head fully trained Sep 2, 2024

s.m.kochetkov added 10 commits September 5, 2024 11:40

benchmarking

60e1563

it_works_fast

915a76a

remove_opt_einsum

879a415

benchmark_latency

8f23df5

no_hardcode

1391944

block_wise_impl

17e75ca

revert

9471379

refactor

6ad3dac

merge_with_main

7e7e2ad

format

1984ef3

sergeykochetkov marked this pull request as ready for review September 11, 2024 13:49

sergeykochetkov marked this pull request as draft September 11, 2024 13:57

SMAntony mentioned this pull request Oct 19, 2024

[Bug]: Unable to infer QLoRA adapter using vLLM Docker #9402

Closed

1 task

s.m.kochetkov added 3 commits October 30, 2024 09:18

embed_tokens_dirty_implemented

8a9cfd5

modules_to_save_embed_tokens_implemented

1de3291

enable_modules_to_save_flag

f2209bb

mergify bot added the needs-rebase label Oct 30, 2024

s.m.kochetkov added 3 commits October 31, 2024 09:21

fix_parse

e29a598

bgmv_embed

17b12a4

block_n

f7396be

s.m.kochetkov added 2 commits November 1, 2024 09:41

argument_enable_modules_to_save

68a6d6e

merge_with_upstream

683adaa

mergify bot removed the needs-rebase label Nov 1, 2024

lora

85decf8

sergeykochetkov changed the title ~~[Model] LoRA with lm_head fully trained~~ [Model] LoRA with lm_head and embed_tokens fully trained Nov 1, 2024

format

e27d6b6

sergeykochetkov marked this pull request as ready for review November 1, 2024 11:33
