Allow Peft models to share their base model #1905
Conversation
This is pretty cool! I tested this and it seems to work well. I served multiple LoRAs using the following script:
Looking at the GPU utilization, we only load the base model once, so in my case, we only load …
Thank goodness it works for someone else. I've submitted a few too many things that didn't 100% work. I have it running as well on GCP with two models, and they give distinct results and fit within VRAM as expected.
LGTM. Thanks!
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Why are these changes needed?
This adds a special environment variable that activates shared Peft model base weights. Currently, when loading two Peft models that have the same base model, the base weights are loaded separately for each of them. With this flag activated, all Peft models will share the same base model.
To make this work requires a few workarounds due to how Huggingface's Peft library implements LoRA adapters, the most popular variant. LoRA adapters modify the base model's pytorch modules directly, so adapters sharing the same base model must live within the same model object, and the `set_adapter` method must be called to switch between them.
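As a rough illustration of the pattern described above, the sketch below loads the base model once, attaches several LoRA adapters to that single model object, and calls `set_adapter` before each request. This is not the PR's actual code: the model name, adapter paths, and the environment-variable name are placeholders.

```python
import os

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical flag name; the PR's actual environment variable may differ.
SHARE_BASE = os.environ.get("PEFT_SHARE_BASE_WEIGHTS", "false").lower() == "true"

base_name = "meta-llama/Llama-2-7b-hf"      # placeholder base model
adapters = {
    "adapter-a": "path/to/lora-adapter-a",  # placeholder adapter checkpoints
    "adapter-b": "path/to/lora-adapter-b",
}

tokenizer = AutoTokenizer.from_pretrained(base_name)
base = AutoModelForCausalLM.from_pretrained(base_name, torch_dtype=torch.float16)

if SHARE_BASE:
    # Attach every adapter to the SAME base model object, so the base
    # weights are held in memory only once.
    model = None
    for name, path in adapters.items():
        if model is None:
            model = PeftModel.from_pretrained(base, path, adapter_name=name)
        else:
            model.load_adapter(path, adapter_name=name)

    def generate(adapter_name: str, prompt: str) -> str:
        # LoRA patches the base model's modules in place, so the active
        # adapter has to be switched before serving each request.
        model.set_adapter(adapter_name)
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output = model.generate(**inputs, max_new_tokens=32)
        return tokenizer.decode(output[0], skip_special_tokens=True)

    print(generate("adapter-a", "Hello"))
    print(generate("adapter-b", "Hello"))
```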
Related issue number (if applicable)
Expands #1805
Checks
- I've run `format.sh` to lint the changes in this PR.