
Support Multi-LoRA/qLoRA in PEFT #1005

Closed

merlintang opened this issue Oct 9, 2023 · 7 comments

@merlintang

Feature request

Dear all,

We are working on improving GPU memory usage for multi-LoRA fine-tuning. As you know, different LoRA models that share the same base model can, in theory, share that base model's GPU memory. Based on this motivation, we ran some experiments, and the results show that this indeed reduces memory usage greatly. Our code repo is here: https://github.com/TUDB-Labs/multi-lora-fine-tune

Any comments are welcome.
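To illustrate the core idea, here is a minimal sketch (this is not our actual implementation; the class and parameter names are made up for illustration): a single frozen base weight is kept in GPU memory once, and each fine-tuning task only adds its own small low-rank matrices.

```python
# Minimal illustrative sketch, not the actual multi-lora-fine-tune code:
# one frozen base weight is shared across tasks, and each task only adds
# its own small LoRA matrices A and B.
import torch
import torch.nn as nn

class SharedBaseMultiLoRA(nn.Module):
    def __init__(self, base_linear: nn.Linear, num_adapters: int, r: int = 8):
        super().__init__()
        self.base = base_linear          # shared base projection, frozen
        self.base.requires_grad_(False)
        in_f, out_f = base_linear.in_features, base_linear.out_features
        # per-adapter low-rank matrices: the only extra memory per task
        self.lora_A = nn.ParameterList(
            [nn.Parameter(torch.randn(r, in_f) * 0.01) for _ in range(num_adapters)]
        )
        self.lora_B = nn.ParameterList(
            [nn.Parameter(torch.zeros(out_f, r)) for _ in range(num_adapters)]
        )

    def forward(self, x: torch.Tensor, adapter_idx: int) -> torch.Tensor:
        # the expensive base projection uses the single shared weight;
        # only the cheap low-rank update differs per task
        delta = x @ self.lora_A[adapter_idx].T @ self.lora_B[adapter_idx].T
        return self.base(x) + delta
```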

Motivation

Reduce GPU memory usage during multi-LoRA fine-tuning.

Your contribution

Can we contribute our improvement to this project, or how could we implement it within this framework? Any suggestions are welcome.

@BenjaminBossan
Member

Thanks for bringing this to our attention. I have a couple of questions and comments:

  1. "GPU Memory Conservation": Could you give more details on that? Do you have some numbers on the reduced memory usage? Note that in PEFT it is already possible to have multiple LoRA adapters loaded at the same time (see the sketch after this list). PR #905 ("FIX: setting requires_grad on adapter layers") would also add mixed batch inference. AFAICT your lib supports mixed batch training, but it's not clear to me how well that works vs. training the adapters separately.
  2. "Automatic Parameter Learning": How exactly is that implemented? I couldn't find it at a quick glance.
  3. Some of the improvements you describe stem from the training step itself (e.g. use of early stopping). Since that is out of scope for PEFT, what advantages would be left without those improvements?
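
For reference, here is roughly what the existing multi-adapter support in PEFT looks like (a minimal sketch; the model name and LoRA hyperparameters are placeholders, and the adapters are trained one at a time rather than in a mixed batch):

```python
# Sketch: multiple LoRA adapters attached to one shared, frozen base model in PEFT.
# Model name and LoRA hyperparameters are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

config_a = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")
config_b = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

model = get_peft_model(base, config_a, adapter_name="task_a")
model.add_adapter("task_b", config_b)  # second adapter, same frozen base weights

model.set_adapter("task_a")  # activate one adapter at a time for training
# ... train task_a ...
model.set_adapter("task_b")
# ... train task_b ...
```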

@mikecovlee

We have experimental results for memory usage in the README.md; they show the peak memory usage of the existing method (Alpaca-LoRA) compared to our method on one NVIDIA RTX A6000 GPU. Alpaca-LoRA triggered an OOM error after 4 parallel tasks, while our method can handle twice that number.
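
(For context, a peak-memory number like this is typically collected along the following lines; this is only a generic sketch, not the benchmark script behind the README numbers.)

```python
# Generic sketch of measuring peak GPU memory for one fine-tuning run;
# run_finetuning() is a placeholder, not part of any real benchmark script.
import torch

torch.cuda.reset_peak_memory_stats()
run_finetuning()  # placeholder for the training loop being benchmarked
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak GPU memory: {peak_gib:.2f} GiB")
```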

@mikecovlee

Also, in terms of time cost, we achieved results approximately 5% faster than the sequential execution of Alpaca-LoRA, which uses PEFT for its implementation.

@mikecovlee

@BenjaminBossan, "Automatic Parameter Learning" and other enhancements, such as early stopping, are extended features built on top of our core library, offering a more convenient fine-tuning solution. However, these features are not yet fully complete; the performance improvements I mentioned earlier remain our key selling points.

@BenjaminBossan
Member

We have experimental results for memory usage in the README.md; they show the peak memory usage of the existing method (Alpaca-LoRA) compared to our method on one NVIDIA RTX A6000 GPU. Alpaca-LoRA triggered an OOM error after 4 parallel tasks, while our method can handle twice that number.

Ah, thanks for that. Is the script to create this benchmark contained in the repo somewhere?


github-actions bot commented Nov 8, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@mikecovlee

Sorry, I forgot to respond. We have put our experimental code at https://github.com/yezhem/aspen-experiment
