[Hardware][TPU][V1] Multi-LoRA Optimisations for the V1 TPU backend #15655
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
…because xla doesn't allow partial updates Signed-off-by: Akshat Tripathi <akshat@krai.ai>
This reverts commit b78b088. Signed-off-by: Akshat Tripathi <akshat@krai.ai>
LGTM, thanks for the contribution!
LGTM, thanks!
Nice work optimizing LoRA here! I just had some minor notes; please take a look when you find the time. Otherwise, we can address them in a separate PR if need be.
This pull request has merge conflicts that must be resolved before it can be merged.
Head branch was pushed to by a user without write access
LGTM, sorry for delaying the merge a bit! Let's get this landed today.
No worries! Yep, I'm hoping once these tests pass we can merge it in. Would you mind re-enabling auto-merge?
…llm-project#15655) Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Chengji Yao <chengjiyao@google.com> Signed-off-by: xihajun <junfan@krai.ai> Signed-off-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk> Signed-off-by: Jorge de Freitas <jorge@krai.ai> Co-authored-by: Chengji Yao <chengjiyao@google.com> Co-authored-by: xihajun <junfan@krai.ai> Co-authored-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk> Co-authored-by: Jorge de Freitas <jorge@krai.ai> Signed-off-by: amit <amit.man@gmail.com>
I have a few questions about the data published in this PR:
I also have a few questions about the "Hot Swapping" and "Compare Multi-LoRAs" tabs in this link: https://insights.krai.ai/benchmarking-multi-lora
Hi @amanocha thanks for your interest.
As for the questions about the website.

Summary
This PR optimises the Multi-LoRA implementation from #14238. This one should be merged in after it.
This includes several kernel optimisations:
And a few general ones:
expand op a82f3fe
Things left/RFC
LogitsProcessorWithLoRA introduces a long (~1.5 second) stall when it's enabled, but not much activity seems to happen on the CPU or TPU during this time. I've disabled this for now.
LogitsProcessorWithLoRA is always created even if there's no LoRA adapter that needs it. Is there a reason for this?
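
To make the question about unconditional creation concrete, here is a rough sketch of creating the wrapper only when some adapter needs it. Everything in it is hypothetical and for illustration only: `Adapter`, `extra_vocab_size`, `base_logits_processor`, `make_lora_logits_processor` and `build_logits_processor` are made-up names, not vLLM classes or functions, and the padding behaviour is an assumption rather than what LogitsProcessorWithLoRA actually does.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Logits = List[float]
Processor = Callable[[Logits], Logits]


@dataclass
class Adapter:
    """Hypothetical stand-in for a loaded LoRA adapter."""
    name: str
    # Assumption: only adapters that add vocabulary entries need the wrapper.
    extra_vocab_size: Optional[int] = None


def base_logits_processor(logits: Logits) -> Logits:
    """Hypothetical base processor: returns a copy of the logits unchanged."""
    return list(logits)


def make_lora_logits_processor(base: Processor,
                               adapters: List[Adapter]) -> Processor:
    """Hypothetical wrapper: pads the logits for the adapters' extra vocab."""
    extra = max(a.extra_vocab_size or 0 for a in adapters)

    def wrapped(logits: Logits) -> Logits:
        return base(logits) + [float("-inf")] * extra

    return wrapped


def build_logits_processor(adapters: List[Adapter]) -> Processor:
    """Return the plain processor unless some adapter needs the wrapper."""
    needing = [a for a in adapters if a.extra_vocab_size]
    if not needing:
        # No adapter touches the output head: skip the wrapper entirely.
        return base_logits_processor
    return make_lora_logits_processor(base_logits_processor, needing)
```

With no qualifying adapter, `build_logits_processor` hands back the base processor itself, so the wrapper's cost would only be paid when it is needed. Whether that is safe on TPU depends on how the compiled graph handles the two cases, which this sketch does not model.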