
Derek/multi lora serving fix #2223

Merged · 77 commits · Jul 18, 2024

Commits
1da0b41
Adding multi-lora-serving
datavistics Jun 28, 2024
8b9345e
Updating date
datavistics Jun 28, 2024
449f145
Adding code types and missing code
datavistics Jun 28, 2024
27b3144
Merge remote-tracking branch 'origin/derek/multi-lora-serving' into d…
datavistics Jun 28, 2024
72162f5
Update title _blog.yml
datavistics Jul 3, 2024
8baa3c1
Update title _blog.yml
datavistics Jul 3, 2024
31f6270
Update title multi-lora-serving.md
datavistics Jul 3, 2024
86e0bf7
Fix title Update multi-lora-serving.md
datavistics Jul 3, 2024
fe1b39f
Moving headers down one
datavistics Jul 3, 2024
d9e55a0
Update multi-lora-serving.md
datavistics Jul 3, 2024
72f9f91
Headers 1 down
datavistics Jul 3, 2024
81e4b2d
Removing extra intro
datavistics Jul 3, 2024
f636b90
Update callouts
datavistics Jul 3, 2024
50eff11
Update multi-lora-serving.md
datavistics Jul 3, 2024
7c41999
Update multi-lora-serving.md
datavistics Jul 3, 2024
f2d0f53
Update multi-lora-serving.md
datavistics Jul 3, 2024
9360a1e
Update multi-lora-serving.md
datavistics Jul 3, 2024
88b87c2
Update multi-lora-serving.md
datavistics Jul 3, 2024
ae5fea1
Update multi-lora-serving.md
datavistics Jul 3, 2024
eee00dc
Update multi-lora-serving.md
datavistics Jul 3, 2024
43a2354
Adding TGI explanation
datavistics Jul 3, 2024
0f925d3
Update multi-lora-serving.md
datavistics Jul 3, 2024
44fc64b
Merge branch 'main' into derek/multi-lora-serving
datavistics Jul 3, 2024
624b8b2
Merge branch 'main' into derek/multi-lora-serving
datavistics Jul 4, 2024
5a5d195
Apply suggestions from code review
datavistics Jul 4, 2024
6329ed6
Fixing 3x
datavistics Jul 4, 2024
f3e811c
Adding adapter introduction
datavistics Jul 4, 2024
070a2d4
Fixing manim table caption
datavistics Jul 4, 2024
16afeff
Actually fixing manim table caption
datavistics Jul 4, 2024
00ef73c
Update multi-lora-serving.md
datavistics Jul 5, 2024
7d0ce07
Update multi-lora-serving.md
datavistics Jul 5, 2024
dbcbf61
Update multi-lora-serving.md
datavistics Jul 5, 2024
2870093
Adding guide
datavistics Jul 5, 2024
6d3afa5
Update multi-lora-serving.md
datavistics Jul 5, 2024
2e239c9
Update multi-lora-serving.md
datavistics Jul 5, 2024
63b4e3c
Update multi-lora-serving.md
datavistics Jul 5, 2024
17a62da
Update multi-lora-serving.md
datavistics Jul 5, 2024
7120b8d
Update multi-lora-serving.md
datavistics Jul 5, 2024
8abec19
Update multi-lora-serving.md
datavistics Jul 5, 2024
f30939f
Update multi-lora-serving.md
datavistics Jul 5, 2024
eec2c48
Update multi-lora-serving.md
datavistics Jul 5, 2024
5f66411
Update multi-lora-serving.md
datavistics Jul 5, 2024
ac72039
Update multi-lora-serving.md
datavistics Jul 5, 2024
a602f91
Update multi-lora-serving.md
datavistics Jul 5, 2024
67d5652
Update multi-lora-serving.md
datavistics Jul 5, 2024
14a67d5
Update multi-lora-serving.md
datavistics Jul 5, 2024
57b652c
Update multi-lora-serving.md
datavistics Jul 5, 2024
18611ae
Update multi-lora-serving.md
datavistics Jul 5, 2024
209a24f
Update multi-lora-serving.md
datavistics Jul 5, 2024
a373924
Apply suggestions from code review
datavistics Jul 5, 2024
994891b
Update _blog.yml
datavistics Jul 5, 2024
aaa1d51
Merge branch 'main' into derek/multi-lora-serving
datavistics Jul 8, 2024
0058fe7
Update multi-lora-serving.md
datavistics Jul 8, 2024
f433938
Update multi-lora-serving.md
datavistics Jul 8, 2024
0db6cd5
Update multi-lora-serving.md
datavistics Jul 8, 2024
5ef6ef2
Update multi-lora-serving.md
datavistics Jul 8, 2024
6a859ac
Update multi-lora-serving.md
datavistics Jul 8, 2024
9c03e5a
Update multi-lora-serving.md
datavistics Jul 8, 2024
ea7f71a
Update multi-lora-serving.md
datavistics Jul 8, 2024
c367060
Update multi-lora-serving.md
datavistics Jul 8, 2024
f92e3cd
Update multi-lora-serving.md
datavistics Jul 8, 2024
fec7f0f
Update multi-lora-serving.md
datavistics Jul 8, 2024
570842c
Update multi-lora-serving.md
datavistics Jul 8, 2024
dcb7f4f
Update multi-lora-serving.md
datavistics Jul 8, 2024
c226d07
Update multi-lora-serving.md
datavistics Jul 8, 2024
a9f8d92
Update multi-lora-serving.md
datavistics Jul 8, 2024
05f1c1b
Update multi-lora-serving.md
datavistics Jul 18, 2024
daa5c74
Update multi-lora-serving.md
datavistics Jul 18, 2024
4e545ba
Merge branch 'main' into derek/multi-lora-serving
datavistics Jul 18, 2024
48844e4
Adding v0.24 hub
datavistics Jul 18, 2024
e305800
Adding v0.24 hub
datavistics Jul 18, 2024
21c7510
Adding unversioned links
datavistics Jul 18, 2024
8ede201
Code formatting
datavistics Jul 18, 2024
52ddd9a
Updating thumbnail.png
datavistics Jul 18, 2024
5022dd9
Fixing latex
datavistics Jul 18, 2024
98af9fd
Merge branch 'main' into derek/multi-lora-serving-fix
datavistics Jul 18, 2024
9e7f6b0
Update multi-lora-serving.md
datavistics Jul 18, 2024
4 changes: 2 additions & 2 deletions multi-lora-serving.md
@@ -50,7 +50,7 @@ The obvious benefit of LoRA is that it makes fine-tuning a lot cheaper by reduci
|----------------------------|
| *Figure 1: LoRA Explained* |

During training, LoRA freezes the original weights \\W\\ and fine-tunes two small matrices, \\A\\ and \\B\\, making fine-tuning much more efficient. With this in mind, we can see in _Figure 1_ how LoRA works during inference. We take the output from the pre-trained model \\Wx\\, and we add the Low Rank _adaptation_ term \\BAx\\ [[6]](#6).
During training, LoRA freezes the original weights `W` and fine-tunes two small matrices, `A` and `B`, making fine-tuning much more efficient. With this in mind, we can see in _Figure 1_ how LoRA works during inference. We take the output from the pre-trained model `Wx`, and we add the Low Rank _adaptation_ term `BAx` [[6]](#6).
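The `Wx + BAx` computation the corrected line describes can be sketched in plain Python. This is a minimal, hypothetical illustration (the matrix shapes, `matmul` helper, and `lora_forward` name are all assumptions, not TGI's implementation); in practice `W` is `d×d`, `A` is `r×d`, and `B` is `d×r` with `r ≪ d`, which is what makes the adapter small:

```python
def matmul(M, v):
    # Multiply matrix M (a list of rows) by vector v.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def lora_forward(W, A, B, x):
    # y = Wx + BAx: frozen base output plus the low-rank adaptation term.
    base = matmul(W, x)                    # Wx, using the frozen pre-trained weights
    adaptation = matmul(B, matmul(A, x))   # BAx, the LoRA term
    return [b + a for b, a in zip(base, adaptation)]

# Toy shapes: d = 2, r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weights (identity here, for clarity)
A = [[1.0, 1.0]]               # r x d down-projection
B = [[0.5], [0.5]]             # d x r up-projection
x = [2.0, 4.0]
print(lora_forward(W, A, B, x))  # Wx = [2, 4], BAx = [3, 3] -> [5.0, 7.0]
```

Note that only `A` and `B` are trained, so a fine-tune stores `2·d·r` extra parameters instead of `d²`.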


## Multi-LoRA Serving
@@ -66,7 +66,7 @@ Now that we understand the basic idea of model adaptation introduced by LoRA, we
|----------------------------------|
| *Figure 2: Multi-LoRA Explained* |

_Figure 2_ shows how this dynamic adaptation works. Each user request contains the input \\x\\ along with the id for the corresponding LoRA for the request (we call this a heterogeneous batch of user requests). The task information is what allows TGI to pick the right LoRA adapter to use.
_Figure 2_ shows how this dynamic adaptation works. Each user request contains the input `x` along with the id for the corresponding LoRA for the request (we call this a heterogeneous batch of user requests). The task information is what allows TGI to pick the right LoRA adapter to use.

Multi-LoRA serving enables you to deploy the base model just once. And since the LoRA adapters are small, you can load many adapters. Note the exact number will depend on your available GPU resources and what model you deploy. What you end up with is effectively equivalent to having multiple fine-tuned models in one single deployment.
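The heterogeneous-batch routing described above can be sketched as follows. This is a hypothetical, self-contained illustration (the adapter ids, shapes, and `serve_batch` helper are invented for the example; TGI resolves adapters server-side rather than in Python like this):

```python
def matmul(M, v):
    # Multiply matrix M (a list of rows) by vector v.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

# One frozen base model W, shared by every request; many small (A, B)
# adapter pairs keyed by id, each standing in for a separate fine-tune.
W = [[1.0, 0.0], [0.0, 1.0]]
ADAPTERS = {
    "summarize": ([[1.0, 1.0]], [[0.5], [0.5]]),  # (A, B) for one fine-tune
    "translate": ([[1.0, 0.0]], [[1.0], [0.0]]),  # (A, B) for another
}

def serve_batch(batch):
    # Heterogeneous batch: each request carries its input x plus a LoRA id,
    # so one deployment answers with different fine-tuned behavior per request.
    outputs = []
    for req in batch:
        A, B = ADAPTERS[req["lora_id"]]           # pick the adapter by id
        base = matmul(W, req["x"])                # Wx with the shared frozen weights
        adapt = matmul(B, matmul(A, req["x"]))    # BAx for this request's adapter
        outputs.append([b + a for b, a in zip(base, adapt)])
    return outputs

batch = [
    {"x": [2.0, 4.0], "lora_id": "summarize"},
    {"x": [2.0, 4.0], "lora_id": "translate"},
]
print(serve_batch(batch))  # two different outputs for the same input x
```

The design point is that only the small `(A, B)` pairs differ between requests, so loading another "model" costs adapter-sized memory, not another copy of `W`.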
