Commit
TP llama with continuous batching (#2709)
* Enabled left side padding in llama2 model
* Add unit test for tp_llama
* Use no_grad context manager
* [WIP] Converting llama_tp into cont batching handler
* [WIP] Implement prefill and decode for tp_llama
* WIP fixing decode
* Fix return prefill vs decode format in tp_llama handler
* Fix current_position index in llama_handler
* Fix kv caching issue with different results for different padding lengths
* Fix linting error
* Remove cuda dependency for tp_llama test
* Fix handler mock
* Adjust expected result for 13b model in tp llama test
* Make continuous batching work with tp llama
* Adjust sample txt
* Add missing requirements
* Add support for chat dialogs
* Use model archiver config in tp_llama test

---------

Co-authored-by: Hamid Shojanazeri <hamid.nazeri2010@gmail.com>
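The first item, left-side padding, matters for batched decoding: when every prompt is padded on the left, the most recent real token of each sequence sits at the same final index, so newly generated tokens can be appended at one shared position. The sketch below only illustrates that idea in plain Python and is not the code from this commit; `left_pad` and `pad_id` are hypothetical names chosen for the example.

```python
from typing import List, Tuple


def left_pad(batch: List[List[int]], pad_id: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad token id sequences to a common length and build attention masks.

    Hypothetical helper for illustration; the handler in this commit may differ.
    """
    max_len = max((len(seq) for seq in batch), default=0)
    padded, masks = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        # Padding goes in front, so real tokens end at the last index.
        padded.append([pad_id] * n_pad + list(seq))
        # Mask is 0 over padding and 1 over real tokens.
        masks.append([0] * n_pad + [1] * len(seq))
    return padded, masks


tokens, mask = left_pad([[5, 6, 7], [8, 9]], pad_id=0)
```

With this layout the last column of `tokens` always holds a real token, which is what makes a single shared decode position possible across the batch.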
1 parent
be5ff32
commit df94a56
Showing 9 changed files with 1,373 additions and 392 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,11 @@ | ||
[ | ||
{ | ||
"dialog": | ||
[ | ||
{ | ||
"role": "user", | ||
"content": "what is the recipe of mayonnaise?" | ||
} | ||
] | ||
|
||
] | ||
], | ||
"max_new_tokens": 50, | ||
"mode":"chat" | ||
} |
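As a rough illustration of the request shape this diff points to, the sketch below builds a chat payload with the three keys visible above (`dialog`, `max_new_tokens`, `mode`). The exact nesting is an assumption, since the diff markers were lost in this view, and `build_chat_request` is a hypothetical helper rather than part of the commit.

```python
import json


def build_chat_request(content: str, max_new_tokens: int = 50) -> str:
    """Serialize a single-dialog chat request.

    Key names come from the diff above; the nesting is assumed, not confirmed.
    """
    payload = [
        {
            "dialog": [{"role": "user", "content": content}],
            "max_new_tokens": max_new_tokens,
            "mode": "chat",
        }
    ]
    return json.dumps(payload, indent=2)


body = build_chat_request("what is the recipe of mayonnaise?")
```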