Add batched Llama model definition using vLLM paged attention (#1134) · mlc-ai/mlc-llm@fee2cb5 · GitHub

Add batched Llama model definition using vLLM paged attention (#1134)

* Add batched Llama model with vllm paged attention

* update core.py

* doc

* minor

* add e2e test

* mv file

* clean

* Check if TVM has been built with USE_VLLM

* update BuildArgs docstring

masahi authored Oct 30, 2023

1 parent ba67835 commit fee2cb5
