ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24)

pku-liang/ArkVale

ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction

[Link] [Paper] [Poster] [Slides]

Download

git clone https://github.com/pku-liang/ArkVale.git --recursive 

or

git clone https://github.com/pku-liang/ArkVale.git
cd ArkVale
git submodule update --init --recursive --depth 1 

Install

pip install -r requirements.txt
cd source && python3 setup.py [develop|install]

Usage

import torch
from transformers import AutoModelForCausalLM
from arkvale import adapter
path = ...
dev = torch.device("cuda:0")
dtype = torch.float16
model = (
    AutoModelForCausalLM
    .from_pretrained(path, torch_dtype=dtype, device_map=dev)
    .eval()
)
adapter.enable_arkvale(
    model, 
    dtype=dtype, 
    device=dev, 
    page_size=32,
    # page_budgets=None means "full" (no eviction & recall)
    # page_budgets=None,
    page_budgets=4096 // 32,  # 128 pages when page_size=32
    page_topks=32,
    n_max_bytes=40 * (1 << 30),
    n_max_cpu_bytes=80 * (1 << 30),
)
...
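
The numeric arguments above are easier to check when restated in pages and GiB. The following is a minimal sketch of that arithmetic only. `summarize_settings` is a hypothetical helper, not part of `arkvale`. It assumes that `page_budgets` and `page_topks` are counted in pages of `page_size` tokens and that the two `n_max_*` arguments are byte limits, which the example values suggest but this README does not state.

```python
GIB = 1 << 30  # bytes per GiB


def summarize_settings(page_size, token_budget, page_topks,
                       n_max_bytes, n_max_cpu_bytes):
    """Hypothetical helper (not part of arkvale).

    Restates the example values in readable units, assuming budgets and
    top-k are counted in pages of `page_size` tokens each.
    """
    page_budgets = token_budget // page_size
    return {
        "page_budgets": page_budgets,                    # pages
        "tokens_in_budget": page_budgets * page_size,    # tokens
        "tokens_in_topk_pages": page_topks * page_size,  # tokens
        "n_max_gib": n_max_bytes / GIB,
        "n_max_cpu_gib": n_max_cpu_bytes / GIB,
    }


# Values copied from the usage example above.
summary = summarize_settings(
    page_size=32,
    token_budget=4096,
    page_topks=32,
    n_max_bytes=40 * GIB,
    n_max_cpu_bytes=80 * GIB,
)
print(summary)
```

Under these assumptions, `4096 // 32` gives a budget of 128 pages, and 32 top-k pages cover 1024 tokens.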
