Open
Native GPU - CPU Offloading (North/South)
Fast, native caching within a single vLLM instance. It is provided out of the box through vLLM's KVConnector abstraction and integrates with KVEvents.
vLLM Issues
vLLM PRs
- [KV offload][4/N] Offloading KV connector vllm-project/vllm#22595
- [KV offload][3/N] Add worker-side CPU support vllm-project/vllm#21448
- [KV offload][2/N] Introduce LRU-based CPU offloading management vllm-project/vllm#20075
- [KV offload][1/N] Introduce an offloading component vllm-project/vllm#19848
- v1: Support KV events from connectors vllm-project/vllm#19737
- v1: Pass KVConnectorOutput to scheduler-side vllm-project/vllm#22157
- [v1] Move block_hashes from KVCacheManager to Request.block_hashes (#19728) vllm-project/vllm#19728
- [KVConnector] Aggregate finished requests on the scheduler vllm-project/vllm#19555
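
For readers unfamiliar with the LRU-based management named in [2/N], here is a minimal sketch of the general idea: a fixed pool of CPU slots keyed by block hash, where the least recently used block is evicted once the pool is full. This is illustrative only and is not the code from the PRs above; `LRUOffloadIndex`, `lookup`, and `store` are hypothetical names, and the sketch tracks bookkeeping only (no tensors are copied).

```python
from collections import OrderedDict
from typing import Hashable, List, Optional, Tuple


class LRUOffloadIndex:
    """Hypothetical bookkeeping for KV blocks offloaded to CPU memory.

    Not vLLM's implementation: it only tracks which block hash lives in
    which CPU slot and picks an eviction victim in LRU order.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # block hash -> CPU slot id, ordered from least to most recently used.
        self._slots: "OrderedDict[Hashable, int]" = OrderedDict()
        # Slot ids that have never been assigned.
        self._free: List[int] = list(range(capacity))

    def lookup(self, block_hash: Hashable) -> Optional[int]:
        """Return the CPU slot for a block, marking it as recently used."""
        slot = self._slots.get(block_hash)
        if slot is not None:
            self._slots.move_to_end(block_hash)
        return slot

    def store(self, block_hash: Hashable) -> Tuple[int, Optional[Hashable]]:
        """Reserve a slot for a block; return (slot, evicted_hash_or_None)."""
        if block_hash in self._slots:
            # Already offloaded: just refresh its recency.
            self._slots.move_to_end(block_hash)
            return self._slots[block_hash], None
        evicted: Optional[Hashable] = None
        if self._free:
            slot = self._free.pop()
        else:
            # Pool is full: reuse the slot of the least recently used block.
            evicted, slot = self._slots.popitem(last=False)
        self._slots[block_hash] = slot
        return slot, evicted
```

With a capacity of 2, storing blocks `"a"` and `"b"`, looking up `"a"`, and then storing `"c"` evicts `"b"`, since the lookup refreshed `"a"`. Keying by block hash is an assumption borrowed from the `block_hashes` PR title above.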
cc @njhill (reviewer)