Labels: RFC (Request For Comments)
Description
Motivation.
Currently, vLLM v1 has an in-house solution for offloading KV cache data from GPU memory to another medium (in particular, CPU memory). It is not yet available in vllm-ascend, so I would like to add this feature to vllm-ascend.
There is a proposed RFC in vLLM (#19854), along with related PRs (#19848, #20075, #21448, #22595, and #24251).
Proposed Change.
To enable this feature, we need to add a CPUOffloadingSpec compatible with vllm-ascend, plus a CpuNpuOffloadingHandler that handles KV cache offloading and loading (NPU -> CPU and CPU -> NPU).
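
As a rough illustration of the two transfer directions, here is a minimal pure-Python sketch of block-wise copying between an NPU-side pool and a CPU-side pool. Everything in it is an assumption made for illustration: `BlockPool`, `ToyCpuNpuOffloadingHandler`, and the `offload`/`load` method names are hypothetical and do not reflect the actual vLLM or vllm-ascend interfaces, and plain Python lists stand in for device and host tensors.

```python
from dataclasses import dataclass, field


@dataclass
class BlockPool:
    """Toy stand-in for a KV cache buffer: num_blocks blocks of block_size values."""
    num_blocks: int
    block_size: int
    data: list = field(init=False)

    def __post_init__(self):
        self.data = [[0.0] * self.block_size for _ in range(self.num_blocks)]


class ToyCpuNpuOffloadingHandler:
    """Hypothetical handler; NOT the vLLM or vllm-ascend interface."""

    def __init__(self, npu_pool, cpu_pool):
        if npu_pool.block_size != cpu_pool.block_size:
            raise ValueError("block sizes must match")
        self.npu_pool = npu_pool
        self.cpu_pool = cpu_pool

    @staticmethod
    def _copy_blocks(src, dst, block_pairs):
        # Copy each source block into its destination slot.
        for src_id, dst_id in block_pairs:
            dst.data[dst_id] = list(src.data[src_id])

    def offload(self, block_pairs):
        """NPU -> CPU; pairs are (npu_block_id, cpu_block_id)."""
        self._copy_blocks(self.npu_pool, self.cpu_pool, block_pairs)

    def load(self, block_pairs):
        """CPU -> NPU; pairs are (cpu_block_id, npu_block_id)."""
        self._copy_blocks(self.cpu_pool, self.npu_pool, block_pairs)


npu = BlockPool(num_blocks=4, block_size=2)
cpu = BlockPool(num_blocks=8, block_size=2)
handler = ToyCpuNpuOffloadingHandler(npu, cpu)

npu.data[1] = [1.0, 2.0]      # pretend block 1 holds KV data
handler.offload([(1, 5)])     # NPU block 1 -> CPU block 5
npu.data[1] = [0.0, 0.0]      # simulate the NPU block being reused
handler.load([(5, 3)])        # CPU block 5 -> NPU block 3
print(npu.data[3])            # [1.0, 2.0]
```

A real handler would presumably perform these copies asynchronously on device streams and report completion back to the scheduler; the sketch only shows the block-ID mapping in each direction.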
Feedback Period.
No response
CC List.
No response
Any Other Things.
No response
jianzs