[RFC]: native kvcache offloading #3241

@HF-001

Motivation.

Currently, vLLM v1 has an in-house solution for offloading KV cache data from GPU memory to another medium (in particular, CPU memory). It is not yet available in vllm-ascend, so I would like to add this feature to vllm-ascend.

There is a proposed RFC (#19854) in vLLM, along with related PRs (#19848, #20075, #21448, #22595, and #24251).

Proposed Change.

To enable this feature, we need to add a CPUOffloadingSpec compatible with vllm-ascend, and a CpuNpuOffloadingHandler that handles KV cache offloading and loading (NPU -> CPU and CPU -> NPU).
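To make the two transfer directions concrete, here is a minimal toy sketch of what a block-level handler does. It is an illustration only: `BlockPool` and `ToyCpuNpuOffloadingHandler` are hypothetical names, plain `bytearray` blocks stand in for NPU and CPU tensors, and the methods do not mirror vLLM's actual offloading interfaces or the eventual vllm-ascend implementation.

```python
from dataclasses import dataclass, field


@dataclass
class BlockPool:
    """Toy stand-in for a KV cache memory pool (NPU or CPU). Hypothetical."""
    num_blocks: int
    block_size: int
    data: list = field(init=False)

    def __post_init__(self):
        # Each block is a fixed-size, zero-initialized byte buffer.
        self.data = [bytearray(self.block_size) for _ in range(self.num_blocks)]


class ToyCpuNpuOffloadingHandler:
    """Hypothetical handler that copies whole blocks between two pools."""

    def __init__(self, npu_pool: BlockPool, cpu_pool: BlockPool):
        if npu_pool.block_size != cpu_pool.block_size:
            raise ValueError("pools must share a block size")
        self.npu_pool = npu_pool
        self.cpu_pool = cpu_pool

    @staticmethod
    def _copy(src: BlockPool, dst: BlockPool, pairs):
        # Copy each source block into its destination block, in place.
        for src_id, dst_id in pairs:
            dst.data[dst_id][:] = src.data[src_id]

    def offload(self, pairs):
        """NPU -> CPU: pairs of (npu_block_id, cpu_block_id)."""
        self._copy(self.npu_pool, self.cpu_pool, pairs)

    def load(self, pairs):
        """CPU -> NPU: pairs of (cpu_block_id, npu_block_id)."""
        self._copy(self.cpu_pool, self.npu_pool, pairs)


# Usage: offload NPU block 1 to CPU block 5, evict it, then load it back
# into a different NPU block.
npu = BlockPool(num_blocks=4, block_size=8)
cpu = BlockPool(num_blocks=16, block_size=8)
handler = ToyCpuNpuOffloadingHandler(npu, cpu)

npu.data[1][:] = b"kvblock1"
handler.offload([(1, 5)])
npu.data[1][:] = bytes(8)  # simulate eviction on the NPU side
handler.load([(5, 2)])
```

A real handler would presumably issue these copies asynchronously on a device stream and report completion separately; the synchronous copy here only shows the block-ID mapping in each direction.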

Feedback Period.

No response

CC List.

No response

Any Other Things.

No response
