[Feature]: Add support for the vLLM V1 connector #2057

@flesher0813

🚀 The feature, motivation and pitch

Motivation

vLLM V1 provides the KVConnectorBase_V1 class, which lets the engine store and retrieve KV cache in external storage and so trade storage for computation. This feature has not yet been integrated into vllm-ascend. Because the connector interface supports multiple storage backends in a general way, adding it is worth considering.

Proposed Change

We can follow gpu_model_runner.py and layer.py in vLLM V1 and add the corresponding connector calls to vllm-ascend. In addition, because we want to return each request's successfully dumped blocks once the worker-side save has completed, we have added a return value to the wait_for_save function. Upstream vLLM does not have this return value.
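A rough sketch of the intended call order, to make the proposal concrete. It does not import vLLM: `ToyConnector`, `run_forward`, `blocks_to_save` and `failed_blocks` are hypothetical stand-ins, and the toy methods only borrow the worker-side method names from KVConnectorBase_V1 while dropping the real arguments (forward context, KV tensors, attention metadata). The return shape of `wait_for_save` (request id mapped to a list of block ids) is an assumption made for illustration; the linked commit is the reference for the actual change.

```python
from dataclasses import dataclass, field


@dataclass
class ToyConnector:
    """Hypothetical in-memory stand-in for a KVConnectorBase_V1 subclass."""

    # request id -> block ids the worker should save this step (assumed layout)
    blocks_to_save: dict[str, list[int]]
    # block ids whose save is simulated as failing
    failed_blocks: set[int] = field(default_factory=set)
    # records the order in which the runner invoked the connector
    calls: list[str] = field(default_factory=list)

    def start_load_kv(self) -> None:
        self.calls.append("start_load_kv")

    def wait_for_layer_load(self, layer_name: str) -> None:
        self.calls.append(f"wait_for_layer_load:{layer_name}")

    def save_kv_layer(self, layer_name: str) -> None:
        self.calls.append(f"save_kv_layer:{layer_name}")

    def wait_for_save(self) -> dict[str, list[int]]:
        # Proposed extension: report, per request, the blocks that were dumped.
        self.calls.append("wait_for_save")
        return {
            req_id: [b for b in block_ids if b not in self.failed_blocks]
            for req_id, block_ids in self.blocks_to_save.items()
        }


def run_forward(connector: ToyConnector, layer_names: list[str]) -> dict[str, list[int]]:
    """Assumed call order: load once, per-layer wait/save, then wait for all saves."""
    connector.start_load_kv()                # model runner, before the forward pass
    for name in layer_names:
        connector.wait_for_layer_load(name)  # attention layer, before attention
        # ... attention for this layer would run here ...
        connector.save_kv_layer(name)        # attention layer, after attention
    return connector.wait_for_save()         # model runner, after the forward pass


if __name__ == "__main__":
    conn = ToyConnector(
        blocks_to_save={"req-0": [1, 2], "req-1": [3]},
        failed_blocks={2},
    )
    print(run_forward(conn, ["layer.0", "layer.1"]))
```

With a return value like this, the model runner could forward the per-request result to whoever tracks which blocks are safely stored externally, instead of treating every save as successful.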

Code

flesher0813@d5c47a5

Alternatives

No response

Additional context

No response
