[Feature]: Add support for the vLLM V1 connector

### 🚀 The feature, motivation and pitch

### Motivation
vLLM V1 provides the `KVConnectorBase_V1` class, which lets the engine store KV cache in external storage and retrieve it later, trading storage for computation. vllm-ascend has not yet integrated this feature. Because the connector interface supports multiple storage backends in a general way, adding it to vllm-ascend is worth considering.

### Proposed Change
We can follow `gpu_model_runner.py` and `layer.py` in vLLM V1 and add the corresponding connector calls to vllm-ascend. In addition, we want to return each request's successfully dumped blocks once saving on the worker side has completed, so we have added a return value to the `wait_for_save` function. Upstream vLLM does not have this return value.
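
To make the intended call order concrete, here is a minimal sketch rather than the actual patch. `ToyConnector` and `run_forward` are hypothetical stand-ins that only borrow the worker-side method names of `KVConnectorBase_V1`. The signatures are simplified (the real methods take forward context, KV tensors, and attention metadata), and the mapping of request ID to block IDs returned by `wait_for_save` is an assumed shape for the proposed return value.

```python
from typing import Dict, List


class ToyConnector:
    """Hypothetical stand-in; mirrors method names only, not real signatures."""

    def __init__(self) -> None:
        self.events: List[str] = []              # records the call order
        self._saved: Dict[str, List[int]] = {}   # req_id -> dumped block ids

    def start_load_kv(self) -> None:
        # Kick off loading of externally stored KV cache before the forward pass.
        self.events.append("start_load_kv")

    def wait_for_layer_load(self, layer_name: str) -> None:
        # Block until this layer's KV cache is available.
        self.events.append(f"wait_for_layer_load:{layer_name}")

    def save_kv_layer(self, layer_name: str,
                      block_ids: Dict[str, List[int]]) -> None:
        # Start saving this layer's KV cache; remember which blocks were dumped.
        self.events.append(f"save_kv_layer:{layer_name}")
        for req_id, ids in block_ids.items():
            self._saved[req_id] = list(ids)

    def wait_for_save(self) -> Dict[str, List[int]]:
        # Proposed change: return the dumped blocks per request
        # instead of returning nothing.
        self.events.append("wait_for_save")
        saved, self._saved = self._saved, {}
        return saved


def run_forward(connector: ToyConnector, layer_names: List[str],
                block_ids: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Toy model-runner step showing where the connector calls would sit."""
    connector.start_load_kv()
    for name in layer_names:
        connector.wait_for_layer_load(name)
        # ... the attention computation for this layer would run here ...
        connector.save_kv_layer(name, block_ids)
    return connector.wait_for_save()


if __name__ == "__main__":
    conn = ToyConnector()
    dumped = run_forward(conn, ["layer.0", "layer.1"], {"req-0": [3, 4]})
    print(dumped)
```

In this sketch the runner can forward the returned mapping to whichever component tracks the blocks that were stored externally. How vllm-ascend would consume it is left open here.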

### Code
https://github.com/flesher0813/vllm-ascend/commit/d5c47a5c2620843cb1af0277ff17768f5e20e057

### Alternatives

_No response_

### Additional context

_No response_

[Feature]: Add support for the vLLM V1 connector #2057
