I propose adding support for FlashMLA in future versions of vLLM. FlashMLA is an inference-acceleration kernel developed by DeepSeek for models that use Multi-head Latent Attention (MLA); it reduces both memory usage and computation time for large transformer models, particularly during decoding.
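For reference, below is a minimal sketch of how FlashMLA's decode-time API is invoked, adapted from the usage example in DeepSeek's FlashMLA README (https://github.com/deepseek-ai/FlashMLA). All tensor shapes and parameter values are illustrative assumptions, not vLLM code, and the kernel requires a Hopper-class GPU:

```python
# Illustrative sketch of calling FlashMLA during decode.
# Shapes follow the FlashMLA README example; values here are assumptions.
import torch
from flash_mla import get_mla_metadata, flash_mla_with_kvcache

batch, s_q = 4, 1                # decode step: one query token per sequence
h_q, h_kv = 128, 1               # MLA: many query heads over one latent KV head
d, dv = 576, 512                 # QK head dim (incl. RoPE part) and value head dim
block_size, max_blocks = 64, 32  # paged KV cache layout (assumed sizes)

cache_seqlens = torch.full((batch,), 1024, dtype=torch.int32, device="cuda")
q = torch.randn(batch, s_q, h_q, d, dtype=torch.bfloat16, device="cuda")
kv_cache = torch.randn(batch * max_blocks, block_size, h_kv, d,
                       dtype=torch.bfloat16, device="cuda")
block_table = torch.arange(batch * max_blocks, dtype=torch.int32,
                           device="cuda").view(batch, max_blocks)

# Split-scheduling metadata is computed once per batch from the cached
# sequence lengths and can be reused across layers.
tile_scheduler_metadata, num_splits = get_mla_metadata(
    cache_seqlens, s_q * h_q // h_kv, h_kv
)

# Attention output and log-sum-exp over the paged KV cache.
o, lse = flash_mla_with_kvcache(
    q, kv_cache, block_table, cache_seqlens, dv,
    tile_scheduler_metadata, num_splits, causal=True,
)
```

Integrating this into vLLM would presumably mean exposing it as an attention backend for MLA models such as DeepSeek-V2/V3, alongside the existing attention implementations.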