[Feature]: Support Prefix Caching for Hidden States (Pooling Endpoint) #26839

### 🚀 The feature, motivation and pitch
I would like the `/pooling` endpoint to support prefix caching for hidden states.

### Background
The `/pooling` endpoint extracts hidden states/embeddings by running a full prefill pass over all input tokens. It currently does not support prefix caching: every request recomputes all tokens from scratch, even when the same prefix was already processed by an earlier request.

### Feature Request:
Enable prefix caching for the `/pooling` endpoint, so that:
- Hidden states for cache-hit tokens are retrieved from cache (not recomputed)
- Only new/uncached tokens need computation
- The complete hidden states (cached + newly computed) are returned
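
To make the requested behavior concrete, here is a minimal, self-contained sketch. It is not vLLM code: `HiddenStateCache`, `block_hashes`, `toy_hidden_states`, and `BLOCK_SIZE` are hypothetical names, and the "model" is a toy causal function standing in for a real prefill pass. It assumes block-level caching with chained hashes, so a block hit implies the whole prefix up to that block matches.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # hypothetical block size, chosen only for illustration


def block_hashes(token_ids: List[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Chain-hash each full block so a hash identifies the entire prefix up to it."""
    hashes: List[str] = []
    parent = ""
    for b in range(len(token_ids) // block_size):
        block = token_ids[b * block_size:(b + 1) * block_size]
        payload = parent + ":" + ",".join(map(str, block))
        parent = hashlib.sha256(payload.encode()).hexdigest()
        hashes.append(parent)
    return hashes


def toy_hidden_states(token_ids: List[int], start: int) -> List[List[float]]:
    """Toy causal 'model': the state at position i depends on tokens 0..i.

    The running sum over the prefix stands in for what the KV cache
    provides in a real transformer.
    """
    running = sum(token_ids[:start])
    out: List[List[float]] = []
    for t in token_ids[start:]:
        running += t
        out.append([float(running)])
    return out


class HiddenStateCache:
    def __init__(self) -> None:
        self.blocks: Dict[str, List[List[float]]] = {}
        self.computed_tokens = 0  # counts positions that were actually computed

    def pool(self, token_ids: List[int]) -> List[List[float]]:
        hashes = block_hashes(token_ids)
        hidden: List[List[float]] = []
        hit_blocks = 0
        # Reuse cached blocks until the first miss.
        for h in hashes:
            if h not in self.blocks:
                break
            hidden.extend(self.blocks[h])
            hit_blocks += 1
        # Compute only the uncached suffix.
        start = hit_blocks * BLOCK_SIZE
        new_states = toy_hidden_states(token_ids, start)
        self.computed_tokens += len(new_states)
        hidden.extend(new_states)
        # Store the newly computed full blocks for later requests.
        for b in range(hit_blocks, len(hashes)):
            self.blocks[hashes[b]] = hidden[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
        return hidden


cache = HiddenStateCache()
prefix = [1, 2, 3, 4, 5, 6, 7, 8]          # shared prefix (two full blocks)
first = cache.pool(prefix + [9, 10])        # cold: all 10 positions computed
second = cache.pool(prefix + [11, 12, 13])  # warm: only the 3 new positions computed
```

In this sketch the second request returns all 11 hidden states, but only the 3 positions after the cached prefix are computed; the result is identical to a full recomputation.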

### Why This Matters:
Many applications process the same prefixes repeatedly (system prompts, instruction templates, etc.):
- Without hidden state caching: every `/pooling` request recomputes the entire sequence
- With hidden state caching: cached hidden states are reused and only the new tokens are computed, so prefill work scales with the uncached suffix instead of the full sequence

### Alternatives
Currently, the only option is to use `/pooling` without prefix caching, which results in high latency for repeated prefixes.

### Additional context
Related issues:
- #12249
- #11905

This feature would require caching hidden states alongside the KV cache, with both sharing the same prefix matching logic and eviction policy.
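
One way to picture "sharing the same prefix matching logic and eviction policy" is a single cache entry per block hash that holds both the KV data and the hidden states, so the two can never get out of sync. The sketch below is an assumption for illustration only: `SharedPrefixCache` and `CachedBlock` are hypothetical names, and plain LRU is used as a stand-in policy, not a description of how vLLM evicts blocks.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CachedBlock:
    kv: Any      # placeholder for the block's KV cache data
    hidden: Any  # placeholder for the block's hidden states


class SharedPrefixCache:
    """One entry per block hash, so KV and hidden states are matched and evicted together."""

    def __init__(self, capacity_blocks: int) -> None:
        self.capacity = capacity_blocks
        self._blocks: "OrderedDict[str, CachedBlock]" = OrderedDict()

    def get(self, block_hash: str) -> Optional[CachedBlock]:
        block = self._blocks.get(block_hash)
        if block is not None:
            # Mark as most recently used (assumed LRU policy).
            self._blocks.move_to_end(block_hash)
        return block

    def put(self, block_hash: str, kv: Any, hidden: Any) -> None:
        self._blocks[block_hash] = CachedBlock(kv=kv, hidden=hidden)
        self._blocks.move_to_end(block_hash)
        # Evict least recently used blocks; KV and hidden states leave together.
        while len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)


shared = SharedPrefixCache(capacity_blocks=2)
shared.put("a", kv="kv-a", hidden="h-a")
shared.put("b", kv="kv-b", hidden="h-b")
shared.get("a")                           # touch "a" so "b" becomes the LRU entry
shared.put("c", kv="kv-c", hidden="h-c")  # evicts "b", both its KV and hidden states
```

With this layout a prefix hit always yields both the KV data needed to continue prefill and the hidden states needed for the `/pooling` response, at the cost of extra memory per cached block.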

### Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
