
Support speculative decoding with llama.cpp #240

Closed
@kerthcet

Description

What would you like to be added:

We already support vLLM. Now that llama.cpp has added this feature, we should support it as well; see ggml-org/llama.cpp#10455.
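For context, a minimal sketch of the greedy draft-and-verify loop behind speculative decoding. This is an illustration only, not llama.cpp's implementation: the `target_next` and `draft_next` callables are hypothetical stand-ins for real model forward passes, and a real implementation would verify all draft tokens in one batched pass.

```python
def speculative_step(target_next, draft_next, prompt, k=4):
    """One speculative round: the cheap draft model proposes k tokens,
    and the large target model accepts the longest agreeing prefix.

    target_next / draft_next: callables mapping a token list to the
    next token (stand-ins for real model forward passes).
    """
    # 1. Draft model autoregressively proposes k candidate tokens.
    draft = []
    ctx = list(prompt)
    for _ in range(k):
        tok = draft_next(ctx)
        draft.append(tok)
        ctx.append(tok)

    # 2. Target model checks each position (batched in practice).
    accepted = []
    ctx = list(prompt)
    for tok in draft:
        expected = target_next(ctx)
        if expected != tok:
            # Mismatch: keep the target's token and stop accepting drafts.
            accepted.append(expected)
            break
        accepted.append(tok)
        ctx.append(tok)
    else:
        # All k draft tokens matched; emit one bonus token from the target.
        accepted.append(target_next(ctx))

    return prompt + accepted
```

When the draft model agrees with the target, each round emits up to k + 1 tokens for a single (batched) target pass; on a mismatch it still emits one correct token, so output always matches what the target model alone would have produced.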

Why is this needed:

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

Metadata

Labels

  • feature: Categorizes issue or PR as related to a new feature.
  • help wanted: Extra attention is needed.
  • needs-priority: Indicates a PR lacks a label and requires one.
  • needs-triage: Indicates an issue or PR lacks a label and requires one.
