Is your feature request related to a problem? Please describe.
llama.cpp supports speculative sampling: a small draft model cheaply proposes a few tokens and the big model verifies them in one pass, which makes it feasible to run big models on constrained hardware (see the sketch below).
Describe the solution you'd like
This is just a tracker
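For reference (not from the original issue), the control flow of speculative sampling looks roughly like this. The toy "models" below are placeholder functions over a fixed string, not real APIs, but the propose/verify/accept loop is the essence of the technique:

```python
# Toy, self-contained sketch of greedy speculative decoding. The "models"
# are stand-in functions (assumptions for illustration); real implementations
# work on logits, but the loop structure is the same.

TARGET_TEXT = list("the quick brown fox jumps over the lazy dog")

def target_next(seq):
    # "Big model": always continues TARGET_TEXT correctly.
    return TARGET_TEXT[len(seq)] if len(seq) < len(TARGET_TEXT) else None

def draft_next(seq):
    # "Small model": cheap and usually right, but wrong at every 7th position.
    if len(seq) >= len(TARGET_TEXT):
        return None
    return "?" if len(seq) % 7 == 6 else TARGET_TEXT[len(seq)]

def speculative_decode(n_draft=4):
    tokens = []
    while len(tokens) < len(TARGET_TEXT):
        # 1. Draft model proposes up to n_draft tokens autoregressively (cheap).
        proposal = []
        while len(proposal) < n_draft:
            t = draft_next(tokens + proposal)
            if t is None:
                break
            proposal.append(t)

        # 2. Target model verifies every proposed position (conceptually a
        #    single forward pass): its own greedy choice at each position,
        #    plus one extra token past the end of the proposal.
        verified = [target_next(tokens + proposal[:i])
                    for i in range(len(proposal) + 1)]

        # 3. Accept the longest agreeing prefix, then take one token from the
        #    target itself, so each round advances by at least one token.
        i = 0
        while i < len(proposal) and proposal[i] == verified[i]:
            i += 1
        tokens.extend(proposal[:i])
        if verified[i] is None:
            break
        tokens.append(verified[i])
    return "".join(tokens)

print(speculative_decode())  # -> the quick brown fox jumps over the lazy dog
```

Because the target model corrects the first disagreement, the output is identical to plain greedy decoding with the big model alone; the speedup comes from verifying several draft tokens per big-model pass.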
…el config (#1052)
**Description**
This PR fixes #1013.
It adds `draft_model` and `n_draft` to the model YAML config so that models can be loaded with speculative sampling. This should also be compatible with grammars.
Example:
```yaml
backend: llama
context_size: 1024
name: my-model-name
parameters:
model: foo-bar
n_draft: 16
draft_model: model-name
```
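
As a usage illustration (not part of the PR): once loaded, the model behaves like any other and can be queried through LocalAI's OpenAI-compatible API. The host, port, and prompt below are assumptions for a default local instance:

```python
# Hypothetical usage sketch; the endpoint assumes LocalAI on its default
# local port, and "my-model-name" matches `name` in the YAML above.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "my-model-name",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```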
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>