Is your feature request related to a problem? Please describe.
llama.cpp supports speculative sampling: a small draft model cheaply proposes a few tokens and the big model verifies them in one pass, which makes it feasible to run big models on constrained hardware (see the sketch below).
Describe the solution you'd like
This is just a tracker
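For reference (not from the original issue), the control flow of speculative sampling looks roughly like this. The toy "models" below are placeholder functions over a fixed string, not real APIs, but the propose/verify/accept loop is the essence of the technique:

```python
# Toy, self-contained sketch of greedy speculative decoding. The "models"
# are stand-in functions (assumptions for illustration); real implementations
# work on logits, but the loop structure is the same.

TARGET_TEXT = list("the quick brown fox jumps over the lazy dog")

def target_next(seq):
    # "Big model": always continues TARGET_TEXT correctly.
    return TARGET_TEXT[len(seq)] if len(seq) < len(TARGET_TEXT) else None

def draft_next(seq):
    # "Small model": cheap and usually right, but wrong at every 7th position.
    if len(seq) >= len(TARGET_TEXT):
        return None
    return "?" if len(seq) % 7 == 6 else TARGET_TEXT[len(seq)]

def speculative_decode(n_draft=4):
    tokens = []
    while len(tokens) < len(TARGET_TEXT):
        # 1. Draft model proposes up to n_draft tokens autoregressively (cheap).
        proposal = []
        while len(proposal) < n_draft:
            t = draft_next(tokens + proposal)
            if t is None:
                break
            proposal.append(t)

        # 2. Target model verifies every proposed position (conceptually a
        #    single forward pass): its own greedy choice at each position,
        #    plus one extra token past the end of the proposal.
        verified = [target_next(tokens + proposal[:i])
                    for i in range(len(proposal) + 1)]

        # 3. Accept the longest agreeing prefix, then take one token from the
        #    target itself, so each round advances by at least one token.
        i = 0
        while i < len(proposal) and proposal[i] == verified[i]:
            i += 1
        tokens.extend(proposal[:i])
        if verified[i] is None:
            break
        tokens.append(verified[i])
    return "".join(tokens)

print(speculative_decode())  # -> the quick brown fox jumps over the lazy dog
```

Because the target model corrects the first disagreement, the output is identical to plain greedy decoding with the big model alone; the speedup comes from verifying several draft tokens per big-model pass.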
…el config (#1052)
**Description**
This PR fixes #1013.
It adds `draft_model` and `n_draft` to the model YAML config so that models can be loaded with speculative sampling. This should also be compatible with grammars.
Example:
```yaml
backend: llama
context_size: 1024
name: my-model-name
parameters:
model: foo-bar
n_draft: 16
draft_model: model-name
```
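
As a usage illustration (not part of the PR): once loaded, the model behaves like any other and can be queried through LocalAI's OpenAI-compatible API. The host, port, and prompt below are assumptions for a default local instance:

```python
# Hypothetical usage sketch; the endpoint assumes LocalAI on its default
# local port, and "my-model-name" matches `name` in the YAML above.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "my-model-name",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```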
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>