speculative: Ensure draft and target model vocab matches #3812
Conversation
examples/speculative/speculative.cpp (Outdated)
```diff
@@ -64,6 +64,26 @@ int main(int argc, char ** argv) {
     params.n_gpu_layers = params.n_gpu_layers_draft;
     std::tie(model_dft, ctx_dft) = llama_init_from_gpt_params(params);
 
+    {
+        int n_vocab_tgt = llama_n_vocab(model_tgt);
+        if (n_vocab_tgt != llama_n_vocab(model_dft)) {
```
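The collapsed hunk ends mid-block; presumably the strict check continued roughly like the sketch below (the error handling and message here are assumptions added for context, not the literal patch contents):

```cpp
// Plausible continuation of the strict check above (illustrative only):
// bail out if the draft vocab size differs at all from the target.
{
    int n_vocab_tgt = llama_n_vocab(model_tgt);
    if (n_vocab_tgt != llama_n_vocab(model_dft)) {
        fprintf(stderr, "%s: error: draft model vocab must match target model to use speculation\n", __func__);
        return 1;
    }
}
```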
This is not ideal. Codellama 7B and 13B have vocab size of 32000 while Codellama 34B has vocab size of 32016. It's the same vocab but with some extra tokens.
We should not disallow such cases. Maybe just print errors / warnings, but still continue?
How about we error only if the size differs by more than 100, and also check the content of the first min(n_vocab_tgt, n_vocab_dft) tokens? Maybe even start at 5 or something, to allow for cases where something like BOS or EOS has different content.
100 and 5 are just random numbers I plucked out of the air. The actual values can be whatever you prefer.
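A sketch of what that check could look like, assuming the model-arg token API from #3720 (the helper name, thresholds, and error messages are just illustrative, not final):

```cpp
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <cstring>

#include "llama.h"

// Sketch of the proposed check: hard-error only if the vocab sizes differ
// by more than 100 tokens, then compare token contents over the shared
// range, skipping the first 5 entries so that special tokens like
// BOS/EOS are allowed to differ.
static bool vocab_roughly_matches(const llama_model * model_tgt, const llama_model * model_dft) {
    const int n_vocab_tgt = llama_n_vocab(model_tgt);
    const int n_vocab_dft = llama_n_vocab(model_dft);

    if (std::abs(n_vocab_tgt - n_vocab_dft) > 100) {
        fprintf(stderr, "vocab size mismatch: target %d, draft %d\n", n_vocab_tgt, n_vocab_dft);
        return false;
    }

    for (int i = 5; i < std::min(n_vocab_tgt, n_vocab_dft); ++i) {
        const char * text_tgt = llama_token_get_text(model_tgt, i);
        const char * text_dft = llama_token_get_text(model_dft, i);
        if (std::strcmp(text_tgt, text_dft) != 0) {
            fprintf(stderr, "token %d content differs - target '%s', draft '%s'\n", i, text_tgt, text_dft);
            return false;
        }
    }

    return true;
}
```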
Ok, I think it would work
I changed it. Also made the token content mismatch message a bit more helpful. For example, trying to use Orca 3B to draft Mistral 7B:
```
main: error: draft model vocab must match target model to use speculation but token 259 content differs - target ' ', draft ' t'
```
* master: (350 commits)
  * speculative : ensure draft and target model vocab matches (ggerganov#3812)
  * llama : correctly report GGUFv3 format (ggerganov#3818)
  * simple : fix batch handling (ggerganov#3803)
  * cuda : improve text-generation and batched decoding performance (ggerganov#3776)
  * server : do not release slot on image input (ggerganov#3798)
  * batched-bench : print params at start
  * log : disable pid in log filenames
  * server : add parameter -tb N, --threads-batch N (ggerganov#3584) (ggerganov#3768)
  * server : do not block system prompt update (ggerganov#3767)
  * sync : ggml (conv ops + cuda MSVC fixes) (ggerganov#3765)
  * cmake : add missed dependencies (ggerganov#3763)
  * cuda : add batched cuBLAS GEMM for faster attention (ggerganov#3749)
  * Add more tokenizer tests (ggerganov#3742)
  * metal : handle ggml_scale for n%4 != 0 (close ggerganov#3754)
  * Revert "make : add optional CUDA_NATIVE_ARCH (ggerganov#2482)"
  * issues : separate bug and enhancement template + no default title (ggerganov#3748)
  * Update special token handling in conversion scripts for gpt2 derived tokenizers (ggerganov#3746)
  * llama : remove token functions with `context` args in favor of `model` (ggerganov#3720)
  * Fix baichuan convert script not detecing model (ggerganov#3739)
  * make : add optional CUDA_NATIVE_ARCH (ggerganov#2482)
  * ...
speculative : Ensure draft and target model vocab matches (#3812)

* speculative: Ensure draft and target model vocab matches
* Tolerate small differences when checking dft vs tgt vocab
It's currently possible to shoot yourself in the foot by trying to speculate using a draft model with vocab that doesn't match the target, and weird stuff will happen in that case. Naturally the draft model will fail 100% of the time, but looking at the logs it'll appear that the draft is just generating random unrelated stuff (even draft candidates with NaN as the probability).
When there's a mismatch you'll now get an error reporting either the vocab size difference or the first token whose content differs (like the example shown above).
This approach may be too strict, since one might plausibly want to use a draft model with a few special tokens that differ. One way to deal with that might be to say there can be at most X mismatches. Running `strcmp` on every entry might also be overkill: on my system, checking 32,000 entries is instant, but something like looping with a step of 10 would probably be fine too (see the sketch below).
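For example, a hypothetical combination of both relaxations, tolerating up to `max_mismatches` differing tokens and only sampling every 10th entry (the helper name, parameter, and stride are illustrative, not an actual proposal from the patch):

```cpp
#include <algorithm>
#include <cstring>

#include "llama.h"

// Hypothetical relaxed variant: allow up to max_mismatches differing
// tokens, and only strcmp every 10th entry instead of the full vocab.
static bool vocab_close_enough(const llama_model * tgt, const llama_model * dft, int max_mismatches) {
    const int n = std::min(llama_n_vocab(tgt), llama_n_vocab(dft));

    int mismatches = 0;
    for (int i = 0; i < n; i += 10) { // stride of 10, not every entry
        if (std::strcmp(llama_token_get_text(tgt, i), llama_token_get_text(dft, i)) != 0) {
            if (++mismatches > max_mismatches) {
                return false;
            }
        }
    }
    return true;
}
```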