Eval bug: trivial grammar crashes (DeepSeek R1 Distill Llama 8B) #11591
Comments
Tried w/ … Regardless, will probably replace the assert with:

    if (grammar.stacks.empty()) {
        throw std::runtime_error("Unexpected empty grammar stack after accepting piece: " + piece);
    }
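For context, here is a minimal self-contained sketch of that idea (the Grammar struct and accept_piece() helper are hypothetical stand-ins, not llama.cpp's actual types): replacing the hard assert with an exception turns a process abort into an error the caller can handle.

    // Hypothetical sketch of the proposed change, not llama.cpp's actual code:
    // Grammar and accept_piece() are stand-ins for illustration only.
    #include <stdexcept>
    #include <string>
    #include <vector>

    struct Grammar {
        // Each stack is one possible partial parse; if every stack is eliminated,
        // the text accepted so far has no valid continuation under the grammar.
        std::vector<std::vector<int>> stacks;
    };

    void accept_piece(Grammar & grammar, const std::string & piece) {
        // ... advance each stack over `piece`, dropping stacks that cannot match ...

        if (grammar.stacks.empty()) {
            // A hard GGML_ASSERT here aborts the whole process; throwing instead
            // lets the caller surface the error and recover.
            throw std::runtime_error(
                "Unexpected empty grammar stack after accepting piece: " + piece);
        }
    }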
Seems specific to DeepSeek-R1-Distill-Llama-8B-GGUF (the Qwen 7B & 32B distills don't crash with that grammar).
It does not crash on my end:
llama-cli --version
version: 4570 (6e84b0ab)
built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.2.0
llama-cli -hf bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M --grammar 'root ::= "{"' -p hey -no-cnv
0.01.168.912 I system_info: n_threads = 16 (n_threads_batch = 16) / 24 | Metal : EMBED_LIBRARY = 1 | CPU : NEON = 1 | ARM_FMA = 1 | FP16_VA = 1 | DOTPROD = 1 | LLAMAFILE = 1 | ACCELERATE = 1 | AARCH64_REPACK = 1 |
0.01.168.912 I
0.01.169.188 I sampler seed: 2240775377
0.01.169.197 I sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 4096
top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
0.01.169.200 I sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
0.01.169.200 I generate: n_ctx = 4096, n_batch = 2048, n_predict = -1, n_keep = 1
0.01.169.202 I
hey{ [end of text]
0.01.218.866 I llama_perf_sampler_print: sampling time = 0.33 ms / 4 runs ( 0.08 ms per token, 12232.42 tokens per second)
0.01.218.877 I llama_perf_context_print: load time = 453.76 ms
0.01.218.880 I llama_perf_context_print: prompt eval time = 19.98 ms / 2 tokens ( 9.99 ms per token, 100.12 tokens per second)
0.01.218.882 I llama_perf_context_print: eval time = 12.88 ms / 1 runs ( 12.88 ms per token, 77.67 tokens per second)
0.01.218.883 I llama_perf_context_print: total time = 78.57 ms / 3 tokens
0.01.220.078 I ggml_metal_free: deallocating
Oh, mine crashes w/ the following versions:
fwiw, I'm getting this same exception when calling …
@phil-scott-78 thanks for reporting! Note that the Qwen distills should generally get better with #11607 (although it has no changes related to grammar), and another possible factor is the double-BOS situation (addressed in #11616). Hope to circle back to this in a couple of days.
right on. For what it's worth, I tried again with a lazy grammar and Mistral-Small-24B-Instruct-2501. Gave it a prompt to include its thinking to force the issue. Same thing: it output its thinking, got to the …
Found at least one issue: if a token contains or completes a trigger and adds text that can't be parsed by the grammar, then kaboom (this came up while testing upcoming changes that add even more triggers (ref); testing possible fixes). In any case, the repro reported in this bug seems to work for me now, probably because of …
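A rough self-contained sketch of that failure mode, assuming a simplified lazy-grammar flow (the trigger string, grammar_accepts() helper, and token contents are made up for illustration and are not llama.cpp code):

    // Hypothetical illustration of the lazy-grammar failure mode described above:
    // one decoded token both completes the trigger and carries trailing text that
    // the grammar cannot parse, leaving no valid parse stacks.
    #include <iostream>
    #include <string>

    // Toy grammar equivalent to `root ::= "{"`: only a single leading '{' is accepted.
    static bool grammar_accepts(char c, size_t n_accepted) {
        return n_accepted == 0 && c == '{';
    }

    int main() {
        const std::string trigger = "</think>";      // assumed trigger word
        const std::string token   = "</think> The";  // one token: trigger plus extra text

        const size_t pos = token.find(trigger);
        if (pos != std::string::npos) {
            // The trigger activates the grammar; the rest of the SAME token must
            // now be fed through it.
            const std::string rest = token.substr(pos + trigger.size());
            size_t n_accepted = 0;
            for (char c : rest) {
                if (!grammar_accepts(c, n_accepted)) {
                    // This is where a real sampler would end up with empty stacks
                    // and an assert such as GGML_ASSERT(!grammar.stacks.empty()) fires.
                    std::cout << "no stack can accept '" << c << "' -> crash\n";
                    return 1;
                }
                n_accepted++;
            }
        }
        return 0;
    }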
Will close this as I can't repro the original issue; please feel free to open a new one if you still experience problems!
Name and Version
latest
Operating systems
No response
Which llama.cpp modules do you know to be affected?
libllama (core library)
Command line
llama-cli -hf bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M --grammar 'root ::= "{"' -p hey -no-cnv
Problem description & steps to reproduce
With the following extremely simple grammar, somehow by the time we reach the grammar sampler there's only 1 candidate (@) and it hard crashes.
First Bad Commit
cc/ @ggerganov could this be related to any recent refactoring? (#10803 maybe? I'll try and bisect)
Relevant log output
hey/tmp/llama.cpp-20250131-5280-k2rjfn/src/llama-grammar.cpp:1216: GGML_ASSERT(!grammar.stacks.empty()) failed