Eval bug: trivial grammar crashes (DeepSeek R1 Distill Llama 8B) #11591

Closed · ochafik opened this issue Feb 2, 2025 · 9 comments

Labels: bug (Something isn't working)

Comments

ochafik (Collaborator) commented Feb 2, 2025

Name and Version

latest

Operating systems

No response

Which llama.cpp modules do you know to be affected?

libllama (core library)

Command line

llama-cli -hf bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M --grammar 'root ::= "{"' -p hey -no-cnv

Problem description & steps to reproduce

With the extremely simple grammar above (root ::= "{"), somehow only one candidate (@) remains by the time we reach the grammar sampler, and it hard-crashes.
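
For reference, the same grammar can also be attached programmatically; a minimal sketch, assuming the llama.h sampler-chain API as of early 2025 (model loading and vocab setup elided):

    // Hypothetical repro sketch: attach the trivial grammar to a sampler chain.
    // `vocab` is assumed to come from llama_model_get_vocab(model).
    llama_sampler * chain = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(chain, llama_sampler_init_grammar(vocab, "root ::= \"{\"", "root"));
    llama_sampler_chain_add(chain, llama_sampler_init_dist(LLAMA_DEFAULT_SEED));
    // Sampling with this chain is where the GGML_ASSERT below fires.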

First Bad Commit

cc @ggerganov: could this be related to any recent refactoring? (#10803 maybe? I'll try to bisect.)

Relevant log output

hey/tmp/llama.cpp-20250131-5280-k2rjfn/src/llama-grammar.cpp:1216: GGML_ASSERT(!grammar.stacks.empty()) failed
ochafik (Collaborator, Author) commented Feb 2, 2025

Tried with --samplers "" and this time it crashes on <|reserved_special_token_247|>, which I'm not sure should have made it this far (maybe a wrong token type in the GGUF?).

Regardless, I will probably replace the GGML_ASSERT(!grammar.stacks.empty()) with:

    if (grammar.stacks.empty()) {
        // fail with a recoverable error instead of aborting the whole process
        throw std::runtime_error("Unexpected empty grammar stack after accepting piece: " + piece);
    }
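
With that change, an embedder could recover instead of aborting; a hypothetical caller-side sketch (llama_sampler_accept is the public call that eventually reaches llama_grammar_accept_impl, where this assert lives):

    // Hypothetical: catch the parse failure rather than crashing the process.
    try {
        llama_sampler_accept(smpl, token);
    } catch (const std::runtime_error & e) {
        fprintf(stderr, "grammar rejected sampled token: %s\n", e.what());
        // e.g. reset the sampler (llama_sampler_reset) and retry, or stop generation
    }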

@ochafik ochafik added bug Something isn't working and removed bug-unconfirmed labels Feb 2, 2025
@ochafik ochafik changed the title Eval bug: grammar crashes as too few valid candidates getting through to it Eval bug: trivial grammar crashes (DeepSeek R1 Distill Llama 8B) Feb 2, 2025
ochafik (Collaborator, Author) commented Feb 2, 2025

Seems specific to DeepSeek-R1-Distill-Llama-8B-GGUF (the Qwen 7B and 32B distills don't crash with that grammar).

ggerganov (Member) commented:

It does not crash on my end:

llama-cli --version
version: 4570 (6e84b0ab)
built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.2.0

llama-cli -hf bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M --grammar 'root ::= "{"' -p hey -no-cnv

0.01.168.912 I system_info: n_threads = 16 (n_threads_batch = 16) / 24 | Metal : EMBED_LIBRARY = 1 | CPU : NEON = 1 | ARM_FMA = 1 | FP16_VA = 1 | DOTPROD = 1 | LLAMAFILE = 1 | ACCELERATE = 1 | AARCH64_REPACK = 1 | 
0.01.168.912 I 
0.01.169.188 I sampler seed: 2240775377
0.01.169.197 I sampler params: 
	repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
	dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 4096
	top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, temp = 0.800
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
0.01.169.200 I sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
0.01.169.200 I generate: n_ctx = 4096, n_batch = 2048, n_predict = -1, n_keep = 1
0.01.169.202 I 
hey{ [end of text]


0.01.218.866 I llama_perf_sampler_print:    sampling time =       0.33 ms /     4 runs   (    0.08 ms per token, 12232.42 tokens per second)
0.01.218.877 I llama_perf_context_print:        load time =     453.76 ms
0.01.218.880 I llama_perf_context_print: prompt eval time =      19.98 ms /     2 tokens (    9.99 ms per token,   100.12 tokens per second)
0.01.218.882 I llama_perf_context_print:        eval time =      12.88 ms /     1 runs   (   12.88 ms per token,    77.67 tokens per second)
0.01.218.883 I llama_perf_context_print:       total time =      78.57 ms /     3 tokens
0.01.220.078 I ggml_metal_free: deallocating

ochafik (Collaborator, Author) commented Feb 2, 2025

Oh, mine crashes with the following versions:

./build/bin/llama-cli --version
version: 4617 (90517ec4)
built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.2.0

llama-cli --version  # homebrew
version: 4606 (a83f5286)
built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.2.0

phil-scott-78 commented:

FWIW, I'm getting this same exception when calling llama_sampler_init_grammar_lazy from LLamaSharp, consistently, with any of the DeepSeek distills. It produces the think tags as expected, hits the triggerWords (</think>), and then immediately fails. However, if I add a grammar via llama_sampler_init_grammar instead, there's no problem; I just need to adjust the GBNF to account for the thinking tags. I believe LLamaSharp is built against 4620.
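
For context, the lazy variant keeps the grammar dormant until one of the trigger words appears; a sketch of the two calls being compared, assuming the llama.h API around build 4620 (gbnf and vocab are placeholders):

    // Eager: the grammar constrains output from the very first token.
    llama_sampler * eager = llama_sampler_init_grammar(vocab, gbnf, "root");

    // Lazy: the grammar activates only once a trigger word is generated
    // (this is the path that hits the assert in this report).
    const char * triggers[] = { "</think>" };
    llama_sampler * lazy = llama_sampler_init_grammar_lazy(
        vocab, gbnf, "root",
        triggers, 1,   // trigger words
        nullptr, 0);   // trigger tokens (none)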

ochafik (Collaborator, Author) commented Feb 3, 2025

@phil-scott-78 thanks for reporting!

Note that the Qwen distills should generally get better with #11607 (although it contains no grammar-related changes), and another possible factor is the double-BOS situation (being addressed in #11616). I hope to circle back to this in a couple of days.

phil-scott-78 commented Feb 3, 2025

Right on. For what it's worth, I tried again with a lazy grammar and Mistral-Small-24B-Instruct-2501, giving it a prompt that told it to include its thinking, to force the issue. Same thing: it output its thinking, got to the </think>, and blew up on the assert when it came time to apply the grammar. All this is on 4620, though. I'll try to reproduce with llama.cpp directly when I get a chance; I don't want to chase ghosts that were already resolved because that project lags a bit behind.

ochafik (Collaborator, Author) commented Feb 13, 2025

Found at least one issue: if a token contains or completes a trigger and then adds text that can't be parsed by the grammar, kaboom. (This came up while testing upcoming changes that add even more triggers (ref); testing possible fixes.)
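
A hypothetical walk-through of that failure mode, using the grammar and trigger discussed in this thread:

    // Suppose the lazy grammar is root ::= "{" with trigger "</think>",
    // and the model samples a single token whose piece is "</think>The".
    // The trigger matches inside the piece, so the grammar activates and
    // must also consume the trailing "The" -- but no stack accepts 'T',
    // grammar.stacks ends up empty, and GGML_ASSERT(!grammar.stacks.empty())
    // fires instead of reporting a parse failure.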

In any case, I can no longer reproduce the issue originally reported here, probably because of #11616 (edit: #11607).

ochafik (Collaborator, Author) commented Feb 25, 2025

Will close this as I can't repro the original issue; please feel free to open a new one if you still experience problems!

@ochafik ochafik closed this as completed Feb 25, 2025