Conversation


@alundb (Contributor) commented Nov 1, 2025

In #9386, llama_sampler_sample was updated to call llama_sampler_accept internally. This change made the sampling example usage in the API documentation misleading, as it still showed a subsequent llama_sampler_accept call.

This PR removes the documentation reference to llama_sampler_accept, clarifying that the function should not be invoked in the decoding loop, because llama_sampler_sample already calls it internally (since #9386).

commit 5fb5e24 (llama : minor sampling refactor (2) (ggml-org#9386)) moved the
llama_sampler_accept call into llama_sampler_sample, but the sampling example
usage in llama.h was not updated accordingly.

@alundb (Contributor, Author) commented Nov 1, 2025

FYI, I discovered this the hard way. Using llama.h as the main reference while toying with some simple BNF grammar sampling/decoding, I had added the extra llama_sampler_accept(chain, id); call and kept running into:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Unexpected empty grammar stack after accepting piece: J
Aborted (core dumped)

I hope this PR helps save someone else from making the same mistake in the future.
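The failure mode above can be sketched with a toy model. To be clear, this is not the llama.cpp API: the struct, its grammar stack, and the hard-coded token below are invented for illustration. The point it models is that sample() already accepts the token internally (the post-#9386 behavior), so a second, manual accept() finds the grammar state already advanced and throws, mirroring the runtime_error in the log above.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-in for a sampler chain with grammar state (NOT the real API).
struct ToySampler {
    std::vector<char> grammar_stack{'S'};  // pretend grammar state: one expected symbol

    // Advance the grammar state by one accepted token.
    void accept(char token) {
        if (grammar_stack.empty()) {
            // Mirrors the error message from the log above.
            throw std::runtime_error(
                "Unexpected empty grammar stack after accepting piece: " +
                std::string(1, token));
        }
        grammar_stack.pop_back();  // consume the expected symbol
    }

    // Models llama_sampler_sample after #9386: accept happens internally.
    char sample() {
        char token = 'J';  // pretend the chain sampled this token
        accept(token);     // internal accept, so the caller must not repeat it
        return token;
    }
};
```

With this model, `char id = chain.sample();` alone is the correct usage; following it with `chain.accept(id);` (the mistake the comment describes) hits the empty grammar stack and throws.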

@ggerganov ggerganov merged commit 76af40a into ggml-org:master Nov 2, 2025
1 check passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Nov 3, 2025
* origin/master: (169 commits)
opencl: support imrope (ggml-org#16914)
fix: Viewing multiple PDF attachments (ggml-org#16974)
model-conversion : pass config to from_pretrained (ggml-org#16963)
server : add props.model_alias (ggml-org#16943)
ggml: CUDA: add head size 72 for flash-attn (ggml-org#16962)
mtmd: add --image-min/max-tokens (ggml-org#16921)
mtmd: pad mask for qwen2.5vl (ggml-org#16954)
ggml : LoongArch fixes (ggml-org#16958)
sync: minja (glm 4.6 & minmax m2 templates) (ggml-org#16949)
SYCL: optimized repeat_back kernel (3× fewer asm instructions, 2× faster) (ggml-org#16869)
feat(webui): improve LaTeX rendering with currency detection (ggml-org#16508)
test-backend-ops : fix segfault in moe-expert-reduce test in support mode and coverage (ggml-org#16936)
ci : disable failing riscv cross build (ggml-org#16952)
model: add Janus Pro for image understanding (ggml-org#16906)
clip : use FA (ggml-org#16837)
server : support unified cache across slots (ggml-org#16736)
common : move gpt-oss reasoning processing to init params (ggml-org#16937)
docs: remove llama_sampler_accept reference in sampling sample usage (ggml-org#16920)
CUDA: add FLOOR, CEIL, ROUND, TRUNC unary ops (ggml-org#16917)
devops: fix failing s390x docker build (ggml-org#16918)
...