llama : add infill sampler #9896
Conversation
// normalize probs
for (size_t i = 0; i < cur_p->size; ++i) {
    cur_p->data[i].p /= p_sum;
}
A few samplers do this, but I don't see the point, because every sampler that needs the probabilities calls softmax first anyway and recomputes them.
During the refactor I came to the conclusion that we only really need to store the logits. Every time probabilities are needed, a softmax is done to get them; `llama_token_data::p` is only used as temporary storage for the result of the softmax and could be removed entirely.
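For reference, the recomputation in question is just a softmax over the logits. A minimal sketch, with a hypothetical `token_data` struct standing in for `llama_token_data` (not the actual llama.cpp implementation):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for llama_token_data: a logit plus the p
// slot that the discussion argues is only temporary storage.
struct token_data {
    float logit;
    float p;
};

// Recompute probabilities from the logits, as every sampler that
// needs them is expected to do.
void softmax(std::vector<token_data> & cur) {
    float max_l = cur[0].logit;
    for (const auto & td : cur) {
        max_l = std::max(max_l, td.logit);
    }
    float sum = 0.0f;
    for (auto & td : cur) {
        td.p = std::exp(td.logit - max_l); // subtract max for numerical stability
        sum += td.p;
    }
    for (auto & td : cur) {
        td.p /= sum; // probabilities now sum to 1; no separate normalization pass needed
    }
}
```

Since the final division already normalizes the probabilities, a separate "normalize probs" pass like the one above is redundant.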
I think there is currently a scenario that uses the `p`s: calling the `dist` sampler without an explicit `softmax` before it. We don't do that in any of the examples, but it's technically possible?
Anyway, I agree that the `p` should be removed completely.
I think that should be considered a bug in the dist sampler then, because there is no way to know whether the probabilities are valid without calling softmax. So any sampler that needs them must call softmax itself.
Yes, indeed it is a bug. There are a few places where we do:
llama_sampler * smpl = llama_sampler_chain_init(sparams);
llama_sampler_chain_add(smpl, llama_sampler_init_top_k(params.sparams.top_k));
llama_sampler_chain_add(smpl, llama_sampler_init_top_p(params.sparams.top_p, params.sparams.min_keep));
llama_sampler_chain_add(smpl, llama_sampler_init_temp (params.sparams.temp));
llama_sampler_chain_add(smpl, llama_sampler_init_dist (params.sparams.seed));
This would render the temperature sampler useless, as it modifies only the logits. I think we should remove the explicit `softmax` calls in places like `common/sampling.cpp`:
llama_sampler_chain_add(result->chain, llama_sampler_init_softmax()); // remove this
llama_sampler_chain_add(result->chain, llama_sampler_init_dist(params.seed));
And update the `dist` sampler to do a softmax at the start. Sounds good?
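The proposed change could be sketched like this: the dist sampler applies the softmax itself before drawing a token, so the chain no longer needs an explicit softmax stage. Names and structure here are illustrative assumptions, not the actual llama.cpp API:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Illustrative stand-in for llama_token_data.
struct token_data {
    float logit;
    float p;
};

// Normalize logits into probabilities (softmax).
void softmax(std::vector<token_data> & cur) {
    float max_l = cur[0].logit;
    for (const auto & td : cur) {
        max_l = std::max(max_l, td.logit);
    }
    float sum = 0.0f;
    for (auto & td : cur) {
        td.p = std::exp(td.logit - max_l);
        sum += td.p;
    }
    for (auto & td : cur) {
        td.p /= sum;
    }
}

// Sketch of a dist sampler that runs softmax at the start, so the
// probabilities are guaranteed valid regardless of which samplers
// (e.g. temperature, which touches only logits) ran before it.
int sample_dist(std::vector<token_data> & cur, std::mt19937 & rng) {
    softmax(cur); // make the p values valid before sampling
    std::vector<float> probs;
    probs.reserve(cur.size());
    for (const auto & td : cur) {
        probs.push_back(td.p);
    }
    std::discrete_distribution<int> dist(probs.begin(), probs.end());
    return dist(rng); // index of the sampled token
}
```

With this shape, a chain of top-k / top-p / temp / dist works even though the temperature stage only rescales logits, because dist converts logits to probabilities itself.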
Sounds good. We should probably remove `llama_sampler_init_softmax` entirely, since it is useless to applications.
Add a new sampler that I think is suitable for infill tasks. It promotes end-of-generation (EOG) tokens when it is not very confident, and it also combines (merges) tokens with a common prefix.
I think there are more improvements that can be made specifically for fill-in-the-middle, but this version seems to work OK for now.
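A rough sketch of the EOG-promotion idea, under stated assumptions: the confidence threshold and data layout below are made up for illustration, and the common-prefix token merging is not shown. This is not the actual implementation in the PR:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative token entry: a vocab id, a logit, and a probability slot.
struct token_data {
    int   id;
    float logit;
    float p;
};

// If the most likely token's probability is below a confidence
// threshold, move all probability mass to the EOG token so that
// generation stops instead of producing a low-confidence infill.
void promote_eog_if_unsure(std::vector<token_data> & cur, int eog_id, float thold) {
    // softmax over the logits to get valid probabilities
    float max_l = cur[0].logit;
    for (const auto & td : cur) {
        max_l = std::max(max_l, td.logit);
    }
    float sum = 0.0f;
    for (auto & td : cur) {
        td.p = std::exp(td.logit - max_l);
        sum += td.p;
    }
    for (auto & td : cur) {
        td.p /= sum;
    }

    // confidence check: is any single token clearly preferred?
    float p_max = 0.0f;
    for (const auto & td : cur) {
        p_max = std::max(p_max, td.p);
    }
    if (p_max < thold) {
        // not confident: force the EOG token
        for (auto & td : cur) {
            td.p = (td.id == eog_id) ? 1.0f : 0.0f;
        }
    }
}
```

The threshold trades infill length against quality: a higher value makes the sampler give up (emit EOG) sooner when the model is unsure.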
Also add the following function to the `libllama` API: