
fix last tokens passing to sample_repetition_penalties function #1295

Merged
merged 2 commits into abetlen:main from fix-last-tokens-passing on Apr 1, 2024

Conversation

@ymikhailov (Contributor) commented Mar 21, 2024

Bug: currently all previous tokens are passed to the llama_sample_repetition_penalties function instead of only the last penalty_last_n tokens.

Description: the llama_sample_repetition_penalties function expects the last penalty_last_n tokens, but it receives all previously generated tokens, which keeps the algorithm from working correctly.

As you can see in the code below, only the first penalty_last_n tokens of last_tokens are used to build the token_count map:

    ...
    // Create a frequency map to count occurrences of each token in last_tokens
    std::unordered_map<llama_token, int> token_count;
    for (size_t i = 0; i < penalty_last_n; ++i) {
        token_count[last_tokens[i]]++;
    }
    ...

This behavior is clearly incorrect; in our case it led to the model producing many answers with repetitions.
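
For illustration, here is a minimal sketch of the corrected calling convention, assuming the llama.cpp C API of that era (llama_sample_repetition_penalties taking a token pointer plus a count); the wrapper name apply_repetition_penalties and the history vector are illustrative, not code from this PR:

    #include <algorithm>
    #include <vector>
    #include "llama.h"

    // Illustrative wrapper: hand the sampler only the tail of the
    // generated-token history, never the whole thing.
    void apply_repetition_penalties(llama_context * ctx,
                                    llama_token_data_array * candidates,
                                    const std::vector<llama_token> & history,
                                    size_t penalty_last_n,
                                    float penalty_repeat,
                                    float penalty_freq,
                                    float penalty_present) {
        // Clamp so we never claim more tokens than the history holds.
        const size_t n = std::min(penalty_last_n, history.size());
        llama_sample_repetition_penalties(
            ctx, candidates,
            history.data() + (history.size() - n),  // last n tokens only
            n,
            penalty_repeat, penalty_freq, penalty_present);
    }

With the tail passed in, the frequency map above counts exactly the most recent penalty_last_n tokens, which is what the penalty algorithm assumes.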

This PR also fixes a possible buffer overflow in the code above that could cause a segfault.
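
To see how the overflow arises (an illustrative failure mode, not code from the PR): early in generation the history can be shorter than the configured window, so claiming penalty_last_n tokens makes the loop above read past the end of the buffer.

    // Illustrative: the history holds only 3 tokens, but the sampler
    // is told there are 64.
    std::vector<llama_token> history = { 10, 20, 30 };  // 3 tokens so far
    size_t penalty_last_n = 64;                         // configured window
    // Passing history.data() with a claimed length of 64 makes
    // last_tokens[i] an out-of-bounds read for i >= 3 in the loop
    // above, which is undefined behavior and can segfault. Clamping
    // with std::min(penalty_last_n, history.size()), as in the sketch
    // above, avoids this.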

@ymikhailov force-pushed the fix-last-tokens-passing branch from 3e06ff1 to 7f83e1d on March 22, 2024 at 15:18

@abetlen (Owner) commented Apr 1, 2024

@ymikhailov thank you for the fix! Looks good.

@abetlen merged commit 62aad61 into abetlen:main on Apr 1, 2024
xhedit pushed a commit to xhedit/llama-cpp-conv that referenced this pull request Apr 6, 2024
fix: last tokens passing to sample_repetition_penalties function (abetlen#1295)

Co-authored-by: ymikhaylov <ymikhaylov@x5.ru>
Co-authored-by: Andrei <abetlen@gmail.com>
xhedit added a commit to xhedit/llama-cpp-conv that referenced this pull request Apr 6, 2024
* feat: add support for KV cache quantization options (abetlen#1307)

* add KV cache quantization options

abetlen#1220
abetlen#1305

* Add ggml_type

* Use ggml_type instead of string for quantization

* Add server support

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>

* fix: Changed local API doc references to hosted (abetlen#1317)

* chore: Bump version

* fix: last tokens passing to sample_repetition_penalties function (abetlen#1295)

Co-authored-by: ymikhaylov <ymikhaylov@x5.ru>
Co-authored-by: Andrei <abetlen@gmail.com>

* feat: Update llama.cpp

* fix: segfault when logits_all=False. Closes abetlen#1319

* feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal (abetlen#1247)

* Generate binary wheel index on release

* Add total release downloads badge

* Update download label

* Use official cibuildwheel action

* Add workflows to build CUDA and Metal wheels

* Update generate index workflow

* Update workflow name

* feat: Update llama.cpp

* chore: Bump version

* fix(ci): use correct script name

* docs: LLAMA_CUBLAS -> LLAMA_CUDA

* docs: Add docs explaining how to install pre-built wheels.

* docs: Rename cuBLAS section to CUDA

* fix(docs): incorrect tool_choice example (abetlen#1330)

* feat: Update llama.cpp

* fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes abetlen#1328 abetlen#1314

* fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes abetlen#1328 Closes abetlen#1314

* feat: Update llama.cpp

* fix: Always embed metal library. Closes abetlen#1332

* feat: Update llama.cpp

* chore: Bump version

---------

Co-authored-by: Limour <93720049+Limour-dev@users.noreply.github.com>
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
Co-authored-by: lawfordp2017 <lawfordp@gmail.com>
Co-authored-by: Yuri Mikhailov <bitsharp@gmail.com>
Co-authored-by: ymikhaylov <ymikhaylov@x5.ru>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>