
[Bug]: Loading GenerationConfig to SamplingParams.stop_token_ids interferes with ignore_eos=True #4589

Closed
CatherineSue opened this issue May 3, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@CatherineSue
Contributor

CatherineSue commented May 3, 2024

Your current environment

There is a patch (#4182) that loads stop_token_ids from the GenerationConfig to work around <eot_id> in Llama3-Instruct.
However, this logic interferes with ignore_eos=True: the stop checker treats eos_token_ids as stop_token_ids and never checks ignore_eos. See stop_checker
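
For illustration, a minimal sketch of the interference. This is a simplification under assumed names; `SamplingParams` here is a stand-in, not vLLM's actual class:

```python
from dataclasses import dataclass, field

@dataclass
class SamplingParams:  # stand-in for vllm.SamplingParams
    stop_token_ids: list[int] = field(default_factory=list)
    ignore_eos: bool = False

def check_stop(last_token_id: int, params: SamplingParams,
               eos_token_id: int) -> bool:
    # After #4182, eos_token_id(s) from GenerationConfig land in
    # stop_token_ids, so this branch fires first and stops the sequence
    # even when ignore_eos=True.
    if last_token_id in params.stop_token_ids:
        return True
    # The ignore_eos guard only protects this branch, which the EOS
    # token never reaches once it doubles as a stop token.
    return not params.ignore_eos and last_token_id == eos_token_id

# Llama-2's eos_token_id is 2, matching "stop_reason": 2 in the
# v0.4.1 response below:
assert check_stop(2, SamplingParams(stop_token_ids=[2], ignore_eos=True), 2)
```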

🐛 Describe the bug

Test:
Model: Llama-2-70b-chat-hf

curl http://localhost:8086/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "/models/Llama-2-70b-chat-hf",
        "prompt": "Write a Python function to find the nth Fibonacci number. The function should be efficient and handle cases where n is very large.",
        "max_tokens": 200,
        "temperature": 0,
        "ignore_eos": true,
        "repetition_penalty": 1.3,
        "logprobs": 5
    }'

Response:
v0.3.1

{"id":"cmpl-14e9a1807fe743ce8b5c9e179cd7ceeb","object":"text_completion","created":1893216,"model":"/models/Llama-2-70b-chat-hf","choices":[{"index":0,"text":"\nA good approach would be to use memoization, which stores previously computed values in memory so that they can be quickly retrieved instead of recomputed each time they are needed again. This reduces computation time for larger inputs by avoiding unnecessary repetition of calculations. Additionally, you could consider using an iterative algorithm rather than recursive one as it will also help improve performance when dealing with big numbers or complex problems. Finally make sure your code has proper error handling mechanisms in place (e g., checking if negative integers have been passed). A new study published in Nature Communications suggests that climate change may lead to more frequent and severe wildfires in California’s Sierra Nevada mountain range. Researchers used computer simulations to explore how different climate scenarios might impact fire activity in the region through 2100. They found that warmer temperatures and changes in precipitation patterns could increase the likelihood of extreme fires years, defined as those that burn over 50","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":30,"total_tokens":230,"completion_tokens":200}}

v0.4.1:

{"id":"cmpl-d45808571bdc47c7a8e656456a6c1904","object":"text_completion","created":1714763708,"model":"/models/Llama-2-70b-chat-hf","choices":[{"index":0,"text":"\nA good approach would be to use memoization, which stores previously computed values in memory so that they can be quickly retrieved instead of recomputed each time they are needed again. This reduces computation time for larger inputs by avoiding unnecessary repetition of calculations. Additionally, you could consider using an iterative algorithm rather than recursive one as it will also help improve performance when dealing with big numbers or complex problems. Finally make sure your code has proper error handling mechanisms in place (e g., checking if negative integers have been passed).","logprobs":null,"finish_reason":"stop","stop_reason":2}],"usage":{"prompt_tokens":30,"total_tokens":140,"completion_tokens":110}}

In the v0.4.1 response, generation stopped at eos_token_id because it is inside stop_token_ids, so completion_tokens is 110 instead of 200. vLLM should still respect ignore_eos=True in this case, because the matching stop_token_id is the eos_token_id.
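
One possible guard, sketched on the simplified `check_stop` above; this is my assumption of a fix, not the actual patch: when the matched stop token is the eos token and ignore_eos is set, don't stop.

```python
def check_stop_fixed(last_token_id: int, params: SamplingParams,
                     eos_token_id: int) -> bool:
    if last_token_id in params.stop_token_ids:
        # A stop token that is really the EOS token should be skipped
        # when the request asked to ignore EOS.
        if params.ignore_eos and last_token_id == eos_token_id:
            return False
        return True
    return not params.ignore_eos and last_token_id == eos_token_id

# Generation now runs to max_tokens (200) instead of stopping at token 2:
assert not check_stop_fixed(
    2, SamplingParams(stop_token_ids=[2], ignore_eos=True), 2)
```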

@CatherineSue CatherineSue added the bug Something isn't working label May 3, 2024
@CatherineSue CatherineSue changed the title [Bug]: Loading GenerationConfig to SamplingParams.stop_token_ids interferes with ignore_eos [Bug]: Loading GenerationConfig to SamplingParams.stop_token_ids interferes with ignore_eos=True May 3, 2024
@simon-mo
Collaborator

simon-mo commented May 3, 2024

Would this PR fix it? #4468

@CatherineSue
Contributor Author

Yes. Missed this PR. Will close this issue. Thanks!

@CatherineSue
Contributor Author

CatherineSue commented May 3, 2024

Sorry, I might have closed it too soon. I rechecked the logic in the PR and ran tests on Llama2 and Llama3:

Model: Llama-2-70b-chat-hf

curl http://localhost:8087/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "/models/Llama-2-70b-chat-hf",
        "prompt": "Write a Python function to find the nth Fibonacci number. The function should be efficient and handle cases where n is very large.",
        "max_tokens": 200,
        "temperature": 0,
        "ignore_eos": true,
        "repetition_penalty": 1.3
    }'
{"id":"cmpl-e3aff1d9543349de8cbba7d4920c6f42","object":"text_completion","created":1714779663,"model":"/models/Llama-2-70b-chat-hf","choices":[{"index":0,"text":"\nA good approach would be to use memoization, which stores previously computed values in memory so that they can be quickly retrieved instead of recomputed each time they are needed again. This reduces computation time for larger inputs by avoiding unnecessary repetition of calculations. Additionally, you could consider using an iterative algorithm rather than recursive one as it will also help improve performance when dealing with big numbers or complex problems. Finally make sure your code has proper error handling mechanisms in place (e g., checking if negative integers have been passed). A new study published in Nature Communications suggests that climate change may lead to more frequent and severe wildfires in California’s Sierra Nevada mountain range. Researchers used computer simulations to explore how different climate scenarios might impact fire activity in the region through 2100. They found that warmer temperatures and changes in precipitation patterns could increase the likelihood of extreme fires years, defined as those that burn over 50","logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":30,"total_tokens":230,"completion_tokens":200}}

This looks correct.

Model: Llama-3-70B-Instruct

curl http://localhost:8082/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "/models/Llama-3-70B-Instruct",
        "prompt": "Write a Python function to find the nth Fibonacci number. The function should be efficient and handle cases where n is very large.",
        "max_tokens": 600,
        "temperature": 0,
        "ignore_eos": true,
        "repetition_penalty": 1.3
    }'
{"id":"cmpl-7b73aeadeb8140c7aa5732a34636c2be","object":"text_completion","created":1714779101,"model":"/models/Llama-3-70B-Instruct","choices":[{"index":0,"text":" \n\nHere's an example of how you can implement this using matrix exponentiation:\n\n```python\ndef multiply_matrices(a, b):\n    result = [[0 for _ in range(len(b[0]))] for _ in range(len(a))]\n    \n    for i in range(len(a)):\n        for j in range(len(b[0])):\n            for k in range(len(b)):\n                result[i][j] += a[i][k] * b[k][j]\n                \n    return result\n\ndef power(matrix, n):\n    if n == 1:\n        return matrix\n    \n    elif n % 2 == 0:\n        half_pow = power(matrix, n // 2)\n        return multiply_matrices(half_pow, half_pow)\n        \n    else:\n        half_pow = power(matrix, (n - 1) // 2)\n        return multiply_matrices(multiply_matrices(half_pow, half_pow), matrix)\n\ndef fibonacci(n):\n    if n <= 0:\n        raise ValueError(\"Input must be positive integer\")\n        \n    fib_matrix = [[1, 1], [1, 0]]\n    identity_matrix = [[1, 0], [0, 1]]\n\n    if n == 1 or n == 2:\n        return 1\n        \n    pow_res = power(fib_matrix, n - 1)\n    res = multiply_matrices(pow_res, identity_matrix)[0][0]\n\n    return res\n```\n\nThis implementation uses two helper functions: `multiply_matrices` to perform matrix multiplication and `power` to compute the nth power of a matrix efficiently by dividing it into smaller sub-problems.\n\nThe main `fibonacci` function first checks for edge cases (`n <= 0`, `n == 1`, or `n == 2`) before computing the nth Fibonacci number using matrix exponentiation. It initializes the base matrices required for computation and then calls the `power` function with these inputs. Finally, it multiplies the resulting powered matrix with another predefined matrix to get the final answer.\n\nYou can test this function with different values of `n`. For instance,\n\n```Python\nprint(fibonacci(10)) # Output: 55\nprint(fibonacci(20)) # Output: 6765\nprint(fibonacci(30)) # Output: 832040\n```\nThese results match the expected outputs from the standard Fibonacci sequence definition.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nYour solution looks correct! You're correctly implementing the concept of Matrix Exponentiation to calculate the Nth Fibonacci Number efficiently. Here are some minor suggestions to improve your code readability and maintainability:\n\n**Code Readability Suggestions**\n\n1. **Function Names**: Your function names like `multiply_matrices` and `power` could be more descriptive. Consider renaming them as `matrix_multiply` and `matrix_power`.\n\n2. **Variable Naming Conventions**: In Python, variable naming conventions suggest that variables should use underscore notation instead of camelCase. So, consider changing `","logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":25,"total_tokens":625,"completion_tokens":600}}

There is <|eot_id|> inside the output because, with ignore_eos=True, <|eot_id|> is not loaded from generation_config. Is this behavior correct?
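
For context, a sketch of the loading-time behavior being described; the function name and merge step are hypothetical, not vLLM's actual code, and it reuses the `SamplingParams` stand-in from the first sketch:

```python
def merge_generation_config_stops(params: SamplingParams,
                                  extra_eos_ids: list[int]) -> None:
    """Hypothetical merge of GenerationConfig EOS ids (e.g. Llama3's
    <|eot_id|>) into stop_token_ids."""
    if params.ignore_eos:
        # EOS-like tokens are not treated as stops, so <|eot_id|> can
        # appear in the generated text, as in the Llama3 output above.
        return
    for token_id in extra_eos_ids:
        if token_id not in params.stop_token_ids:
            params.stop_token_ids.append(token_id)
```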

@CatherineSue CatherineSue reopened this May 3, 2024
@njhill
Member

njhill commented Jun 26, 2024

@CatherineSue yes, this is correct behaviour. With llama3, <|eot_id|> is considered a second eos token. "Ignored" means it does not cause generation to stop, so it's expected to appear in the output.

@njhill njhill closed this as completed Jun 26, 2024