
[Bug]: Loading GenerationConfig to SamplingParams.stop_token_ids interferes with ignore_eos=True #4589

Closed
CatherineSue opened this issue May 3, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@CatherineSue
Contributor

CatherineSue commented May 3, 2024

Your current environment

There is a patch (#4182) that loads stop_token_ids from the GenerationConfig to work around <eot_id> in Llama3-Instruct.
However, this logic interferes with ignore_eos=True: the stop checker treats eos_token_ids as stop_token_ids and never checks ignore_eos. See stop_checker
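
For illustration, a minimal sketch of the interference. This is a simplification under assumed names; `SamplingParams` here is a stand-in, not vLLM's actual class:

```python
from dataclasses import dataclass, field

@dataclass
class SamplingParams:  # stand-in for vllm.SamplingParams
    stop_token_ids: list[int] = field(default_factory=list)
    ignore_eos: bool = False

def check_stop(last_token_id: int, params: SamplingParams,
               eos_token_id: int) -> bool:
    # After #4182, eos_token_id(s) from GenerationConfig land in
    # stop_token_ids, so this branch fires first and stops the sequence
    # even when ignore_eos=True.
    if last_token_id in params.stop_token_ids:
        return True
    # The ignore_eos guard only protects this branch, which the EOS
    # token never reaches once it doubles as a stop token.
    return not params.ignore_eos and last_token_id == eos_token_id

# Llama-2's eos_token_id is 2, matching "stop_reason": 2 in the
# v0.4.1 response below:
assert check_stop(2, SamplingParams(stop_token_ids=[2], ignore_eos=True), 2)
```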

🐛 Describe the bug

Test:
Model: Llama-2-70b-chat-hf

curl http://localhost:8086/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "/models/Llama-2-70b-chat-hf",
        "prompt": "Write a Python function to find the nth Fibonacci number. The function should be efficient and handle cases where n is very large.",
        "max_tokens": 200,
        "temperature": 0,
        "ignore_eos": true,
        "repetition_penalty": 1.3,
        "logprobs": 5
    }'

Response:
v0.3.1

{"id":"cmpl-14e9a1807fe743ce8b5c9e179cd7ceeb","object":"text_completion","created":1893216,"model":"/models/Llama-2-70b-chat-hf","choices":[{"index":0,"text":"\nA good approach would be to use memoization, which stores previously computed values in memory so that they can be quickly retrieved instead of recomputed each time they are needed again. This reduces computation time for larger inputs by avoiding unnecessary repetition of calculations. Additionally, you could consider using an iterative algorithm rather than recursive one as it will also help improve performance when dealing with big numbers or complex problems. Finally make sure your code has proper error handling mechanisms in place (e g., checking if negative integers have been passed). A new study published in Nature Communications suggests that climate change may lead to more frequent and severe wildfires in California’s Sierra Nevada mountain range. Researchers used computer simulations to explore how different climate scenarios might impact fire activity in the region through 2100. They found that warmer temperatures and changes in precipitation patterns could increase the likelihood of extreme fires years, defined as those that burn over 50","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":30,"total_tokens":230,"completion_tokens":200}}

v0.4.1:

{"id":"cmpl-d45808571bdc47c7a8e656456a6c1904","object":"text_completion","created":1714763708,"model":"/models/Llama-2-70b-chat-hf","choices":[{"index":0,"text":"\nA good approach would be to use memoization, which stores previously computed values in memory so that they can be quickly retrieved instead of recomputed each time they are needed again. This reduces computation time for larger inputs by avoiding unnecessary repetition of calculations. Additionally, you could consider using an iterative algorithm rather than recursive one as it will also help improve performance when dealing with big numbers or complex problems. Finally make sure your code has proper error handling mechanisms in place (e g., checking if negative integers have been passed).","logprobs":null,"finish_reason":"stop","stop_reason":2}],"usage":{"prompt_tokens":30,"total_tokens":140,"completion_tokens":110}}

In the v0.4.1 response, generation stopped at eos_token_id because it is inside stop_token_ids, so completion_tokens is 110 instead of 200. vLLM should still respect ignore_eos=True in this case, because the matching stop_token_id is the eos_token_id.
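
One possible guard, sketched on the simplified `check_stop` above; this is my assumption of a fix, not the actual patch: when the matched stop token is the eos token and ignore_eos is set, don't stop.

```python
def check_stop_fixed(last_token_id: int, params: SamplingParams,
                     eos_token_id: int) -> bool:
    if last_token_id in params.stop_token_ids:
        # A stop token that is really the EOS token should be skipped
        # when the request asked to ignore EOS.
        if params.ignore_eos and last_token_id == eos_token_id:
            return False
        return True
    return not params.ignore_eos and last_token_id == eos_token_id

# Generation now runs to max_tokens (200) instead of stopping at token 2:
assert not check_stop_fixed(
    2, SamplingParams(stop_token_ids=[2], ignore_eos=True), 2)
```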

@CatherineSue CatherineSue added the bug Something isn't working label May 3, 2024
@CatherineSue CatherineSue changed the title [Bug]: Loading GenerationConfig to SamplingParams.stop_token_ids interferes with ignore_eos [Bug]: Loading GenerationConfig to SamplingParams.stop_token_ids interferes with ignore_eos=True May 3, 2024
@simon-mo
Collaborator

simon-mo commented May 3, 2024

Would this PR fix it? #4468

@CatherineSue
Contributor Author

Yes. Missed this PR. Will close this issue. Thanks!

@CatherineSue
Contributor Author

CatherineSue commented May 3, 2024

Sorry, I might have closed it too soon. I rechecked the logic in the PR and ran tests on Llama2 and Llama3:

Model: Llama-2-70b-chat-hf

curl http://localhost:8087/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "/models/Llama-2-70b-chat-hf",
        "prompt": "Write a Python function to find the nth Fibonacci number. The function should be efficient and handle cases where n is very large.",
        "max_tokens": 200,
        "temperature": 0,
        "ignore_eos": true,
        "repetition_penalty": 1.3
    }'
{"id":"cmpl-e3aff1d9543349de8cbba7d4920c6f42","object":"text_completion","created":1714779663,"model":"/models/Llama-2-70b-chat-hf","choices":[{"index":0,"text":"\nA good approach would be to use memoization, which stores previously computed values in memory so that they can be quickly retrieved instead of recomputed each time they are needed again. This reduces computation time for larger inputs by avoiding unnecessary repetition of calculations. Additionally, you could consider using an iterative algorithm rather than recursive one as it will also help improve performance when dealing with big numbers or complex problems. Finally make sure your code has proper error handling mechanisms in place (e g., checking if negative integers have been passed). A new study published in Nature Communications suggests that climate change may lead to more frequent and severe wildfires in California’s Sierra Nevada mountain range. Researchers used computer simulations to explore how different climate scenarios might impact fire activity in the region through 2100. They found that warmer temperatures and changes in precipitation patterns could increase the likelihood of extreme fires years, defined as those that burn over 50","logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":30,"total_tokens":230,"completion_tokens":200}}

This looks correct.

Model: Llama-3-70B-Instruct

curl http://localhost:8082/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "/models/Llama-3-70B-Instruct",
        "prompt": "Write a Python function to find the nth Fibonacci number. The function should be efficient and handle cases where n is very large.",
        "max_tokens": 600,
        "temperature": 0,
        "ignore_eos": true,
        "repetition_penalty": 1.3
    }'
{"id":"cmpl-7b73aeadeb8140c7aa5732a34636c2be","object":"text_completion","created":1714779101,"model":"/models/Llama-3-70B-Instruct","choices":[{"index":0,"text":" \n\nHere's an example of how you can implement this using matrix exponentiation:\n\n```python\ndef multiply_matrices(a, b):\n    result = [[0 for _ in range(len(b[0]))] for _ in range(len(a))]\n    \n    for i in range(len(a)):\n        for j in range(len(b[0])):\n            for k in range(len(b)):\n                result[i][j] += a[i][k] * b[k][j]\n                \n    return result\n\ndef power(matrix, n):\n    if n == 1:\n        return matrix\n    \n    elif n % 2 == 0:\n        half_pow = power(matrix, n // 2)\n        return multiply_matrices(half_pow, half_pow)\n        \n    else:\n        half_pow = power(matrix, (n - 1) // 2)\n        return multiply_matrices(multiply_matrices(half_pow, half_pow), matrix)\n\ndef fibonacci(n):\n    if n <= 0:\n        raise ValueError(\"Input must be positive integer\")\n        \n    fib_matrix = [[1, 1], [1, 0]]\n    identity_matrix = [[1, 0], [0, 1]]\n\n    if n == 1 or n == 2:\n        return 1\n        \n    pow_res = power(fib_matrix, n - 1)\n    res = multiply_matrices(pow_res, identity_matrix)[0][0]\n\n    return res\n```\n\nThis implementation uses two helper functions: `multiply_matrices` to perform matrix multiplication and `power` to compute the nth power of a matrix efficiently by dividing it into smaller sub-problems.\n\nThe main `fibonacci` function first checks for edge cases (`n <= 0`, `n == 1`, or `n == 2`) before computing the nth Fibonacci number using matrix exponentiation. It initializes the base matrices required for computation and then calls the `power` function with these inputs. Finally, it multiplies the resulting powered matrix with another predefined matrix to get the final answer.\n\nYou can test this function with different values of `n`. For instance,\n\n```Python\nprint(fibonacci(10)) # Output: 55\nprint(fibonacci(20)) # Output: 6765\nprint(fibonacci(30)) # Output: 832040\n```\nThese results match the expected outputs from the standard Fibonacci sequence definition.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nYour solution looks correct! You're correctly implementing the concept of Matrix Exponentiation to calculate the Nth Fibonacci Number efficiently. Here are some minor suggestions to improve your code readability and maintainability:\n\n**Code Readability Suggestions**\n\n1. **Function Names**: Your function names like `multiply_matrices` and `power` could be more descriptive. Consider renaming them as `matrix_multiply` and `matrix_power`.\n\n2. **Variable Naming Conventions**: In Python, variable naming conventions suggest that variables should use underscore notation instead of camelCase. So, consider changing `","logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":25,"total_tokens":625,"completion_tokens":600}}

There is <|eot_id|> inside the output because, with ignore_eos=True, <|eot_id|> is not loaded from generation_config. Is this behavior correct?
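
For context, a sketch of the loading-time behavior being described; the function name and merge step are hypothetical, not vLLM's actual code, and it reuses the `SamplingParams` stand-in from the first sketch:

```python
def merge_generation_config_stops(params: SamplingParams,
                                  extra_eos_ids: list[int]) -> None:
    """Hypothetical merge of GenerationConfig EOS ids (e.g. Llama3's
    <|eot_id|>) into stop_token_ids."""
    if params.ignore_eos:
        # EOS-like tokens are not treated as stops, so <|eot_id|> can
        # appear in the generated text, as in the Llama3 output above.
        return
    for token_id in extra_eos_ids:
        if token_id not in params.stop_token_ids:
            params.stop_token_ids.append(token_id)
```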

@CatherineSue CatherineSue reopened this May 3, 2024
@njhill
Member

njhill commented Jun 26, 2024

@CatherineSue yes, this is correct behaviour. With llama3, <|eot_id|> is considered a second eos token. "Ignored" means it does not cause generation to stop, so it's expected to appear in the output.

@njhill njhill closed this as completed Jun 26, 2024