[Bug]: Loading GenerationConfig into SamplingParams.stop_token_ids interferes with ignore_eos=True #4589
Comments
Would this PR fix it? #4468
Yes. Missed this PR. Will close this issue. Thanks!
Sorry, I might have closed it too soon. Rechecked the logic in the PR and did a test on Llama2 and Llama3: Model Llama-2-70b-hf
This looks correct. Model Llama-3-70B-Instruct
There is
@CatherineSue yes, this is correct behaviour with llama3,
Your current environment
There is a patch #4182 that loads stop_token_ids from GenerationConfig to work around <|eot_id|> in Llama3-Instruct.
However, this logic interferes with ignore_eos=True because the current stop check treats eos_token_id as a stop_token_id and doesn't consult ignore_eos. See stop_checker.
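A minimal sketch of how the stop check could account for ignore_eos. This is simplified, illustrative logic, not vLLM's actual stop_checker; the function and parameter names are assumptions:

```python
# Hypothetical, simplified stop check; names are illustrative, not vLLM's API.
def check_stop(last_token_id: int,
               stop_token_ids: set[int],
               eos_token_id: int,
               ignore_eos: bool) -> bool:
    """Return True if generation should stop on this token."""
    if last_token_id not in stop_token_ids:
        return False
    # If the matched stop token is actually the EOS token and the request
    # asked to ignore EOS, keep generating. Without this check, eos_token_id
    # loaded from GenerationConfig into stop_token_ids (#4182) overrides
    # ignore_eos=True.
    if ignore_eos and last_token_id == eos_token_id:
        return False
    return True
```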
🐛 Describe the bug
Test:
Model: Llama-2-70b-chat-hf
Response:
v0.3.1:
v0.4.1:
In the response from v0.4.1, generation stopped at eos_token_id because it is included in stop_token_ids, so completion_tokens is 110 instead of 200. vLLM should still respect ignore_eos=True in this case because the stop_token_id is eos_token_id.
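A minimal repro sketch using vLLM's offline API. The prompt is a placeholder and the token count shown is only what is reported above; any model whose GenerationConfig defines eos_token_id should show the same behavior:

```python
from vllm import LLM, SamplingParams

# Placeholder model and prompt for illustration.
llm = LLM(model="meta-llama/Llama-2-70b-chat-hf")

params = SamplingParams(
    max_tokens=200,
    ignore_eos=True,  # expect exactly 200 completion tokens
)

outputs = llm.generate(["Tell me a story about a robot."], params)
num_tokens = len(outputs[0].outputs[0].token_ids)
# On v0.4.1 this can come back shorter than 200 because eos_token_id, loaded
# from GenerationConfig into stop_token_ids, still terminates generation.
print(f"completion_tokens = {num_tokens}")
```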