[Bug]: Regression in predictions in v0.4.3 #5280
Comments
Does this issue occur using offline inference?
Thanks! Indeed it can be related. The chat template for this model indeed doesn't include the BOS token:

```python
msgs = [{"role": "user", "content": "hello"}]
print(f"{tokenizer.bos_token=}")
print(f"{tokenizer.decode(tokenizer.apply_chat_template(msgs))=}")
```

outputs:

```
tokenizer.bos_token='<s>'
tokenizer.decode(tokenizer.apply_chat_template(msgs))='<|user|>\nhello</s> \n'
```

Edit: however, this doesn't explain why the issue is also reproduced with:

```
tokenizer.bos_token='<|begin_of_text|>'
tokenizer.decode(tokenizer.apply_chat_template(msgs))='<|begin_of_text|><|im_start|>user\nhello<|im_end|>\n'
```
To narrow down the issue, try comparing them in offline inference as mentioned above.
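A minimal sketch of such an offline comparison, assuming the TinyLlama model and the templated prompt printed earlier in this thread (run the same script once per vLLM version and diff the results):

```python
# Hypothetical offline check: run this identical script in a v0.4.2 and a
# v0.4.3 environment, then compare the greedy tokens and logprobs.
from vllm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # model name assumed
params = SamplingParams(temperature=0, max_tokens=16, logprobs=5)

# Prompt already formatted with the chat template, as shown above.
outputs = llm.generate(["<|user|>\nhello</s> \n"], params)
out = outputs[0].outputs[0]
print(out.text)
print(out.logprobs)  # at temperature 0 these should match across versions
```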
Couldn't reproduce it offline, so it indeed points to the PR you linked about the default BOS token. Thanks!
🐛 Describe the bug
The predictions changed between v0.4.2 and v0.4.3, both the actual tokens at temperature 0 and the logprobs.
I'll show here how to reproduce the issue with TinyLlama (I noticed it originally with Llama-3-8B-Instruct).
Running two vLLM servers: for 0.4.2 adding `--port 8001`, and for 0.4.3 `--port 8002`.
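The exact launch command isn't shown above; a minimal sketch, assuming the standard OpenAI-compatible entrypoint and a TinyLlama chat model:

```bash
# Hypothetical launch: one server per installed vLLM version,
# differing only in --port.
python -m vllm.entrypoints.openai.api_server \
    --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
    --port 8001  # use --port 8002 in the v0.4.3 environment
```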
Then running this client code:
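The original script isn't reproduced above; a minimal sketch using the OpenAI Python client, with the prompt and sampling parameters as assumptions:

```python
# Hypothetical reconstruction of the comparison; prompt, max_tokens, and
# top_logprobs are illustrative values, not the issue's exact script.
from openai import OpenAI

for port in (8001, 8002):  # v0.4.2 and v0.4.3 servers
    client = OpenAI(base_url=f"http://localhost:{port}/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        messages=[{"role": "user", "content": "hello"}],
        temperature=0,
        max_tokens=16,
        logprobs=True,
        top_logprobs=5,
    )
    choice = resp.choices[0]
    print(port, choice.message.content)
    # Sort each position's top_logprobs, since (as noted below) 0.4.3 does
    # not always return them in descending order. NB: per the note below,
    # 0.4.2's logprobs arrive in a different format and need conversion first.
    for tok in choice.logprobs.content:
        top = sorted(tok.top_logprobs, key=lambda t: t.logprob, reverse=True)
        print([(t.token, t.logprob) for t in top])
```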
Getting this output:
Note that for 0.4.2 I'm converting the logprobs to the chat format; this is no longer needed in 0.4.3 since #5029.
Also, note that I sorted the logprobs, since starting with 0.4.3 they are sometimes not returned in descending order as before. I'm not sure whether that ordering should be guaranteed (although in this specific example they were returned already sorted).