[Bug]: Regression in predictions in v0.4.3 #5280

Closed
hibukipanim opened this issue Jun 5, 2024 · 5 comments
Labels
bug Something isn't working

Comments

hibukipanim commented Jun 5, 2024

🐛 Describe the bug

The predictions changed between v0.4.2 and v0.4.3: both the actual tokens at temperature 0 and the logprobs.

I'll show here how to reproduce the issue with TinyLlama (I originally noticed it with Llama-3-8B-Instruct).

Running 2 vLLM servers with:

python -m vllm.entrypoints.openai.api_server --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --gpu-memory-utilization 0.4

adding --port 8001 for 0.4.2 and --port 8002 for 0.4.3.

Then running this client code:

# Imports assumed from the OpenAI Python SDK (they were omitted from the original snippet):
import openai
from openai.types.chat import ChatCompletionTokenLogprob
from openai.types.chat.chat_completion import ChoiceLogprobs
from openai.types.chat.chat_completion_token_logprob import TopLogprob
from openai.types.completion_choice import Logprobs


def convert_logprobs_to_chat(legacy_logprobs: Logprobs) -> ChoiceLogprobs:
    # Convert the legacy (completion-style) logprobs of v0.4.2 into the chat-style format.
    top_logprobs = []
    for top_token, top_logprob in legacy_logprobs.top_logprobs[0].items():
        top_logprobs.append(
            TopLogprob(token=top_token, logprob=top_logprob)
        )

    chat_logprobs = ChatCompletionTokenLogprob(
        token=legacy_logprobs.tokens[0],
        logprob=legacy_logprobs.token_logprobs[0],
        top_logprobs=top_logprobs
    )

    return ChoiceLogprobs(content=[chat_logprobs])

vllms = {
    "0.4.2": "http://localhost:8001/v1",
    "0.4.3": "http://localhost:8002/v1",
}

for version, endpoint in vllms.items():
    print(f"\nvLLM {version=}, {endpoint=}")
    client = openai.OpenAI(
        base_url=endpoint,
        api_key="foo"
    )

    msgs = [{"role": "user", "content": "3**7=?"}]
    response = client.chat.completions.create(
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        messages=msgs,
        max_tokens=1,
        logprobs=True,
        top_logprobs=5,
        temperature=0
    )
    print(f"answer (first token): {response.choices[0].message.content}")
    if version == "0.4.2":
        legacy_logprobs = response.choices[0].logprobs
        print(f"{legacy_logprobs.top_logprobs=}")
        logprobs = convert_logprobs_to_chat(legacy_logprobs)
    else:
        logprobs = response.choices[0].logprobs
    
    top_logprobs = logprobs.content[0].top_logprobs
    print(f"top_logprobs:\t\t{top_logprobs}")
    sorted_top_logprobs = sorted(top_logprobs, key=lambda x: -x.logprob)
    print(f"sorted_top_logprobs:\t{sorted_top_logprobs}")

Getting this output:

vLLM version='0.4.2', endpoint='http://localhost:8001/v1'
answer (first token): The
legacy_logprobs.top_logprobs=[{'The': -1.3429100513458252, 'Yes': -1.6554100513458252, 'No': -2.405410051345825, '3': -2.530410051345825, 'S': -3.405410051345825}]
top_logprobs:		[TopLogprob(token='The', bytes=None, logprob=-1.3429100513458252), TopLogprob(token='Yes', bytes=None, logprob=-1.6554100513458252), TopLogprob(token='No', bytes=None, logprob=-2.405410051345825), TopLogprob(token='3', bytes=None, logprob=-2.530410051345825), TopLogprob(token='S', bytes=None, logprob=-3.405410051345825)]
sorted_top_logprobs:	[TopLogprob(token='The', bytes=None, logprob=-1.3429100513458252), TopLogprob(token='Yes', bytes=None, logprob=-1.6554100513458252), TopLogprob(token='No', bytes=None, logprob=-2.405410051345825), TopLogprob(token='3', bytes=None, logprob=-2.530410051345825), TopLogprob(token='S', bytes=None, logprob=-3.405410051345825)]

vLLM version='0.4.3', endpoint='http://localhost:8002/v1'
answer (first token): 3
top_logprobs:		[TopLogprob(token='3', bytes=[51], logprob=-0.8362228870391846), TopLogprob(token='Yes', bytes=[89, 101, 115], logprob=-2.2112228870391846), TopLogprob(token='1', bytes=[49], logprob=-2.6487228870391846), TopLogprob(token='No', bytes=[78, 111], logprob=-2.8362228870391846), TopLogprob(token='7', bytes=[55], logprob=-3.2112228870391846)]
sorted_top_logprobs:	[TopLogprob(token='3', bytes=[51], logprob=-0.8362228870391846), TopLogprob(token='Yes', bytes=[89, 101, 115], logprob=-2.2112228870391846), TopLogprob(token='1', bytes=[49], logprob=-2.6487228870391846), TopLogprob(token='No', bytes=[78, 111], logprob=-2.8362228870391846), TopLogprob(token='7', bytes=[55], logprob=-3.2112228870391846)]

Note that for 0.4.2 I'm converting the logprobs to the chat format; this is no longer needed in 0.4.3 since #5029.
Also, note that I sorted the logprobs: I noticed that since 0.4.3 they are sometimes not returned in descending order as before, though I'm not sure whether that ordering should be guaranteed (in this specific example they happened to be sorted already).

hibukipanim added the bug label on Jun 5, 2024
DarkLight1337 (Member) commented Jun 5, 2024

This might have been caused by #4688. After #5278 is merged, try setting add_special_tokens=True and see whether the original behaviour is restored.
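
If the option is exposed through the OpenAI-compatible server, passing it would presumably look something like this with the openai client (a minimal sketch; only the flag name comes from the PR discussion, and sending it via extra_body is an assumption):

import openai

client = openai.OpenAI(base_url="http://localhost:8002/v1", api_key="foo")
response = client.chat.completions.create(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    messages=[{"role": "user", "content": "3**7=?"}],
    max_tokens=1,
    logprobs=True,
    top_logprobs=5,
    temperature=0,
    extra_body={"add_special_tokens": True},  # assumed: vLLM-specific extra request field from #5278
)
print(response.choices[0].message.content)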

DarkLight1337 (Member) commented:

Does this issue occur using the offline LLM class?

hibukipanim (Author) commented Jun 5, 2024

Thanks! Indeed, it could be related.

The chat template for this model indeed doesn't include the BOS token:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
msgs = [{"role": "user", "content": "hello"}]
print(f"{tokenizer.bos_token=}")
print(f"{tokenizer.decode(tokenizer.apply_chat_template(msgs))=}")

outputs:

tokenizer.bos_token='<s>'
tokenizer.decode(tokenizer.apply_chat_template(msgs))='<|user|>\nhello</s> \n'

Edit: however, this doesn't explain why the issue also reproduces with NousResearch/Hermes-2-Pro-Llama-3-8B, where the chat template does include the BOS token:

tokenizer.bos_token='<|begin_of_text|>'
tokenizer.decode(tokenizer.apply_chat_template(msgs))='<|begin_of_text|><|im_start|>user\nhello<|im_end|>\n'

DarkLight1337 (Member) commented:

> Does this issue occur using the offline LLM class?

To narrow down the issue, try comparing them in offline inference as mentioned above.
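
For example, an offline comparison could look roughly like this (a minimal sketch, assuming the chat template is applied manually so the LLM class sees the same prompt as the chat endpoint; run it once per installed vLLM version and compare the printed logprobs):

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model)

# Build the same prompt string that the chat endpoint would render.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "3**7=?"}],
    tokenize=False,
    add_generation_prompt=True,
)

llm = LLM(model=model, gpu_memory_utilization=0.4)
params = SamplingParams(temperature=0, max_tokens=1, logprobs=5)
output = llm.generate([prompt], params)[0]

# First generated token and its per-token top-5 logprobs.
print(output.outputs[0].text)
print(output.outputs[0].logprobs)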

hibukipanim (Author) commented:

I couldn't reproduce it offline with the LLM() class, nor with the legacy /completions endpoint.

So it indeed points to the PR you linked about the default BOS token.
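
For reference, a /completions check along those lines might look like this (a sketch only; the hard-coded chat-templated prompt is an assumption, not the exact command that was run):

import openai

client = openai.OpenAI(base_url="http://localhost:8002/v1", api_key="foo")

# Legacy completions endpoint: send the already-templated prompt directly.
response = client.completions.create(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    prompt="<|user|>\n3**7=?</s>\n<|assistant|>\n",  # assumed TinyLlama chat-templated prompt
    max_tokens=1,
    logprobs=5,
    temperature=0,
)
print(response.choices[0].text)
print(response.choices[0].logprobs.top_logprobs)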

Thanks! Closing the issue.
