If I call llm.generate with a batch of prompts and greedy search (temperature=0), the output of batched inference differs from running the same prompts one at a time. Is this expected? A minimal reproduction script looks like this:
from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create a sampling params object (temperature=0 for greedy decoding).
sampling_params = SamplingParams(temperature=0)

# Create an LLM.
llm = LLM(model="/workdir/hf_models/llama-2-7b-chat-hf/", trust_remote_code=True)

# Batched generation: the output is a list of RequestOutput objects that
# contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

# Now run each prompt individually and print the results for comparison.
for prompt in prompts:
    print(prompt)
    outputs = llm.generate(prompt, sampling_params)
    for output in outputs:
        generated_text = output.outputs[0].text
        print(f"Generated text: {generated_text!r}")
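For a more direct check, the two runs can be collected and diffed prompt-by-prompt instead of eyeballing the printed output. A minimal sketch, assuming the same llm, prompts, and sampling_params objects defined above:

# Compare batched vs. per-prompt greedy outputs text-by-text
# (assumes `llm`, `prompts`, and `sampling_params` from the script above).
batched_texts = [out.outputs[0].text for out in llm.generate(prompts, sampling_params)]

single_texts = []
for prompt in prompts:
    out = llm.generate([prompt], sampling_params)[0]
    single_texts.append(out.outputs[0].text)

for prompt, batched, single in zip(prompts, batched_texts, single_texts):
    if batched != single:
        print(f"Mismatch for {prompt!r}:\n  batched: {batched!r}\n  single:  {single!r}")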
@WoosukKwon Sorry, my mistake: this issue was observed on #1508; I haven't reproduced it on a vLLM release version yet. I still need to check the dev branch.