
Batching inference outputs are not the same with single inference #1761

Closed
gesanqiu opened this issue Nov 23, 2023 · 3 comments

gesanqiu commented Nov 23, 2023

If I call llm.generate with a batch of prompts and greedy search, the batched outputs differ from the outputs produced by running each prompt individually. Is this expected? A minimal reproduction script looks like this:

from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a sampling params object (temperature=0 selects greedy decoding).
sampling_params = SamplingParams(temperature=0)

# Create an LLM.
llm = LLM(model="/workdir/hf_models/llama-2-7b-chat-hf/", trust_remote_code=True)
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
    
    
for prompt in prompts:
    print(prompt)
    outputs = llm.generate(prompt, sampling_params)
    for output in outputs:
        generated_text = output.outputs[0].text
        print(f"Generated text: {generated_text!r}")

[Screenshot: batched and single-prompt generations diverge]
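For anyone comparing the two modes, here is a minimal sketch that checks them programmatically rather than by eye. It uses the same model path and greedy sampling as the script above, and it assumes llm.generate() returns outputs in prompt order (which the script above also relies on):

from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0)  # greedy decoding

llm = LLM(model="/workdir/hf_models/llama-2-7b-chat-hf/", trust_remote_code=True)

# Batched: all prompts in a single generate() call.
batched = [o.outputs[0].text for o in llm.generate(prompts, sampling_params)]

# Single: one generate() call per prompt.
single = [llm.generate(p, sampling_params)[0].outputs[0].text for p in prompts]

for prompt, b, s in zip(prompts, batched, single):
    if b == s:
        print(f"match    {prompt!r}")
    else:
        print(f"MISMATCH {prompt!r}")
        print(f"  batched: {b!r}")
        print(f"  single:  {s!r}")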

WoosukKwon (Collaborator) commented:

Hi @gesanqiu, I believe this should not be the case. Could you share a reproducible example?

@gesanqiu gesanqiu reopened this Nov 24, 2023
@gesanqiu gesanqiu changed the title Batching inference outputs are not the same Batching inference outputs are not the same with single inference Nov 24, 2023
gesanqiu commented Nov 24, 2023

@WoosukKwon Sorry, my mistake. This issue was observed on #1508; I haven't reproduced it on a vLLM release version yet. I need to check the dev branch.

gesanqiu (Author) commented:

#1546 fixed this issue.
