If I call llm.generate with a batch of prompts and greedy search (temperature=0), the output of batched inference differs from running the same prompts one at a time. Is this expected? A minimal reproduction script looks like this:
from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create a sampling params object (temperature=0 for greedy decoding).
sampling_params = SamplingParams(temperature=0)

# Create an LLM.
llm = LLM(model="/workdir/hf_models/llama-2-7b-chat-hf/", trust_remote_code=True)

# Batched generation: the output is a list of RequestOutput objects that
# contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

# Now run each prompt individually and print the results for comparison.
for prompt in prompts:
    print(prompt)
    outputs = llm.generate(prompt, sampling_params)
    for output in outputs:
        generated_text = output.outputs[0].text
        print(f"Generated text: {generated_text!r}")
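For a more direct check, the two runs can be collected and diffed prompt-by-prompt instead of eyeballing the printed output. A minimal sketch, assuming the same llm, prompts, and sampling_params objects defined above:

# Compare batched vs. per-prompt greedy outputs text-by-text
# (assumes `llm`, `prompts`, and `sampling_params` from the script above).
batched_texts = [out.outputs[0].text for out in llm.generate(prompts, sampling_params)]

single_texts = []
for prompt in prompts:
    out = llm.generate([prompt], sampling_params)[0]
    single_texts.append(out.outputs[0].text)

for prompt, batched, single in zip(prompts, batched_texts, single_texts):
    if batched != single:
        print(f"Mismatch for {prompt!r}:\n  batched: {batched!r}\n  single:  {single!r}")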
@WoosukKwon Sorry, my mistake: this issue was observed on #1508; I haven't reproduced it on a vLLM release version yet. I still need to check the dev branch.