
Conversation

@ywang96 (Member) commented Mar 21, 2025

#12622 introduced a change that applies the model's generation_config by default, which can silently override vLLM's default sampling parameters and cause user confusion. See #15241 as an example.

This PR updates the documentation regarding this change and adds a warning whenever any default sampling parameter is overridden by the generation config.

ywang96 added 3 commits March 21, 2025 01:07
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
@mergify mergify bot added the documentation Improvements or additions to documentation label Mar 21, 2025
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@ywang96 ywang96 requested a review from hmellor March 21, 2025 08:30
Signed-off-by: Roger Wang <ywang@roblox.com>
@ywang96 (Member, Author) commented Mar 21, 2025

Server log from launching `vllm serve Qwen/Qwen2.5-VL-7B-Instruct` on this branch. I think this should be enough notice for the end user.

```
INFO 03-21 08:44:05 [core.py:138] init engine (profile, create kv cache, warmup model) took 45.93 seconds
WARNING 03-21 08:44:05 [config.py:1028] Default sampling parameters have been overridden by the model's huggingface generation config. Please pass `--generation-config vllm` at server launch if this is not intended.
INFO 03-21 08:44:05 [serving_chat.py:115] Using default chat sampling params from model: {'repetition_penalty': 1.05, 'temperature': 0.1, 'top_k': 1, 'top_p': 0.001}
INFO 03-21 08:44:05 [serving_completion.py:61] Using default completion sampling params from model: {'repetition_penalty': 1.05, 'temperature': 0.1, 'top_k': 1, 'top_p': 0.001}
INFO 03-21 08:44:05 [api_server.py:1028] Starting vLLM API server on http://0.0.0.0:8000
```

@hmellor (Member) left a comment:

I'm not sure that this is what's happening, let's continue the discussion on #15241

@DarkLight1337 (Member):

This LGTM but let's have @hmellor discuss with you first

@hmellor (Member) commented Mar 21, 2025

OK, I have figured out what wasn't right, so we can bring the discussion back here.

#12622 changed the default behaviour of the config, so all entrypoints had their default behaviour changed (not just online).

The behaviour observed in #15241 was because passing sampling parameters is handled differently in online and offline:

  • online - the server's default generation config updates individual parameters with ones passed by the user
  • offline - the LLM's default generation config is replaced with the `SamplingParams` passed by the user:

    ```python
    if sampling_params is None:
        # Use default sampling params.
        sampling_params = self.get_default_sampling_params()
    ```

This means that when you passed `SamplingParams(temperature=1)` to `LLM.generate`, it was silently removing all the other default sampling params. This difference in behaviour is what caused the confusion.
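To make the difference concrete, here is a minimal sketch of the two behaviours (the model name is just the one from the server log above; any model that ships a `generation_config.json` behaves the same way):

```python
# Minimal sketch of the offline behaviour described above.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-VL-7B-Instruct")

# Offline: passing SamplingParams replaces the model defaults entirely, so
# repetition_penalty, top_k, top_p, etc. silently fall back to vLLM's
# built-in defaults; only temperature comes from the user here.
outputs = llm.generate(["Hello"], SamplingParams(temperature=1))

# To get the online-style behaviour (update individual fields on top of the
# model's defaults), start from the default params and modify them:
params = llm.get_default_sampling_params()
params.temperature = 1
outputs = llm.generate(["Hello"], params)
```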

vllm/config.py (outdated), comment on lines 1027 to 1032:
```python
if diff_sampling_param:
    logger.warning_once(
        "Default sampling parameters have been overridden "
        "by the model's huggingface generation config. "
        "Please pass `--generation-config vllm` at server "
        "launch if this is not intended.")
```
Review comment (Member):

This warning, combined with the info log from the server, makes it abundantly clear to the user where the generation config is coming from and what the values are.

Could we:

  • Verify that this warning appears when using the LLM class
  • Add a warning to LLM.generate when sampling_params is passed, explaining that this overrides all the default sampling params in LLM, not just the ones specified in the passed argument

@ywang96 (Member, Author) replied Mar 23, 2025:

Confirmed the warning appears when using the `LLM` class.

For the warning, I think this is a bit tricky: if sampling_params is created in the way we suggest in the example below, then it's not overriding all the defaults.

```python
sampling_params = llm.get_default_sampling_params()
if max_tokens is not None:
    sampling_params.max_tokens = max_tokens
if temperature is not None:
    sampling_params.temperature = temperature
if top_p is not None:
    sampling_params.top_p = top_p
if top_k is not None:
    sampling_params.top_k = top_k
```

On this note, I think we should probably make `llm.get_default_sampling_params()` part of the post-init of `SamplingParams`, but we can leave that to a separate discussion.
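A purely hypothetical sketch of that idea (not the actual vLLM implementation; today `SamplingParams` has no handle on the model config at construction time, which is part of why this belongs in a separate discussion):

```python
# Hypothetical sketch only, not the actual vLLM implementation: merge the
# model's generation-config defaults with the user's explicit overrides at
# construction time, so unspecified fields keep the model defaults.
class SamplingParams:
    def __init__(self, model_defaults: dict | None = None, **overrides):
        # Model defaults first; user overrides win field by field.
        merged = {**(model_defaults or {}), **overrides}
        for name, value in merged.items():
            setattr(self, name, value)

# User sets only temperature; repetition_penalty survives from the model.
params = SamplingParams(
    model_defaults={"repetition_penalty": 1.05, "temperature": 0.1},
    temperature=1.0,
)
```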

ywang96 added 4 commits March 22, 2025 22:14
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
@ywang96 ywang96 requested a review from hmellor March 23, 2025 07:03
ywang96 added 2 commits March 23, 2025 00:05
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
@ywang96 ywang96 requested a review from simon-mo March 23, 2025 07:30
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
:::{important}
By default, vLLM will use the sampling parameters recommended by the model creator by applying the `generation_config.json` from the Hugging Face model repository if it exists. In most cases, this will provide you with the best results by default if {class}`~vllm.SamplingParams` is not specified.
@ywang96 (Member, Author) commented:

> if {class}`~vllm.SamplingParams` is not specified.

SamplingParams is called out in particular here to differentiate this from online serving
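For reference, the opt-out mentioned in the warning looks like this. The `--generation-config vllm` server flag is taken from the log earlier in this thread; passing `generation_config="vllm"` to the `LLM` constructor is assumed here to be the offline equivalent:

```python
# Sketch of opting out of model-supplied defaults offline. The constructor
# argument is assumed to mirror the server's `--generation-config vllm` flag.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    generation_config="vllm",  # ignore generation_config.json, use vLLM defaults
)
```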

@ywang96 ywang96 changed the title [Misc][Doc] Add note regarding generation_config for online serving [Misc][Doc] Add note regarding loading generation_config by default Mar 23, 2025
@hmellor (Member) left a comment:

LGTM!

One small nit would be to change huggingface -> Hugging Face; also, it looks like pre-commit is upset about some whitespace.

ywang96 added 2 commits March 23, 2025 12:16
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 23, 2025
@simon-mo simon-mo merged commit 9c5c81b into main Mar 23, 2025
32 of 41 checks passed
@simon-mo simon-mo deleted the update-doc branch March 23, 2025 21:00
This pull request was later referenced by commits in the following forks: erictang000/vllm (Mar 25, 2025), wrmedford/vllm (Mar 26, 2025), lulmer/vllm (Apr 7, 2025), lk-chen/vllm (Apr 29, 2025), shreyankg/vllm (May 3, 2025), and RichardoMrMu/vllm (May 12, 2025).

Labels

documentation (Improvements or additions to documentation), ready (ONLY add when PR is ready to merge/full CI is needed)

5 participants