
Conversation

@ywang96 (Member) commented Mar 21, 2025

#12622 introduced a change that applies the model's generation_config by default, which can silently override vLLM's default sampling parameters and cause user confusion. See #15241 as an example.

This PR updates the documentation regarding this change and adds a warning whenever any default sampling parameter is overridden by the generation config.

ywang96 added 3 commits March 21, 2025 01:07
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
@mergify mergify bot added the documentation Improvements or additions to documentation label Mar 21, 2025
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@ywang96 ywang96 requested a review from hmellor March 21, 2025 08:30
Signed-off-by: Roger Wang <ywang@roblox.com>
@ywang96 (Member, Author) commented Mar 21, 2025

Server log from launching `vllm serve Qwen/Qwen2.5-VL-7B-Instruct` on this branch. I think this should be enough notice for the end user.

```
INFO 03-21 08:44:05 [core.py:138] init engine (profile, create kv cache, warmup model) took 45.93 seconds
WARNING 03-21 08:44:05 [config.py:1028] Default sampling parameters have been overridden by the model's huggingface generation config. Please pass `--generation-config vllm` at server launch if this is not intended.
INFO 03-21 08:44:05 [serving_chat.py:115] Using default chat sampling params from model: {'repetition_penalty': 1.05, 'temperature': 0.1, 'top_k': 1, 'top_p': 0.001}
INFO 03-21 08:44:05 [serving_completion.py:61] Using default completion sampling params from model: {'repetition_penalty': 1.05, 'temperature': 0.1, 'top_k': 1, 'top_p': 0.001}
INFO 03-21 08:44:05 [api_server.py:1028] Starting vLLM API server on http://0.0.0.0:8000
```

@hmellor (Member) left a comment:

I'm not sure that this is what's happening, let's continue the discussion on #15241

@DarkLight1337 (Member):

This LGTM but let's have @hmellor discuss with you first

@hmellor (Member) commented Mar 21, 2025

OK, I have figured out what wasn't right, so we can bring the discussion back here.

#12622 changed the default behaviour of the config, so all entrypoints had their default behaviour changed (not just online).

The behaviour observed in #15241 was because passing sampling parameters is handled differently in online and offline:

  • online - the server's default generation config updates individual parameters with ones passed by the user
  • offline - the LLM's default generation config is replaced with the `SamplingParams` passed by the user:

    ```python
    if sampling_params is None:
        # Use default sampling params.
        sampling_params = self.get_default_sampling_params()
    ```

This means that when you passed `SamplingParams(temperature=1)` to `LLM.generate`, it was silently removing all the other default sampling params. This difference in behaviour is what caused the confusion.
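To make the difference concrete, here is a minimal sketch of the two behaviours (the model name is just the one from the server log above; any model that ships a `generation_config.json` behaves the same way):

```python
# Minimal sketch of the offline behaviour described above.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-VL-7B-Instruct")

# Offline: passing SamplingParams replaces the model defaults entirely, so
# repetition_penalty, top_k, top_p, etc. silently fall back to vLLM's
# built-in defaults; only temperature comes from the user here.
outputs = llm.generate(["Hello"], SamplingParams(temperature=1))

# To get the online-style behaviour (update individual fields on top of the
# model's defaults), start from the default params and modify them:
params = llm.get_default_sampling_params()
params.temperature = 1
outputs = llm.generate(["Hello"], params)
```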

vllm/config.py (outdated), comment on lines 1027 to 1032:
```python
if diff_sampling_param:
    logger.warning_once(
        "Default sampling parameters have been overridden "
        "by the model's huggingface generation config. "
        "Please pass `--generation-config vllm` at server "
        "launch if this is not intended.")
```
Review comment (Member):

This warning, combined with the info log from the server, makes it abundantly clear to the user where the generation config is coming from and what the values are.

Could we:

  • Verify that this warning appears when using the LLM class
  • Add a warning to LLM.generate when sampling_params is passed, explaining that this overrides all the default sampling params in LLM, not just the ones specified in the passed argument

@ywang96 (Member, Author) replied Mar 23, 2025:

Confirmed the warning appears when using the `LLM` class.

For the warning, I think this is a bit tricky: if sampling_params is created in the way we suggest in the example below, then it's not overriding all the defaults.

```python
sampling_params = llm.get_default_sampling_params()
if max_tokens is not None:
    sampling_params.max_tokens = max_tokens
if temperature is not None:
    sampling_params.temperature = temperature
if top_p is not None:
    sampling_params.top_p = top_p
if top_k is not None:
    sampling_params.top_k = top_k
```

On this note, I think we should probably make `llm.get_default_sampling_params()` part of the post-init of `SamplingParams`, but we can leave that to a separate discussion.
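A purely hypothetical sketch of that idea (not the actual vLLM implementation; today `SamplingParams` has no handle on the model config at construction time, which is part of why this belongs in a separate discussion):

```python
# Hypothetical sketch only, not the actual vLLM implementation: merge the
# model's generation-config defaults with the user's explicit overrides at
# construction time, so unspecified fields keep the model defaults.
class SamplingParams:
    def __init__(self, model_defaults: dict | None = None, **overrides):
        # Model defaults first; user overrides win field by field.
        merged = {**(model_defaults or {}), **overrides}
        for name, value in merged.items():
            setattr(self, name, value)

# User sets only temperature; repetition_penalty survives from the model.
params = SamplingParams(
    model_defaults={"repetition_penalty": 1.05, "temperature": 0.1},
    temperature=1.0,
)
```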

ywang96 added 4 commits March 22, 2025 22:14
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
@ywang96 ywang96 requested a review from hmellor March 23, 2025 07:03
ywang96 added 2 commits March 23, 2025 00:05
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
@ywang96 ywang96 requested a review from simon-mo March 23, 2025 07:30
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
:::{important}
By default, vLLM will use the sampling parameters recommended by the model creator by applying the `generation_config.json` from the Hugging Face model repository if it exists. In most cases, this will provide you with the best results by default if {class}`~vllm.SamplingParams` is not specified.
@ywang96 (Member, Author) commented:

> if {class}`~vllm.SamplingParams` is not specified.

SamplingParams is called out in particular here to differentiate this from online serving
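For reference, the opt-out mentioned in the warning looks like this. The `--generation-config vllm` server flag is taken from the log earlier in this thread; passing `generation_config="vllm"` to the `LLM` constructor is assumed here to be the offline equivalent:

```python
# Sketch of opting out of model-supplied defaults offline. The constructor
# argument is assumed to mirror the server's `--generation-config vllm` flag.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    generation_config="vllm",  # ignore generation_config.json, use vLLM defaults
)
```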

@ywang96 ywang96 changed the title [Misc][Doc] Add note regarding generation_config for online serving [Misc][Doc] Add note regarding loading generation_config by default Mar 23, 2025
@hmellor (Member) left a comment:

LGTM!

One small nit would be to change huggingface -> Hugging Face; also, it looks like pre-commit is upset about some whitespace.

ywang96 added 2 commits March 23, 2025 12:16
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 23, 2025
@simon-mo simon-mo merged commit 9c5c81b into main Mar 23, 2025
32 of 41 checks passed
@simon-mo simon-mo deleted the update-doc branch March 23, 2025 21:00
This pull request was later referenced by commits in the following forks: erictang000/vllm (Mar 25, 2025), wrmedford/vllm (Mar 26, 2025), lulmer/vllm (Apr 7, 2025), lk-chen/vllm (Apr 29, 2025), shreyankg/vllm (May 3, 2025), and RichardoMrMu/vllm (May 12, 2025).

Labels

documentation (Improvements or additions to documentation), ready (ONLY add when PR is ready to merge/full CI is needed)

5 participants