[Misc][Doc] Add note regarding loading generation_config by default
#15281
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Server log from launching …
I'm not sure that this is what's happening, let's continue the discussion on #15241
This LGTM but let's have @hmellor discuss with you first
Ok, I have figured out what wasn't right, so we can bring the discussion back here. #12622 changed the default behaviour of the config, so all entrypoints had their default behaviour changed (not just online). The behaviour observed in #15241 occurred because passing sampling parameters is handled differently in online and offline serving.

This means that when you passed …
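To illustrate the difference, here is a deliberately simplified sketch (not vLLM's actual implementation; all config values are hypothetical): offline, a user-built `SamplingParams` carries only library defaults plus the fields the user set, replacing the model-derived defaults wholesale, whereas online, per-request fields are merged on top of the defaults derived from `generation_config.json`.

```python
# Hypothetical values standing in for generation_config.json and
# vLLM's built-in SamplingParams defaults.
HF_GENERATION_CONFIG = {"temperature": 0.6, "top_p": 0.9}
VLLM_DEFAULTS = {"temperature": 1.0, "top_p": 1.0}


def offline_generate(sampling_params=None):
    # Offline sketch: a user-supplied params object is built from library
    # defaults only, so it replaces the model-derived defaults wholesale.
    if sampling_params is not None:
        return {**VLLM_DEFAULTS, **sampling_params}
    return {**VLLM_DEFAULTS, **HF_GENERATION_CONFIG}


def online_generate(request_params):
    # Online sketch: per-request fields are merged on top of the
    # generation_config-derived defaults.
    merged = {**VLLM_DEFAULTS, **HF_GENERATION_CONFIG}
    merged.update(request_params)
    return merged


# Offline: top_p silently falls back to the library default (1.0).
print(offline_generate({"temperature": 0.2}))
# Online: top_p keeps the model-recommended value (0.9).
print(online_generate({"temperature": 0.2}))
```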
vllm/config.py (Outdated)

```python
if diff_sampling_param:
    logger.warning_once(
        "Default sampling parameters have been overridden "
        "by the model's huggingface generation config. "
        "Please pass `--generation-config vllm` at server "
        "launch if this is not intended.")
```
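For reference, the opt-out the warning points at would look like this at server launch (the model name is illustrative; only the `--generation-config vllm` flag comes from the warning text above):

```shell
# Ignore the model repo's generation_config.json and use vLLM's own
# sampling defaults instead (model name is a placeholder).
vllm serve meta-llama/Llama-3.1-8B-Instruct --generation-config vllm
```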
This warning, combined with the info log from the server, makes it abundantly clear to the user where the generation config is coming from and what the values are.
Could we:
- Verify that this warning appears when using the `LLM` class
- Add a warning to `LLM.generate` when `sampling_params` is passed, explaining that this overrides all the default sampling params in `LLM`, not just the ones specified in the passed argument
Confirmed the warning appears when using the `LLM` class.

For the warning, I think this is a bit tricky: if `sampling_params` is created the way we suggest in the example below, then it isn't overriding all defaults.
vllm/examples/offline_inference/basic/generate.py (lines 18 to 26 in f68cce8):

```python
sampling_params = llm.get_default_sampling_params()
if max_tokens is not None:
    sampling_params.max_tokens = max_tokens
if temperature is not None:
    sampling_params.temperature = temperature
if top_p is not None:
    sampling_params.top_p = top_p
if top_k is not None:
    sampling_params.top_k = top_k
```
On this note, I think we should probably make `llm.get_default_sampling_params()` the behaviour of `SamplingParams`'s `__post_init__`, but we can leave that to a separate discussion.
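One way that idea could look, as a hypothetical sketch rather than vLLM's actual `SamplingParams` (the config values and field set are invented for illustration): fields default to `None` meaning "unset by the user", and `__post_init__` fills them from the model-derived defaults, falling back to library defaults.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model-derived defaults (would come from generation_config.json).
MODEL_GENERATION_CONFIG = {"temperature": 0.6, "top_k": 20}


@dataclass
class SamplingParamsSketch:
    # None means "not set by the user"; filled in __post_init__.
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    top_k: Optional[int] = None

    def __post_init__(self):
        # Model-recommended values take precedence over library defaults;
        # user-set values are never touched.
        library_defaults = {"temperature": 1.0, "top_p": 1.0, "top_k": -1}
        defaults = {**library_defaults, **MODEL_GENERATION_CONFIG}
        for name, value in defaults.items():
            if getattr(self, name) is None:
                setattr(self, name, value)


params = SamplingParamsSketch(top_p=0.95)
# temperature and top_k fall back to the model defaults (0.6 and 20),
# while top_p keeps the user's 0.95.
```

With this shape, a user who sets only one field still gets the model creator's recommendations for the rest, which is the behaviour the example above reconstructs by hand.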
```python
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

:::{important}
By default, vLLM will use sampling parameters recommended by model creator by applying the `generation_config.json` from the huggingface model repository if it exists. In most cases, this will provide you with the best results by default if {class}`~vllm.SamplingParams` is not specified.
> if {class}`~vllm.SamplingParams` is not specified.

`SamplingParams` is called out in particular here to differentiate this from online serving.
Title changed: "… `generation_config` for online serving" → "… `generation_config` by default"
LGTM!
One small nit would be to change huggingface -> Hugging Face. Also, it looks like pre-commit is upset about some whitespace.
Signed-off-by: Roger Wang <ywang@roblox.com>
#12622 introduced a change to apply `generation_config` by default, which can sometimes silently override default sampling parameters and cause user confusion. See #15241 as an example.

This PR updates the documentation regarding this change and adds a warning if any of the sampling parameters are overridden by the generation config.