Only log non-default CLI args for online serving #17803
Conversation
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can: Add 🚀
Can you add a before-and-after example for a command?
GitHub adding the scrollbar doesn't really do it justice, but I've added it to the PR description. Long line becomes short line.
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
LGTM this is exactly what I'd like to see, curious to get acceptance from others before landing
Personally, I don't think this PR is very good. Non-default parameters are provided by the user, so the user is obviously aware of them and there's no need to print them. On the other hand, default parameters are what the user might not know, and printing them could actually be more helpful. For example, parameters like swap_space=4, enable_prefix_caching=None, and prefix_caching_hash_algo='builtin' might give users a clearer understanding of whether the current vLLM configuration is reasonable.
Thank you for raising your concern. However, I still think this is a positive change: every parameter (default or not) is still visible in the engine config that is logged when the engine initialises, so no information is lost, while the new non-default args line shows at a glance what the user actually changed. Here is a full start-up log for comparison:
INFO 05-08 10:47:08 [__init__.py:248] Automatically detected platform cuda.
INFO 05-08 10:47:13 [api_server.py:1044] vLLM API server version 0.8.5.dev495+ge89b3f03a
INFO 05-08 10:47:14 [cli_args.py:297] non-default args: {'model': 'meta-llama/Llama-3.2-1B-Instruct', 'gpu_memory_utilization': 0.4}
INFO 05-08 10:47:23 [config.py:752] This model supports multiple tasks: {'embed', 'generate', 'score', 'classify', 'reward'}. Defaulting to 'generate'.
INFO 05-08 10:47:23 [config.py:2057] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 05-08 10:47:28 [__init__.py:248] Automatically detected platform cuda.
INFO 05-08 10:47:32 [core.py:61] Initializing a V1 LLM engine (v0.8.5.dev495+ge89b3f03a) with config: model='meta-llama/Llama-3.2-1B-Instruct', speculative_config=None, tokenizer='meta-llama/Llama-3.2-1B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=None, served_model_name=meta-llama/Llama-3.2-1B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":512}

If you still believe that the full argparse namespace should be logged, we could add it back as a debug log?
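For readers curious about the general idea, here is a minimal sketch of how a "non-default args" subset can be derived from an argparse parser. This is illustrative only; the parser, option names, and defaults below are hypothetical and do not reflect vLLM's actual CLI definition or implementation:

```python
import argparse


def non_default_args(parser: argparse.ArgumentParser,
                     args: argparse.Namespace) -> dict:
    """Return only the options whose parsed value differs from the parser's default."""
    return {
        name: value
        for name, value in vars(args).items()
        if value != parser.get_default(name)
    }


# Hypothetical parser standing in for the real CLI arg parser.
parser = argparse.ArgumentParser()
parser.add_argument("--model", default="facebook/opt-125m")
parser.add_argument("--gpu-memory-utilization", type=float, default=0.9)
parser.add_argument("--swap-space", type=float, default=4)

args = parser.parse_args([
    "--model", "meta-llama/Llama-3.2-1B-Instruct",
    "--gpu-memory-utilization", "0.4",
])
print("non-default args:", non_default_args(parser, args))
# non-default args: {'model': 'meta-llama/Llama-3.2-1B-Instruct', 'gpu_memory_utilization': 0.4}
```

Something along these lines is all that is needed to shrink the start-up log to just the options the user actually set.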
@hmellor Thank you for the clarification. Being able to see all the parameters in the engine config is sufficient; I have no further comments.
I didn't want to land this so hastily because this log has been present for a long time. Please respect requests to get sign-off from other committers.
Oh sorry, I forgot to comment. This LGTM as well, which is why I merged it.
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Yuqi Zhang <yuqizhang@google.com>

Before: the server start-up log printed the full argparse namespace, including every argument left at its default value.
After: only the non-default args are logged, e.g. INFO 05-08 10:47:14 [cli_args.py:297] non-default args: {'model': 'meta-llama/Llama-3.2-1B-Instruct', 'gpu_memory_utilization': 0.4}
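As a concrete illustration (the command below is an assumption based on the args shown in that log line, using the standard vllm serve entrypoint), a launch like this would produce the "After" line rather than a full namespace dump:

```bash
# Hypothetical invocation matching the non-default args shown above.
vllm serve meta-llama/Llama-3.2-1B-Instruct --gpu-memory-utilization 0.4
```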
cc @mgoin