
Conversation

@chaunceyjiang
Collaborator

@chaunceyjiang chaunceyjiang commented Aug 5, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before-and-after comparison or e2e results.
  • (Optional) Any necessary documentation updates, such as updating supported_models.md and examples for a new model.

Fixes #21954

Purpose

Disable the console stats logger when api_server_count is greater than 1, since per-API-server stats would be incomplete. Example run showing the new warning:
vllm serve /home/jovyan/qwen3-8b  --data-parallel-size 2 --data-parallel-rpc-port 25555 --data-parallel-address 127.0.0.1 --api-server-count 2
INFO 09-05 01:29:36 [__init__.py:241] Automatically detected platform cuda.
INFO 09-05 01:29:41 [api_server.py:1894] vLLM API server version 0.10.2.dev403+g14b4326b9
INFO 09-05 01:29:41 [utils.py:328] non-default args: {'model_tag': '/home/jovyan/qwen3-8b', 'api_server_count': 2, 'model': '/home/jovyan/qwen3-8b', 'data_parallel_size': 2, 'data_parallel_address': '127.0.0.1', 'data_parallel_rpc_port': 25555, 'mm_processor_cache_gb': 0}
INFO 09-05 01:29:51 [__init__.py:748] Resolved architecture: Qwen3ForCausalLM
`torch_dtype` is deprecated! Use `dtype` instead!
....
(ApiServer_1 pid=21224) WARNING 09-05 01:30:13 [async_llm.py:108] AsyncLLM created with api_server_count more than 1; disabling stats logging to avoid incomplete stats.
(ApiServer_0 pid=21221) INFO 09-05 01:30:13 [scheduler.py:222] Chunked prefill is enabled with max_num_batched_tokens=8192.
(ApiServer_0 pid=21221) WARNING 09-05 01:30:13 [async_llm.py:108] AsyncLLM created with api_server_count more than 1; disabling stats logging to avoid incomplete stats.

Test Result

(Optional) Documentation Update

@chaunceyjiang chaunceyjiang requested a review from aarnphm as a code owner August 5, 2025 03:07
@mergify mergify bot added the frontend label Aug 5, 2025
@DarkLight1337 DarkLight1337 requested a review from njhill August 5, 2025 03:07
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a bugfix to disable the statistics logger when api_server_count is greater than 1, as this configuration is not compatible. The change correctly saves the original state of the disable_log_stats argument, forces it to True when multiple API servers are used, and issues a warning to the user if the feature was previously enabled. The implementation is consistent with how other incompatible features are handled in this scenario. The code is correct and effectively addresses the issue.
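The pattern the review describes can be sketched as follows. This is a simplified, hypothetical stand-in for vLLM's actual argument handling (the `Args` container and function name are assumptions, not the real API):

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("sketch")


@dataclass
class Args:
    """Hypothetical stand-in for the parsed server arguments."""
    api_server_count: int = 1
    disable_log_stats: bool = False


def apply_api_server_scaleout_overrides(args: Args) -> bool:
    """Save the original disable_log_stats value, then force it to True
    when multiple API servers are used, warning if stats logging was
    previously enabled. Returns the original value."""
    orig_disable_log_stats = args.disable_log_stats
    if args.api_server_count > 1 and not args.disable_log_stats:
        logger.warning(
            "AsyncLLM created with api_server_count more than 1; "
            "disabling stats logging to avoid incomplete stats.")
        args.disable_log_stats = True
    return orig_disable_log_stats
```

With a single API server the arguments pass through unchanged; with two or more, stats logging is forced off and the original setting is returned so the caller can report it.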

@chaunceyjiang
Collaborator Author

Hi, @njhill I have a question: when disable_log_stats is set to true, both LoggingStatLogger and PrometheusStatLogger are disabled. Is this the expected behavior?

# Metric Logging.
if self.log_stats:
    if stat_loggers is not None:
        self.stat_loggers = stat_loggers
    else:
        # Lazy import for prometheus multiprocessing.
        # We need to set PROMETHEUS_MULTIPROC_DIR environment variable
        # before prometheus_client is imported.
        # See https://prometheus.github.io/client_python/multiprocess/
        from vllm.engine.metrics import (LoggingStatLogger,
                                         PrometheusStatLogger)
        self.stat_loggers = {
            "logging":
            LoggingStatLogger(
                local_interval=_LOCAL_LOGGING_INTERVAL_SEC,
                vllm_config=vllm_config),
            "prometheus":
            PrometheusStatLogger(
                local_interval=_LOCAL_LOGGING_INTERVAL_SEC,
                labels=dict(
                    model_name=self.model_config.served_model_name),
                vllm_config=vllm_config),
        }
        self.stat_loggers["prometheus"].info("cache_config",
                                             self.cache_config)

@github-actions

github-actions bot commented Aug 5, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@njhill
Member

njhill commented Aug 5, 2025

Hi, @njhill I have a question: when disable_log_stats is set to true, both LoggingStatLogger and PrometheusStatLogger are disabled. Is this the expected behavior?

@chaunceyjiang yes that's expected. Here we only want to omit LoggingStatLogger.

The API here is kind of crappy in general: if you provide custom stats loggers, they will replace the built-in ones rather than augment them. There's a separate discussion / PR proposal around that, I think.

@chaunceyjiang
Collaborator Author

/cc @njhill PTAL.

@njhill
Member

njhill commented Sep 4, 2025

@chaunceyjiang I wonder if you could rebase this now that #20952 is merged?

The behaviour now is that custom stats loggers will augment the built-in ones unless disable_log_stats=True.
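The post-#20952 behaviour described above can be sketched like this. The function name and the stand-in logger classes are assumptions for illustration; the real factories live in vLLM's metrics code:

```python
from typing import Callable, List, Optional


# Hypothetical stand-ins for vLLM's built-in stat logger factories.
class LoggingStatLogger: ...
class PrometheusStatLogger: ...


def resolve_stat_logger_factories(
        custom_factories: Optional[List[Callable]] = None,
        disable_log_stats: bool = False) -> List[Callable]:
    """Custom loggers are added on top of the defaults rather than
    replacing them, unless stats logging is disabled entirely."""
    factories: List[Callable] = []
    if not disable_log_stats:
        # Built-in loggers are included by default.
        factories.extend([LoggingStatLogger, PrometheusStatLogger])
    # Custom loggers augment (not replace) the built-ins.
    factories.extend(custom_factories or [])
    return factories
```

So with `disable_log_stats=True`, only the custom loggers (if any) remain; otherwise custom loggers run alongside the defaults.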

@chaunceyjiang chaunceyjiang force-pushed the disable_log_stat branch 5 times, most recently from a42c8a7 to 1529366 Compare September 5, 2025 01:20
…han 1

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@chaunceyjiang
Collaborator Author

@njhill PTAL.

Member

@njhill njhill left a comment


Thanks @chaunceyjiang.

Actually we still want the Prometheus metrics in this case, and I think this change will disable those too.

I think the check should go here instead:

if enable_default_loggers and logger.isEnabledFor(logging.INFO):
    factories.append(LoggingStatLogger)
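The suggestion can be sketched as follows: keep the Prometheus metrics in all cases, but only add the console LoggingStatLogger when a single API server is running. The function name, the stand-in classes, and the way api_server_count is plumbed in are assumptions for illustration:

```python
import logging

logger = logging.getLogger("vllm.sketch")
logger.setLevel(logging.INFO)  # mirror a logger configured at INFO level


# Hypothetical stand-ins for vLLM's built-in stat logger factories.
class LoggingStatLogger: ...
class PrometheusStatLogger: ...


def default_logger_factories(enable_default_loggers: bool,
                             api_server_count: int) -> list:
    """Gate only the console logger on api_server_count; Prometheus
    metrics stay enabled regardless of API server scale-out."""
    factories = []
    if (enable_default_loggers and logger.isEnabledFor(logging.INFO)
            and api_server_count == 1):
        factories.append(LoggingStatLogger)
    factories.append(PrometheusStatLogger)
    return factories
```

Placing the check at factory-selection time means scale-out drops only the incomplete per-server console stats, not the metrics endpoint.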

…han 1

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
…han 1

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
…han 1

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@chaunceyjiang chaunceyjiang requested a review from njhill September 8, 2025 09:54
Member

@njhill njhill left a comment


@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 8, 2025
@njhill
Member

njhill commented Sep 8, 2025

Remaining test failure is also occurring on main.

@simon-mo simon-mo merged commit e680723 into vllm-project:main Sep 8, 2025
44 of 46 checks passed
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
…han 1 (vllm-project#22227)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
…han 1 (vllm-project#22227)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
…han 1 (vllm-project#22227)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…han 1 (vllm-project#22227)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…han 1 (vllm-project#22227)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>

Labels

frontend, ready (ONLY add when PR is ready to merge/full CI is needed), v1


Development

Successfully merging this pull request may close these issues.

[Bug]: Console stats logging is incorrect when using api-server scaleout

3 participants