[Frontend] Pass API server count to each process #23717
Conversation
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Code Review

This pull request introduces the changes needed to pass the API server count and rank to each server process, a prerequisite for more sophisticated caching and resource management in a scaled-out API server environment. The changes are well implemented across the configuration, argument parsing, and process management layers. Key changes include adding `api_process_count` and `api_process_rank` to `ParallelConfig`, updating `APIServerProcessManager` to handle per-server arguments, and correctly disabling incompatible features such as the IPC cache when multiple API servers are active. The refactoring in `EngineArgs.from_cli_args` also improves robustness. Overall, this is a solid contribution that enhances the multi-process architecture of vLLM.
vllm/config/parallel.py
Outdated
```python
api_process_count: int = 1
"""[Internal] The number of API processes initialized."""
api_process_rank: int = 0
"""[Internal] The rank of this API process."""
```
Something like this to handle internal state which needs a public interface?
https://docs.python.org/3/library/dataclasses.html#dataclasses.InitVar
```diff
-api_process_count: int = 1
-"""[Internal] The number of API processes initialized."""
-api_process_rank: int = 0
-"""[Internal] The rank of this API process."""
+_api_process_count: int
+api_process_count: InitVar[int] = 1
+"""The number of API processes initialized."""
+_api_process_rank: int
+api_process_rank: InitVar[int] = 0
+"""The rank of this API process."""
+...
+def __post_init__(self, api_process_count, api_process_rank):
+    ...
+    self._api_process_count = api_process_count
+    self._api_process_rank = api_process_rank
+    ...
```
We still access those attributes as public attributes in our code. The "Internal" here refers to the fact that these are only supposed to be passed as CLI args by API server scale-out; users should not set this flag themselves.
Something like this works (although mypy will complain about the redefinition):

```python
# example.py
from dataclasses import InitVar, dataclass

@dataclass
class ParallelConfig:
    api_process_count: InitVar[int] = 1

    def __post_init__(self, api_process_count: int):
        self._api_process_count = api_process_count

    @property
    def api_process_count(self) -> int:
        return self._api_process_count

parallel_config = ParallelConfig(api_process_count=4)
print(parallel_config.api_process_count)
parallel_config.api_process_count = 2
```

```
$ python example.py
4
Traceback (most recent call last):
  File "/home/harry/vllm/demo.py", line 18, in <module>
    parallel_config.api_process_count = 2
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: property 'api_process_count' of 'ParallelConfig' object has no setter
```
> Users should not set this flag

Users can't set this flag already; it's probably worth making it read-only as Harry suggested, or in a similar fashion.
I think this isn't worth the extra complexity, at least for now (especially since mypy doesn't even work with this)
OK, I was just trying to think of robust ways to have config values that can be set on init but not modified later.
To avoid confusion, I have updated the docstring to clarify that "internal" refers to how the CLI arg is passed, rather than to its usage in the code.
I think most config attributes (not just this one) shouldn't be modified after construction tbh. We should try to fix their values at initialization time but it would take some refactoring.
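As a side note, one standard way to fix dataclass values at initialization time is a frozen dataclass. The sketch below is illustrative only (`FrozenParallelConfig` is an invented name, not vLLM's actual config class): any assignment after construction raises `FrozenInstanceError`.

```python
from dataclasses import dataclass, FrozenInstanceError

# Hypothetical sketch: a frozen dataclass makes every field read-only
# after __init__, giving "set on init, immutable afterwards" for free.
@dataclass(frozen=True)
class FrozenParallelConfig:
    api_process_count: int = 1
    api_process_rank: int = 0

cfg = FrozenParallelConfig(api_process_count=4)
print(cfg.api_process_count)  # 4

try:
    cfg.api_process_count = 2
except FrozenInstanceError:
    print("cannot modify after construction")
```

The trade-off is that a frozen config cannot be patched anywhere, so adopting it across vLLM would need the refactoring mentioned above.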
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
This makes sense to me, thanks! Let's wait for tests
Thanks @DarkLight1337.
I guess I've lost track of why we need the new parameters in the config for this, given we are already passing the count/index to these places (AFAICT)?
```python
client_addresses=client_addresses,
)

self.client_count = client_count
```
Is this used anywhere, or was it added to be used in the future?
No, I just added this for consistency since client_index is being assigned
The new parameters are put in the config because the config is readily accessible in various parts of vLLM, which is needed for the next PR
Yeah, there are different instances of this behavior; I suppose at some point we could refactor this into a shared context (which isn't the forward one) to avoid abusing config.py changes.
@DarkLight1337 sorry for taking so long, I needed to find time to wrap my head around this properly.
Could we not just add the processor count and processor index as optional args to the `Processor` `__init__` and `run_profile` methods? Then the config changes shouldn't be needed.
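The alternative being suggested here might look roughly like this sketch (illustrative only; the real `Processor` signature, `cache_salt` helper, and `run_profile` internals are assumptions, not vLLM's actual API):

```python
class Processor:
    """Sketch: thread the server count/rank through as plain constructor
    arguments instead of new ParallelConfig fields (hypothetical names)."""

    def __init__(self, model: str, api_process_count: int = 1,
                 api_process_rank: int = 0) -> None:
        self.model = model
        self.api_process_count = api_process_count
        self.api_process_rank = api_process_rank

    def cache_salt(self) -> str:
        # Rank-dependent behavior stays local to the processor, with no
        # config changes needed.
        return f"{self.model}-rank{self.api_process_rank}"

p = Processor("my-model", api_process_count=4, api_process_rank=2)
print(p.cache_salt())  # my-model-rank2
```

The counter-argument in this thread is that the config is already plumbed into many parts of vLLM, so storing the values there avoids adding these parameters to every intermediate call site.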
```diff
 def _enable_ipc_cache(vllm_config: "VllmConfig") -> bool:
     parallel_config = vllm_config.parallel_config
-    supports_ipc_cache = (parallel_config.data_parallel_size == 1
+    supports_ipc_cache = ((parallel_config._api_process_count == 1
```
Let's address in the next PR if necessary. Merging this first to avoid having to keep updating the next PR
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: qqma <qqma@amazon.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
Follow-up to #23018.
By passing the API server count and rank instead of setting the cache size to 0, this PR enables processor caching when API server scale-out is enabled. IPC caching is still disabled for internal LB, though, since there is no 1:1 relationship between API server and Engine Core processes.
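A minimal sketch of the per-process fan-out this describes (illustrative names only; the real `APIServerProcessManager` wiring differs): each spawned server receives the shared count plus its own unique rank, so rank-local state such as a processor cache can be keyed per process.

```python
def per_server_args(base_args: dict, api_process_count: int) -> list[dict]:
    """Build one argument dict per API server process: every process sees
    the shared count and its own rank (a sketch, not vLLM's code)."""
    return [
        {**base_args,
         "api_process_count": api_process_count,
         "api_process_rank": rank}
        for rank in range(api_process_count)
    ]

args = per_server_args({"model": "my-model"}, 3)
print([a["api_process_rank"] for a in args])  # [0, 1, 2]
```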
Also these changes are required for #22070.
Test Plan
Should we add an endpoint to query the API server count and rank, just to test that these arguments are passed correctly? Actually, we already have `/server_info` for that; going to add a test. cc @njhill
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.