[Frontend] Pass API server count to each process #23717
Conversation
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Code Review

This pull request introduces the changes needed to pass the API server count and rank to each server process, a prerequisite for more sophisticated caching and resource management in a scaled-out API server environment. The changes are well implemented across the configuration, argument parsing, and process management layers. Key changes include adding `api_process_count` and `api_process_rank` to `ParallelConfig`, updating `APIServerProcessManager` to handle per-server arguments, and correctly disabling incompatible features such as the IPC cache when multiple API servers are active. The refactoring in `EngineArgs.from_cli_args` also improves robustness. Overall, this is a solid contribution that enhances the multi-process architecture of vLLM.
vllm/config/parallel.py
Outdated
```python
api_process_count: int = 1
"""[Internal] The number of API processes initialized."""
api_process_rank: int = 0
"""[Internal] The rank of this API process."""
```
Something like this to handle internal state which needs a public interface?
https://docs.python.org/3/library/dataclasses.html#dataclasses.InitVar
```diff
-api_process_count: int = 1
-"""[Internal] The number of API processes initialized."""
-api_process_rank: int = 0
-"""[Internal] The rank of this API process."""
+_api_process_count: int
+api_process_count: InitVar[int] = 1
+"""The number of API processes initialized."""
+_api_process_rank: int
+api_process_rank: InitVar[int] = 0
+"""The rank of this API process."""
+...
+def __post_init__(self, api_process_count, api_process_rank):
+    ...
+    self._api_process_count = api_process_count
+    self._api_process_rank = api_process_rank
+    ...
```
We still access those attributes as public attributes in our code. The "Internal" here refers to the fact that these are only supposed to be passed as CLI args by API server scale-out; users should not set this flag themselves.
Something like this works (although mypy will complain about the redefinition):

```python
# example.py
from dataclasses import InitVar, dataclass

@dataclass
class ParallelConfig:
    api_process_count: InitVar[int] = 1

    def __post_init__(self, api_process_count: int):
        self._api_process_count = api_process_count

    @property
    def api_process_count(self) -> int:
        return self._api_process_count

parallel_config = ParallelConfig(api_process_count=4)
print(parallel_config.api_process_count)
parallel_config.api_process_count = 2
```

```
$ python example.py
4
Traceback (most recent call last):
  File "/home/harry/vllm/demo.py", line 18, in <module>
    parallel_config.api_process_count = 2
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: property 'api_process_count' of 'ParallelConfig' object has no setter
```
> Users should not set this flag

Users can't set this flag already; it's probably worth making it read-only as Harry suggested, or in a similar fashion.
I think this isn't worth the extra complexity, at least for now (especially since mypy doesn't even work with this)
OK, I was just trying to think of robust ways to have config values that can be set on init but not modified later.
To avoid confusion, I have updated the docstring to clarify that "internal" refers to how the CLI arg is passed, rather than to its usage in the code.
I think most config attributes (not just this one) shouldn't be modified after construction tbh. We should try to fix their values at initialization time but it would take some refactoring.
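As a side note, one standard way to fix dataclass values at initialization time is a frozen dataclass. The sketch below is illustrative only (`FrozenParallelConfig` is an invented name, not vLLM's actual config class): any assignment after construction raises `FrozenInstanceError`.

```python
from dataclasses import dataclass, FrozenInstanceError

# Hypothetical sketch: a frozen dataclass makes every field read-only
# after __init__, giving "set on init, immutable afterwards" for free.
@dataclass(frozen=True)
class FrozenParallelConfig:
    api_process_count: int = 1
    api_process_rank: int = 0

cfg = FrozenParallelConfig(api_process_count=4)
print(cfg.api_process_count)  # 4

try:
    cfg.api_process_count = 2
except FrozenInstanceError:
    print("cannot modify after construction")
```

The trade-off is that a frozen config cannot be patched anywhere, so adopting it across vLLM would need the refactoring mentioned above.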
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
This makes sense to me, thanks! Let's wait for tests
Thanks @DarkLight1337.
I guess I've lost track of why we need the new parameters in the config for this, given we are already passing the count/index to these places (AFAICT)?
```python
client_addresses=client_addresses,
)

self.client_count = client_count
```
Is this used anywhere, or was it added to be used in the future?
No, I just added this for consistency since client_index is being assigned
The new parameters are put in the config because the config is readily accessible in various parts of vLLM, which is needed for the next PR
Yeah, there are different instances of this behavior; I suppose at some point we could refactor this into a shared context (which isn't the forward one) to avoid abusing config.py changes.
@DarkLight1337 sorry for taking so long, I needed to find time to wrap my head around this properly.
Could we not just add the processor count and processor index as optional args to the `Processor` `__init__` and `run_profile` methods? Then the config changes shouldn't be needed.
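The alternative being suggested here might look roughly like this sketch (illustrative only; the real `Processor` signature, `cache_salt` helper, and `run_profile` internals are assumptions, not vLLM's actual API):

```python
class Processor:
    """Sketch: thread the server count/rank through as plain constructor
    arguments instead of new ParallelConfig fields (hypothetical names)."""

    def __init__(self, model: str, api_process_count: int = 1,
                 api_process_rank: int = 0) -> None:
        self.model = model
        self.api_process_count = api_process_count
        self.api_process_rank = api_process_rank

    def cache_salt(self) -> str:
        # Rank-dependent behavior stays local to the processor, with no
        # config changes needed.
        return f"{self.model}-rank{self.api_process_rank}"

p = Processor("my-model", api_process_count=4, api_process_rank=2)
print(p.cache_salt())  # my-model-rank2
```

The counter-argument in this thread is that the config is already plumbed into many parts of vLLM, so storing the values there avoids adding these parameters to every intermediate call site.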
```diff
 def _enable_ipc_cache(vllm_config: "VllmConfig") -> bool:
     parallel_config = vllm_config.parallel_config
-    supports_ipc_cache = (parallel_config.data_parallel_size == 1
+    supports_ipc_cache = ((parallel_config._api_process_count == 1
```
Let's address in the next PR if necessary. Merging this first to avoid having to keep updating the next PR
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: qqma <qqma@amazon.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
Follow-up to #23018.
By passing the API server count and rank instead of setting the cache size to 0, this PR enables processor caching when API server scale-out is enabled. IPC caching is still disabled for internal LB, though, since there is no 1:1 relationship between API server and Engine Core processes.
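A minimal sketch of the per-process fan-out this describes (illustrative names only; the real `APIServerProcessManager` wiring differs): each spawned server receives the shared count plus its own unique rank, so rank-local state such as a processor cache can be keyed per process.

```python
def per_server_args(base_args: dict, api_process_count: int) -> list[dict]:
    """Build one argument dict per API server process: every process sees
    the shared count and its own rank (a sketch, not vLLM's code)."""
    return [
        {**base_args,
         "api_process_count": api_process_count,
         "api_process_rank": rank}
        for rank in range(api_process_count)
    ]

args = per_server_args({"model": "my-model"}, 3)
print([a["api_process_rank"] for a in args])  # [0, 1, 2]
```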
Also these changes are required for #22070.
Test Plan
Should we add an endpoint to query the API server count and rank, just to test that these arguments are passed correctly? Actually, we already have `/server_info` for that; going to add a test. cc @njhill
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.