[platform] add base class for communicators #13208
Conversation
@@ -196,63 +191,36 @@ def __init__(
        assert self.device_group is not None

        from vllm.platforms import current_platform

        # TODO: fix it for other platforms
Suggestion, but I need your confirmation as I'm not sure about the neuron and openvino devices:

if current_platform.device_type in ["neuron", "openvino"]:
    self.device = torch.device("cpu")
elif not current_platform.is_tpu():
    self.device = torch.device(
        f"{current_platform.device_type}:{local_rank}")
else:
    import torch_xla.core.xla_model as xm
    self.device = xm.xla_device(local_rank)
I think this part of the code is complicated to abstract, so I left it here. openvino, neuron, and tpu do not use this self.device field.
I think a way to abstract this is to let platforms determine whether they should use self.device, like this. WDYT?
diff --git a/vllm/distributed/parallel_state.py b/vllm/distributed/parallel_state.py
index 4f13449f..3f77eb84 100644
--- a/vllm/distributed/parallel_state.py
+++ b/vllm/distributed/parallel_state.py
@@ -193,10 +193,10 @@ class GroupCoordinator:
         from vllm.platforms import current_platform

         # TODO: fix it for other platforms
-        if current_platform.is_cuda_alike():
-            self.device = torch.device(f"cuda:{local_rank}")
-        else:
-            self.device = torch.device("cpu")
+        # initialize to cpu as a placeholder
+        self.device = torch.device("cpu")
+        if current_platform.use_device_field():
+            self.device = torch.device(f"{current_platform.device_type}:{local_rank}")

         self.use_device_communicator = use_device_communicator
diff --git a/vllm/platforms/interface.py b/vllm/platforms/interface.py
index 5411de3d..8475aeed 100644
--- a/vllm/platforms/interface.py
+++ b/vllm/platforms/interface.py
@@ -328,6 +328,13 @@ class Platform:
         """
         return "vllm.distributed.device_communicators.base_device_communicator.DeviceCommunicatorBase"  # noqa
+
+    @classmethod
+    def use_device_field(cls) -> bool:
+        """
+        Return a bool indicating whether the device field is used on the current platform.
+        This is used in parallel state to infer the device of tensors in comm ops.
+        """
+        return True

 class UnspecifiedPlatform(Platform):
     _enum = PlatformEnum.UNSPECIFIED
diff --git a/vllm/platforms/neuron.py b/vllm/platforms/neuron.py
index 5a03f5f7..afd8dd08 100644
--- a/vllm/platforms/neuron.py
+++ b/vllm/platforms/neuron.py
@@ -55,3 +55,7 @@ class NeuronPlatform(Platform):
     def is_pin_memory_available(cls) -> bool:
         logger.warning("Pin memory is not supported on Neuron.")
         return False
+
+    @classmethod
+    def use_device_field(cls) -> bool:
+        return False
\ No newline at end of file
diff --git a/vllm/platforms/openvino.py b/vllm/platforms/openvino.py
index 41221de0..d925ac20 100644
--- a/vllm/platforms/openvino.py
+++ b/vllm/platforms/openvino.py
@@ -150,3 +150,7 @@ class OpenVinoPlatform(Platform):
         assert cls.is_openvino_cpu() or \
             cls.is_openvino_gpu(), \
             "OpenVINO backend supports only CPU and GPU devices"
+
+    @classmethod
+    def use_device_field(cls) -> bool:
+        return False
diff --git a/vllm/platforms/tpu.py b/vllm/platforms/tpu.py
index 771b2be5..b81434c1 100644
--- a/vllm/platforms/tpu.py
+++ b/vllm/platforms/tpu.py
@@ -97,3 +97,7 @@ class TpuPlatform(Platform):
     @classmethod
     def get_device_communicator_cls(cls) -> str:
         return "vllm.distributed.device_communicators.tpu_communicator.TpuCommunicator"  # noqa
+
+    @classmethod
+    def use_device_field(cls) -> bool:
+        return False
let's only create abstractions if it is necessary. I feel it's unnecessary right now to introduce an abstraction only for this piece of code.
Okay, got it
LGTM. CI failure looks related to an HF problem.
This pull request has merge conflicts that must be resolved before it can be merged.
Makes sense and LGTM!
Need a rebase.
Failing tests come from the main branch; merging.
### What this PR does / why we need it?
Revert communicator patch as vllm-project/vllm#13208 has been merged.

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
Tested locally by #30 (comment)

Signed-off-by: MengqingCao <cmq0113@163.com>
### What this PR does / why we need it?
Mark v0.7.1 as unmaintained and v0.7.3 as maintained. vLLM released v0.7.3 (https://github.com/vllm-project/vllm/releases/tag/v0.7.3), which includes several commits:

- vllm-project/vllm#12874
- vllm-project/vllm#12432
- vllm-project/vllm#13208

We'd better bump the versions to v0.7.3.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Preview

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
To support out-of-tree platforms, we need to abstract the communicators so that out-of-tree platforms can implement their own communicator and hook into vLLM's code.
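To make that concrete, here is a minimal sketch of what hooking in looks like from a platform plugin's point of view. It is based on the class paths visible in the diffs above; the constructor signature, the all_reduce default, and the my_plugin module are illustrative assumptions, not the exact merged API.

import torch
import torch.distributed as dist
from torch.distributed import ProcessGroup

from vllm.platforms.interface import Platform


class DeviceCommunicatorBase:
    # Sketch of the base class this PR introduces (assumed simplified
    # signature; the real class may take extra arguments such as a
    # device group or a unique name).
    def __init__(self, cpu_group: ProcessGroup,
                 device: torch.device) -> None:
        self.cpu_group = cpu_group
        self.device = device

    def all_reduce(self, input_: torch.Tensor) -> torch.Tensor:
        # Default behavior: defer to torch.distributed on the CPU group.
        # Device-specific subclasses override this with faster kernels.
        dist.all_reduce(input_, group=self.cpu_group)
        return input_


class MyPlatform(Platform):
    # Hypothetical out-of-tree platform; "my_plugin" is not a real package.
    @classmethod
    def get_device_communicator_cls(cls) -> str:
        # Returning a dotted path (as the tpu diff above does) lets vLLM
        # import the communicator lazily, only when this platform is active.
        return "my_plugin.communicator.MyDeviceCommunicator"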