
[platform] add base class for communicators #13208

Merged
merged 25 commits into from
Feb 16, 2025

Conversation

@youkaichao (Member) commented Feb 13, 2025

To support out-of-tree platforms, we need to abstract the communicators so that out-of-tree platforms can implement their own communicator and hook into vLLM's code.
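As a rough sketch of the shape this takes: `DeviceCommunicatorBase` and `Platform.get_device_communicator_cls` appear in this PR's diff, but everything else below (`MyOOTCommunicator`, `resolve_cls`, the method set) is a hypothetical stand-in, not vLLM's actual interface.

```python
import importlib


class DeviceCommunicatorBase:
    """Base class for device communicators (named in this PR's diff);
    the methods here are illustrative guesses, not the real API."""

    def __init__(self, device=None):
        self.device = device

    def all_reduce(self, data):
        # Default passthrough; a real subclass would invoke its
        # device's collective operation here.
        return data


class MyOOTCommunicator(DeviceCommunicatorBase):
    """Hypothetical out-of-tree communicator a vendor plugin could
    expose through its Platform.get_device_communicator_cls()."""

    def all_reduce(self, data):
        # A vendor backend would dispatch to its own collective here.
        return data


def resolve_cls(qualname: str):
    """Resolve a 'module.path.ClassName' string to a class, mirroring
    how the string returned by get_device_communicator_cls() (per the
    diff, it returns a dotted path) would be turned into a class."""
    module_name, _, cls_name = qualname.rpartition(".")
    return getattr(importlib.import_module(module_name), cls_name)
```

An out-of-tree platform would then return the dotted path to its communicator class, and vLLM would resolve and instantiate it at group-initialization time.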

Signed-off-by: youkaichao <youkaichao@gmail.com>

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs will not trigger a full CI run by default. Instead, only the fastcheck CI will run, which starts with a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@youkaichao youkaichao marked this pull request as ready for review February 13, 2025 08:39
@youkaichao youkaichao changed the title add base class for communicator [platform] add base class for communicators Feb 13, 2025
@@ -196,63 +191,36 @@ def __init__(
assert self.device_group is not None

from vllm.platforms import current_platform

# TODO: fix it for other platforms
Contributor

Suggestion, but I need your confirmation since I'm not sure about the neuron and openvino devices:

        if current_platform.device_type in ["neuron", "openvino"]:
            self.device = torch.device("cpu")
        elif not current_platform.is_tpu():
            self.device = torch.device(
                f"{current_platform.device_type}:{local_rank}")
        else:
            import torch_xla.core.xla_model as xm
            self.device = xm.xla_device(local_rank)

Member Author

I think this part of the code is complicated to abstract, so I left it here. openvino, neuron, and tpu do not use this self.device field.

Contributor

I think one way to abstract this is to let platforms determine whether they should use self.device, like this. WDYT?

diff --git a/vllm/distributed/parallel_state.py b/vllm/distributed/parallel_state.py
index 4f13449f..3f77eb84 100644
--- a/vllm/distributed/parallel_state.py
+++ b/vllm/distributed/parallel_state.py
@@ -193,10 +193,10 @@ class GroupCoordinator:
         from vllm.platforms import current_platform
 
         # TODO: fix it for other platforms
-        if current_platform.is_cuda_alike():
-            self.device = torch.device(f"cuda:{local_rank}")
-        else:
-            self.device = torch.device("cpu")
+        # initialize to cpu as a placeholder
+        self.device = torch.device("cpu")
+        if current_platform.use_device_field():
+            self.device = torch.device(f"{current_platform.device_type}:{local_rank}")
 
         self.use_device_communicator = use_device_communicator
 
diff --git a/vllm/platforms/interface.py b/vllm/platforms/interface.py
index 5411de3d..8475aeed 100644
--- a/vllm/platforms/interface.py
+++ b/vllm/platforms/interface.py
@@ -328,6 +328,13 @@ class Platform:
         """
         return "vllm.distributed.device_communicator.base_device_communicator.DeviceCommunicatorBase"  # noqa
 
+    @classmethod
+    def use_device_field(cls) -> bool:
+        """
+        Return whether the device field is used on the current platform.
+        This is used in parallel state to infer the device of tensors in comm ops.
+        """
+        return True
 
 class UnspecifiedPlatform(Platform):
     _enum = PlatformEnum.UNSPECIFIED
diff --git a/vllm/platforms/neuron.py b/vllm/platforms/neuron.py
index 5a03f5f7..afd8dd08 100644
--- a/vllm/platforms/neuron.py
+++ b/vllm/platforms/neuron.py
@@ -55,3 +55,7 @@ class NeuronPlatform(Platform):
     def is_pin_memory_available(cls) -> bool:
         logger.warning("Pin memory is not supported on Neuron.")
         return False
+
+    @classmethod
+    def use_device_field(cls) -> bool:
+        return False
\ No newline at end of file
diff --git a/vllm/platforms/openvino.py b/vllm/platforms/openvino.py
index 41221de0..d925ac20 100644
--- a/vllm/platforms/openvino.py
+++ b/vllm/platforms/openvino.py
@@ -150,3 +150,7 @@ class OpenVinoPlatform(Platform):
         assert cls.is_openvino_cpu() or \
             cls.is_openvino_gpu(), \
             "OpenVINO backend supports only CPU and GPU devices"
+
+    @classmethod
+    def use_device_field(cls) -> bool:
+        return False
diff --git a/vllm/platforms/tpu.py b/vllm/platforms/tpu.py
index 771b2be5..b81434c1 100644
--- a/vllm/platforms/tpu.py
+++ b/vllm/platforms/tpu.py
@@ -97,3 +97,7 @@ class TpuPlatform(Platform):
     @classmethod
     def get_device_communicator_cls(cls) -> str:
         return "vllm.distributed.device_communicators.tpu_communicator.TpuCommunicator"  # noqa
+
+    @classmethod
+    def use_device_field(cls) -> bool:
+        return False

Member Author

Let's only create abstractions when necessary. I feel it's unnecessary right now to introduce an abstraction for just this piece of code.

Contributor

Okay, got it

Signed-off-by: youkaichao <youkaichao@gmail.com>
@wangxiyuan
Contributor

LGTM. The CI failure looks related to an HF problem.

@njhill njhill self-requested a review February 14, 2025 17:41

mergify bot commented Feb 14, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @youkaichao.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 14, 2025
Collaborator

@tlrmchlsmth tlrmchlsmth left a comment


Makes sense and LGTM!

@njhill njhill removed their request for review February 16, 2025 00:13
@wangxiyuan
Contributor

need a rebase

@mergify mergify bot removed the needs-rebase label Feb 16, 2025
@youkaichao youkaichao enabled auto-merge (squash) February 16, 2025 12:10
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 16, 2025
@youkaichao
Member Author

The failing tests come from the main branch; merging.

@youkaichao youkaichao disabled auto-merge February 16, 2025 14:14
@youkaichao youkaichao merged commit a0231b7 into vllm-project:main Feb 16, 2025
57 of 62 checks passed
@youkaichao youkaichao deleted the communicator_base branch February 16, 2025 14:14
@wangxiyuan wangxiyuan mentioned this pull request Feb 17, 2025
1 task
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Feb 17, 2025
### What this PR does / why we need it?
Revert communicator patch as
vllm-project/vllm#13208 has been merged.

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
test locally by
#30 (comment)

Signed-off-by: MengqingCao <cmq0113@163.com>
panf2333 pushed a commit to yottalabsai/vllm that referenced this pull request Feb 18, 2025
Signed-off-by: youkaichao <youkaichao@gmail.com>
xjpang pushed a commit to xjpang/vllm that referenced this pull request Feb 20, 2025
Signed-off-by: youkaichao <youkaichao@gmail.com>
kerthcet pushed a commit to kerthcet/vllm that referenced this pull request Feb 21, 2025
Signed-off-by: youkaichao <youkaichao@gmail.com>
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Feb 21, 2025
### What this PR does / why we need it?
Mark v0.7.1 as unmaintained and v0.7.3 as maintained:
vLLM released the v0.7.3 version:
https://github.com/vllm-project/vllm/releases/tag/v0.7.3, which includes
several commits:
- vllm-project/vllm#12874
- vllm-project/vllm#12432
- vllm-project/vllm#13208

We'd better bump the versions to v0.7.3.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Preview

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Labels
ready ONLY add when PR is ready to merge/full CI is needed

4 participants