[platform] add base class for communicators #13208
Conversation
@@ -196,63 +191,36 @@ def __init__(
        assert self.device_group is not None

        from vllm.platforms import current_platform

        # TODO: fix it for other platforms
Suggestion, but I need your confirmation as I'm not sure about the neuron and openvino devices:

if current_platform.device_type in ["neuron", "openvino"]:
    self.device = torch.device("cpu")
elif not current_platform.is_tpu():
    self.device = torch.device(
        f"{current_platform.device_type}:{local_rank}")
else:
    import torch_xla.core.xla_model as xm
    self.device = xm.xla_device(local_rank)
I think this part of the code is complicated to abstract, so I left it here. openvino, neuron, and tpu do not use this self.device field.
I think a way to abstract this is to let platforms determine whether they should use self.device, like this. WDYT?
diff --git a/vllm/distributed/parallel_state.py b/vllm/distributed/parallel_state.py
index 4f13449f..3f77eb84 100644
--- a/vllm/distributed/parallel_state.py
+++ b/vllm/distributed/parallel_state.py
@@ -193,10 +193,10 @@ class GroupCoordinator:
         from vllm.platforms import current_platform

         # TODO: fix it for other platforms
-        if current_platform.is_cuda_alike():
-            self.device = torch.device(f"cuda:{local_rank}")
-        else:
-            self.device = torch.device("cpu")
+        # initialize to cpu as a placeholder
+        self.device = torch.device("cpu")
+        if current_platform.use_device_field():
+            self.device = torch.device(f"{current_platform.device_type}:{local_rank}")

         self.use_device_communicator = use_device_communicator
diff --git a/vllm/platforms/interface.py b/vllm/platforms/interface.py
index 5411de3d..8475aeed 100644
--- a/vllm/platforms/interface.py
+++ b/vllm/platforms/interface.py
@@ -328,6 +328,13 @@ class Platform:
         """
         return "vllm.distributed.device_communicators.base_device_communicator.DeviceCommunicatorBase"  # noqa
+
+    @classmethod
+    def use_device_field(cls) -> bool:
+        """
+        Return a bool indicating whether the device field is used on the current platform.
+        This is used in parallel state to infer the device of tensors in comm ops.
+        """
+        return True

 class UnspecifiedPlatform(Platform):
     _enum = PlatformEnum.UNSPECIFIED
diff --git a/vllm/platforms/neuron.py b/vllm/platforms/neuron.py
index 5a03f5f7..afd8dd08 100644
--- a/vllm/platforms/neuron.py
+++ b/vllm/platforms/neuron.py
@@ -55,3 +55,7 @@ class NeuronPlatform(Platform):
     def is_pin_memory_available(cls) -> bool:
         logger.warning("Pin memory is not supported on Neuron.")
         return False
+
+    @classmethod
+    def use_device_field(cls) -> bool:
+        return False
\ No newline at end of file
diff --git a/vllm/platforms/openvino.py b/vllm/platforms/openvino.py
index 41221de0..d925ac20 100644
--- a/vllm/platforms/openvino.py
+++ b/vllm/platforms/openvino.py
@@ -150,3 +150,7 @@ class OpenVinoPlatform(Platform):
         assert cls.is_openvino_cpu() or \
             cls.is_openvino_gpu(), \
             "OpenVINO backend supports only CPU and GPU devices"
+
+    @classmethod
+    def use_device_field(cls) -> bool:
+        return False
diff --git a/vllm/platforms/tpu.py b/vllm/platforms/tpu.py
index 771b2be5..b81434c1 100644
--- a/vllm/platforms/tpu.py
+++ b/vllm/platforms/tpu.py
@@ -97,3 +97,7 @@ class TpuPlatform(Platform):
     @classmethod
     def get_device_communicator_cls(cls) -> str:
         return "vllm.distributed.device_communicators.tpu_communicator.TpuCommunicator"  # noqa
+
+    @classmethod
+    def use_device_field(cls) -> bool:
+        return False
let's only create abstractions if it is necessary. I feel it's unnecessary right now to introduce an abstraction only for this piece of code.
Okay, got it
LGTM. CI failure looks related to an HF problem.
This pull request has merge conflicts that must be resolved before it can be merged.
Makes sense and LGTM!
Need a rebase.
Failing tests come from the main branch; merging.
### What this PR does / why we need it?
Revert communicator patch as vllm-project/vllm#13208 has been merged.

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
Tested locally by #30 (comment)

Signed-off-by: MengqingCao <cmq0113@163.com>
### What this PR does / why we need it?
Mark v0.7.1 as unmaintained and v0.7.3 as maintained. vLLM released v0.7.3 (https://github.com/vllm-project/vllm/releases/tag/v0.7.3), which includes several commits:

- vllm-project/vllm#12874
- vllm-project/vllm#12432
- vllm-project/vllm#13208

We'd better bump the versions to v0.7.3.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Preview

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
To support out-of-tree platforms, we need to abstract the communicators so that out-of-tree platforms can implement their own communicator and hook into vLLM's code.
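To make that concrete, here is a minimal sketch of what hooking in looks like from a platform plugin's point of view. It is based on the class paths visible in the diffs above; the constructor signature, the all_reduce default, and the my_plugin module are illustrative assumptions, not the exact merged API.

import torch
import torch.distributed as dist
from torch.distributed import ProcessGroup

from vllm.platforms.interface import Platform


class DeviceCommunicatorBase:
    # Sketch of the base class this PR introduces (assumed simplified
    # signature; the real class may take extra arguments such as a
    # device group or a unique name).
    def __init__(self, cpu_group: ProcessGroup,
                 device: torch.device) -> None:
        self.cpu_group = cpu_group
        self.device = device

    def all_reduce(self, input_: torch.Tensor) -> torch.Tensor:
        # Default behavior: defer to torch.distributed on the CPU group.
        # Device-specific subclasses override this with faster kernels.
        dist.all_reduce(input_, group=self.cpu_group)
        return input_


class MyPlatform(Platform):
    # Hypothetical out-of-tree platform; "my_plugin" is not a real package.
    @classmethod
    def get_device_communicator_cls(cls) -> str:
        # Returning a dotted path (as the tpu diff above does) lets vLLM
        # import the communicator lazily, only when this platform is active.
        return "my_plugin.communicator.MyDeviceCommunicator"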