
@panpan0000 panpan0000 commented Sep 18, 2025

Add a unit test, just like vllm-project#20759 does.

I don't have a UCC-compiled PyTorch build to verify either the CPU or GPU side.

Can you help? @ikryukov @lengrongfu

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small but essential subset of tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀


panpan0000 commented Sep 19, 2025

Detailed log (the debug logging has been removed from this PR):

VLLM_USE_UCC=1 pytest tests/distributed/test_ucc_communicator-3.py -vvs --log-cli-level=DEBUG

INFO 09-19 05:30:27 [__init__.py:216] Automatically detected platform cuda.
========================================================================================================================== test session starts ==========================================================================================================================
platform linux -- Python 3.12.3, pytest-8.1.1, pluggy-1.6.0 -- /usr/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/workspace/vllm/.hypothesis/examples'))
rootdir: /workspace/vllm
configfile: pyproject.toml
plugins: xdist-3.6.1, rerunfailures-15.1, hypothesis-6.130.8, shard-0.1.2, xdoctest-1.0.2, flakefinder-1.1.0, anyio-4.9.0, typeguard-4.3.0
collecting ... WARNING 09-19 05:30:28 [interface.py:533] Current platform cuda does not have '_pytestfixturefunction' attribute.
WARNING 09-19 05:30:28 [interface.py:533] Current platform cuda does not have '__test__' attribute.
WARNING 09-19 05:30:28 [interface.py:533] Current platform cuda does not have '__bases__' attribute.
WARNING 09-19 05:30:28 [interface.py:533] Current platform cuda does not have '__test__' attribute.
collected 4 items                                                                                                                                                                                                                                                       
Running 4 items in this shard: tests/distributed/test_ucc_communicator-3.py::test_ucc_allreduce[1-2], tests/distributed/test_ucc_communicator-3.py::test_ucc_availability[1-2], tests/distributed/test_ucc_communicator-3.py::test_ucc_communicator_initialization, tests/distributed/test_ucc_communicator-3.py::test_ucc_static_methods

tests/distributed/test_ucc_communicator-3.py::test_ucc_allreduce[1-2] INFO 09-19 05:30:35 [__init__.py:216] Automatically detected platform cuda.
2025-09-19 05:30:35,141 - ucc_test - DEBUG - Starting ucc_allreduce_worker with rank 0, world_size 2
2025-09-19 05:30:35,141 - ucc_test - DEBUG - Selected device: cuda:0, dtype: torch.bfloat16
INFO 09-19 05:30:35 [__init__.py:216] Automatically detected platform cuda.
2025-09-19 05:30:35,360 - ucc_test - DEBUG - Starting ucc_allreduce_worker with rank 1, world_size 2
2025-09-19 05:30:35,360 - ucc_test - DEBUG - Selected device: cuda:1, dtype: torch.bfloat16
WARNING 09-19 05:30:35 [__init__.py:3864] Current vLLM config is not set.
WARNING 09-19 05:30:36 [__init__.py:3864] Current vLLM config is not set.
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
WARNING 09-19 05:30:36 [__init__.py:3864] Current vLLM config is not set.
WARNING 09-19 05:30:36 [__init__.py:3864] Current vLLM config is not set.
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
WARNING 09-19 05:30:36 [__init__.py:3864] Current vLLM config is not set.
WARNING 09-19 05:30:36 [__init__.py:3864] Current vLLM config is not set.
INFO 09-19 05:30:36 [__init__.py:1433] Found nccl from library libnccl.so.2
INFO 09-19 05:30:36 [pynccl.py:70] vLLM is using nccl==2.26.5
INFO 09-19 05:30:36 [__init__.py:1433] Found nccl from library libnccl.so.2
INFO 09-19 05:30:36 [pynccl.py:70] vLLM is using nccl==2.26.5
INFO 09-19 05:30:37 [custom_all_reduce.py:35] Skipping P2P check and trusting the driver's P2P report.
INFO 09-19 05:30:37 [custom_all_reduce.py:35] Skipping P2P check and trusting the driver's P2P report.
INFO 09-19 05:30:37 [ucc_communicator.py:56] UCCCommunicator initialized successfully with UCC backend on device cuda:1, world size: 2
INFO 09-19 05:30:37 [ucc_communicator.py:56] UCCCommunicator initialized successfully with UCC backend on device cuda:0, world size: 2
INFO 09-19 05:30:37 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_dd0f2668'), local_subscribe_addr='ipc:///tmp/5e555a05-e12f-43b5-a76e-577bc47cfa7b', remote_subscribe_addr=None, remote_addr_ipv6=False)
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
WARNING 09-19 05:30:37 [__init__.py:3864] Current vLLM config is not set.
WARNING 09-19 05:30:37 [__init__.py:3864] Current vLLM config is not set.
INFO 09-19 05:30:37 [ucc_communicator.py:56] UCCCommunicator initialized successfully with UCC backend on device cuda:1, world size: 2
INFO 09-19 05:30:37 [parallel_state.py:1165] rank 1 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 1, EP rank 1
2025-09-19 05:30:37,630 - ucc_test - DEBUG - Rank 1: Checking if UCC is available
2025-09-19 05:30:37,631 - ucc_test - DEBUG - Rank 1: UCC available: True
2025-09-19 05:30:37,631 - ucc_test - DEBUG - Rank 1: Getting tensor model parallel group
2025-09-19 05:30:37,631 - ucc_test - DEBUG - Rank 1: Got tensor model parallel group: <torch.distributed.distributed_c10d.ProcessGroup object at 0x7f1fc1c6bf30>
2025-09-19 05:30:37,631 - ucc_test - DEBUG - Rank 1: Creating UCC process group
2025-09-19 05:30:37,631 - ucc_test - DEBUG - Rank 1: Created UCC process group: <torch.distributed.distributed_c10d.ProcessGroup object at 0x7f1fc01119b0>
INFO 09-19 05:30:37 [ucc_communicator.py:56] UCCCommunicator initialized successfully with UCC backend on device cuda:1, world size: 2
INFO 09-19 05:30:37 [ucc_communicator.py:56] UCCCommunicator initialized successfully with UCC backend on device cuda:0, world size: 2
INFO 09-19 05:30:37 [parallel_state.py:1165] rank 0 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
2025-09-19 05:30:37,631 - ucc_test - DEBUG - Rank 0: Checking if UCC is available
2025-09-19 05:30:37,631 - ucc_test - DEBUG - Rank 0: UCC available: True
2025-09-19 05:30:37,631 - ucc_test - DEBUG - Rank 0: Getting tensor model parallel group
2025-09-19 05:30:37,631 - ucc_test - DEBUG - Rank 0: Got tensor model parallel group: <torch.distributed.distributed_c10d.ProcessGroup object at 0x7f610547c870>
2025-09-19 05:30:37,632 - ucc_test - DEBUG - Rank 0: Creating UCC process group
2025-09-19 05:30:37,632 - ucc_test - DEBUG - Rank 0: Created UCC process group: <torch.distributed.distributed_c10d.ProcessGroup object at 0x7f610410d430>
INFO 09-19 05:30:37 [ucc_communicator.py:56] UCCCommunicator initialized successfully with UCC backend on device cuda:0, world size: 2

[Rank 1] Input tensor shape: torch.Size([4194304]), dtype: torch.bfloat16, device: cuda:1

[Rank 0] Input tensor shape: torch.Size([4194304]), dtype: torch.bfloat16, device: cuda:0
[Rank 1] Input tensor stats - min: 1.0, max: 22.0, mean: 11.499662399291992
[Rank 0] Input tensor stats - min: 1.0, max: 22.0, mean: 11.499662399291992
[Rank 0] Output tensor stats - min: 2.0, max: 44.0, mean: 22.999324798583984
[Rank 1] Output tensor stats - min: 2.0, max: 44.0, mean: 22.999324798583984

[Rank 0] Testing op: RedOpType.SUM, tensor shape: torch.Size([1024]), dtype: torch.bfloat16
[Rank 0] Input tensor stats - min: 1.0, max: 9.0, mean: 4.994140625

[Rank 1] Testing op: RedOpType.SUM, tensor shape: torch.Size([1024]), dtype: torch.bfloat16
[Rank 1] Input tensor stats - min: 1.0, max: 9.0, mean: 4.994140625
[Rank 1] Output tensor stats after RedOpType.SUM - min: 2.0, max: 18.0, mean: 9.98828125
[Rank 0] Output tensor stats after RedOpType.SUM - min: 2.0, max: 18.0, mean: 9.98828125

[Rank 1] Testing op: RedOpType.MAX, tensor shape: torch.Size([1024]), dtype: torch.bfloat16

[Rank 0] Testing op: RedOpType.MAX, tensor shape: torch.Size([1024]), dtype: torch.bfloat16
[Rank 1] Input tensor stats - min: 1.0, max: 9.0, mean: 5.025390625
[Rank 0] Input tensor stats - min: 1.0, max: 9.0, mean: 5.025390625
[Rank 1] Output tensor stats after RedOpType.MAX - min: 1.0, max: 9.0, mean: 5.025390625
[Rank 0] Output tensor stats after RedOpType.MAX - min: 1.0, max: 9.0, mean: 5.025390625

[Rank 1] Testing op: RedOpType.MIN, tensor shape: torch.Size([1024]), dtype: torch.bfloat16

[Rank 0] Testing op: RedOpType.MIN, tensor shape: torch.Size([1024]), dtype: torch.bfloat16
[Rank 1] Input tensor stats - min: 1.0, max: 9.0, mean: 4.939453125
[Rank 0] Input tensor stats - min: 1.0, max: 9.0, mean: 4.939453125
[Rank 1] Output tensor stats after RedOpType.MIN - min: 1.0, max: 9.0, mean: 4.939453125
[Rank 0] Output tensor stats after RedOpType.MIN - min: 1.0, max: 9.0, mean: 4.939453125
[rank0]:[W919 05:30:40.708318052 ProcessGroupNCCL.cpp:1505] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
PASSED
tests/distributed/test_ucc_communicator-3.py::test_ucc_availability[1-2] INFO 09-19 05:30:48 [__init__.py:216] Automatically detected platform cuda.
INFO 09-19 05:30:49 [__init__.py:216] Automatically detected platform cuda.
WARNING 09-19 05:30:49 [__init__.py:3864] Current vLLM config is not set.
WARNING 09-19 05:30:49 [__init__.py:3864] Current vLLM config is not set.
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
WARNING 09-19 05:30:49 [__init__.py:3864] Current vLLM config is not set.
WARNING 09-19 05:30:49 [__init__.py:3864] Current vLLM config is not set.
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
WARNING 09-19 05:30:49 [__init__.py:3864] Current vLLM config is not set.
WARNING 09-19 05:30:49 [__init__.py:3864] Current vLLM config is not set.
INFO 09-19 05:30:49 [__init__.py:1433] Found nccl from library libnccl.so.2
INFO 09-19 05:30:49 [pynccl.py:70] vLLM is using nccl==2.26.5
INFO 09-19 05:30:49 [__init__.py:1433] Found nccl from library libnccl.so.2
INFO 09-19 05:30:49 [pynccl.py:70] vLLM is using nccl==2.26.5
INFO 09-19 05:30:50 [custom_all_reduce.py:35] Skipping P2P check and trusting the driver's P2P report.
INFO 09-19 05:30:50 [custom_all_reduce.py:35] Skipping P2P check and trusting the driver's P2P report.
INFO 09-19 05:30:50 [ucc_communicator.py:56] UCCCommunicator initialized successfully with UCC backend on device cuda:0, world size: 2
INFO 09-19 05:30:50 [ucc_communicator.py:56] UCCCommunicator initialized successfully with UCC backend on device cuda:1, world size: 2
INFO 09-19 05:30:50 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_46082ee1'), local_subscribe_addr='ipc:///tmp/2db55b37-aca4-4d2d-84fc-1365c4102722', remote_subscribe_addr=None, remote_addr_ipv6=False)
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
WARNING 09-19 05:30:50 [__init__.py:3864] Current vLLM config is not set.
WARNING 09-19 05:30:50 [__init__.py:3864] Current vLLM config is not set.
INFO 09-19 05:30:50 [ucc_communicator.py:56] UCCCommunicator initialized successfully with UCC backend on device cuda:0, world size: 2
INFO 09-19 05:30:50 [parallel_state.py:1165] rank 0 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 09-19 05:30:50 [ucc_communicator.py:56] UCCCommunicator initialized successfully with UCC backend on device cuda:1, world size: 2
INFO 09-19 05:30:50 [parallel_state.py:1165] rank 1 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 1, EP rank 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
WARNING 09-19 05:30:50 [ucc_communicator.py:38] UCCCommunicator requires a UCC process group backend, but got backend: gloo. Disabling UCC allreduce.
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
WARNING 09-19 05:30:50 [ucc_communicator.py:38] UCCCommunicator requires a UCC process group backend, but got backend: gloo. Disabling UCC allreduce.
[rank0]:[W919 05:30:51.835072704 ProcessGroupNCCL.cpp:1505] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
PASSED
tests/distributed/test_ucc_communicator-3.py::test_ucc_communicator_initialization PASSED
tests/distributed/test_ucc_communicator-3.py::test_ucc_static_methods PASSED
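
For reference, the reduction semantics the allreduce test exercises above can be simulated in plain Python. This is a hypothetical sketch, not the vLLM test code: the `allreduce` helper and rank layout here are illustrative assumptions. It shows why, with identical inputs on both ranks (as in the log), SUM doubles each element at world size 2 while MAX and MIN leave the tensor unchanged.

```python
# Hypothetical sketch of allreduce semantics across two "ranks".
# With identical inputs on every rank at world_size=2:
#   SUM doubles each element; MAX and MIN act as the identity.

def allreduce(rank_tensors, op):
    """Element-wise reduction across ranks; every rank receives the same result."""
    reduced = [op(values) for values in zip(*rank_tensors)]
    return [list(reduced) for _ in rank_tensors]

world_size = 2
inputs = [[1.0, 9.0, 5.0] for _ in range(world_size)]  # identical on both ranks

summed = allreduce(inputs, sum)
maxed = allreduce(inputs, max)
minned = allreduce(inputs, min)

assert summed[0] == [2.0, 18.0, 10.0]  # SUM: values double at world_size=2
assert maxed[0] == inputs[0]           # MAX of identical inputs is identity
assert minned[0] == inputs[0]          # MIN likewise
```

This matches the log: the SUM pass shows the output mean (22.999...) at exactly twice the input mean (11.499...), while the MAX and MIN passes report output stats identical to their inputs.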

@panpan0000

@ikryukov do you have time to review and merge?

@ikryukov

> @ikryukov do you have time to review and merge?

It is ready to merge, thanks! But could you rebase it? I added lengrongfu as a co-author on the previous commit.

@panpan0000

@ikryukov Rebased, thank you.

panpan0000 force-pushed the ucc_integration-pr-ut branch from ed9cf39 to a90900f on September 24, 2025 06:48
@ikryukov ikryukov merged commit cbbb8a8 into ikryukov:ucc_integration Sep 24, 2025
2 checks passed