[serve][llm] Add TP*PP spacing to port offset for multi-replica deployments #58073
Conversation
Multiplies `replica_rank` by `tensor_parallel_size` to prevent port collisions when scaling to 2+ replicas with TP≥2.

**Problem:** PR ray-project#57771 fixed inter-replica port collisions by using `replica_rank` instead of defaulting to 0. However, it didn't account for the port space needed by TP workers within each replica. vLLM workers add their `tp_rank` (0, 1, ..., tp_size-1) to the base port at bind time (`vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py:790`). Without proper spacing, consecutive replicas have overlapping port ranges:

- Replica 0, TP worker 1: base + 0 + 1 = 50001
- Replica 1, TP worker 0: base + 1 + 0 = 50001 ← collision

**Solution:** Space replicas by `tp_size` ports to reserve room for all TP workers:

- Replica 0 uses ports: [base, base+1, ..., base+(tp_size-1)]
- Replica 1 uses ports: [base+tp_size, base+tp_size+1, ...]

**Impact:**
- Fixes port collisions when autoscaling to 2+ replicas with TP≥2
- Backward compatible: TP=1 multiplies by 1 (no-op)
- DP deployments unchanged: vLLM handles spacing
- Single-replica deployments unchanged: no other replica to collide with

Related: PR ray-project#57771, ray-project#55775

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
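A sketch of the before/after offset arithmetic described above; the base port value and helper names here are illustrative, not Ray's actual API:

```python
# Sketch of the collision, assuming a base port of 50000; function names
# are illustrative, not Ray's actual API.
BASE_PORT = 50000
TP_SIZE = 2

def port_before_fix(replica_rank: int, tp_rank: int) -> int:
    # PR #57771: offset = replica_rank, which ignores TP workers.
    return BASE_PORT + replica_rank + tp_rank

def port_after_fix(replica_rank: int, tp_rank: int) -> int:
    # This PR: offset = replica_rank * tp_size, reserving tp_size ports per replica.
    return BASE_PORT + replica_rank * TP_SIZE + tp_rank

# Before: replica 0 / worker 1 and replica 1 / worker 0 both bind 50001.
assert port_before_fix(0, 1) == port_before_fix(1, 0) == 50001
# After: all four (replica, worker) pairs bind distinct ports.
assert len({port_after_fix(r, t) for r in range(2) for t in range(TP_SIZE)}) == 4
```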
kouroshHakha left a comment:
This fix would still have a problem when we have TP2PP2, because it doesn't consider PP at all. You should use the generic `num_devices` API, which already exists in llm_config --> engine_config.
```diff
-    return rc.rank
+    # Multiply by tp_size to reserve ports for all TP workers
+    # Each TP worker will add its tp_rank (0, 1, ..., tp_size-1)
+    return rc.rank * tp_size
```
You need to offset by tp * pp. Effectively, you should use `llm_config.get_engine_config().num_devices`.
done
The previous fix didn't quite get it right for the TPxPPy scenario. Use `llm_config.get_engine_config().num_devices` instead of manually calculating from tp_size, ensuring proper port spacing for both TP and PP workers. This fixes the case where PP workers also bind NIXL ports and need spacing in addition to the TP workers. Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
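A minimal sketch of the resulting offset logic; the enclosing function name and the `rc` replica-context parameter are assumptions based on the diff above:

```python
def get_replica_port_offset(rc, llm_config) -> int:
    # num_devices == tensor_parallel_size * pipeline_parallel_size, so each
    # replica reserves one port per TP/PP worker it spawns.
    num_devices = llm_config.get_engine_config().num_devices
    return rc.rank * num_devices
```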
…yments (ray-project#58073) Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
…yments (ray-project#58073) Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com> Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Multiply `replica_rank` by `num_devices` (tp × pp) to prevent port collisions when scaling to 2+ replicas with TP≥2 or PP≥2.

**Root Cause**

PR #57771 fixed port collisions in `python/ray/llm/_internal/serve/engines/vllm/kv_transfer/base.py` for TP/PP by using Ray Serve's `replica_rank` for port offsets instead of defaulting to 0. However, the implementation doesn't account for the port spacing needed when each replica spawns multiple workers, so it could still lead to overlap.

**Main issue:** Consecutive replicas get consecutive port offsets (0, 1, 2, ...), but each replica actually needs `num_devices` (tp × pp) consecutive ports for its workers. This causes port ranges to overlap between replicas.

**Example: 2 replicas, TP=2**
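Worked arithmetic for this case, assuming a base port of 50000 and the pre-fix offset of `replica_rank`:

```
Replica 0, worker 0: 50000 + 0 + 0 = 50000
Replica 0, worker 1: 50000 + 0 + 1 = 50001
Replica 1, worker 0: 50000 + 1 + 0 = 50001  ← collides with replica 0, worker 1
Replica 1, worker 1: 50000 + 1 + 1 = 50002
```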
**Example: 2 replicas, TP=2, PP=2**
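With PP=2 as well, each replica has num_devices = 4 workers (ranks 0-3), so the pre-fix overlap grows (same assumed base port of 50000):

```
Replica 0, workers 0-3: 50000, 50001, 50002, 50003
Replica 1, workers 0-3: 50001, 50002, 50003, 50004  ← 50001-50003 collide
```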
**Solution:**

Space replicas by `num_devices` (tp × pp) ports to reserve room for all workers, as sketched below:

- Replica 0 uses ports: [base, base+1, ..., base+(num_devices-1)]
- Replica 1 uses ports: [base+num_devices, base+num_devices+1, ...]
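A short sketch of the reserved per-replica port ranges; the base port and `num_devices` values are illustrative:

```python
# Sketch of the port range each replica reserves after the fix
# (base and num_devices values are illustrative).
base, num_devices = 50000, 4  # e.g. TP=2 x PP=2

def replica_ports(replica_rank: int) -> range:
    start = base + replica_rank * num_devices
    return range(start, start + num_devices)

assert list(replica_ports(0)) == [50000, 50001, 50002, 50003]
assert list(replica_ports(1)) == [50004, 50005, 50006, 50007]
```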
The fix uses `llm_config.get_engine_config().num_devices`, which correctly accounts for both TP and PP workers.

**Impact:**
- Fixes port collisions when autoscaling to 2+ replicas with TP≥2 or PP≥2
- Backward compatible: TP=1, PP=1 multiplies by 1 (no-op)
- DP deployments unchanged: vLLM handles spacing
- Single-replica deployments unchanged: no other replica to collide with
**Note (about Data Parallel)**

DP deployments don't need this fix because vLLM already multiplies `data_parallel_rank` by `tp_size` for the offset internally:
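A paraphrase of that internal spacing, with illustrative names rather than vLLM's verbatim source:

```python
# Paraphrase of vLLM's internal DP spacing, not the verbatim vLLM source:
# each DP rank is spaced by tp_size, so its TP workers cannot overlap with
# another DP rank's workers.
def dp_worker_port(base: int, dp_rank: int, tp_size: int, tp_rank: int) -> int:
    return base + dp_rank * tp_size + tp_rank
```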
So for DP, the spacing is automatic; for `replica_rank`, we do the offset multiplication ourselves, since vLLM doesn't know about Ray Serve's replica concept. The fix uses `num_devices` instead of just `tp_size` to ensure PP workers also get unique ports.

Related: PR #57771, #55775, #58072