Commit c74a53f
serve.llm: Fix port collisions for TP/PP with NIXL/LMCache
Extends port collision fix to Tensor Parallelism (TP) and Pipeline
Parallelism (PP) scenarios. Previous fix (PR ray-project#55802) only addressed
Data Parallelism by using explicit data_parallel_rank.
Changes:
- base.py: Added _compute_port_offset() method with fallback logic
  * Priority 1: Use data_parallel_rank if set (DP case)
  * Priority 2: Hash replica_tag for deterministic offset (TP/PP case)
  * Fallback: Return 0
- nixl_connector.py: Use _compute_port_offset() instead of dp_rank
- lmcache_connector_v1.py: Add numeric port support with offset logic
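The priority order listed above can be sketched roughly as follows. This is a minimal illustration, not the code from base.py: the standalone function name `compute_port_offset`, its parameters, the `MAX_PORT_OFFSET` bound, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from typing import Optional

# Hypothetical upper bound on the offset; the commit message does not state one.
MAX_PORT_OFFSET = 1024


def compute_port_offset(
    data_parallel_rank: Optional[int],
    replica_tag: Optional[str],
    max_offset: int = MAX_PORT_OFFSET,
) -> int:
    """Return a port offset following the priority order in the commit message."""
    # Priority 1: an explicit DP rank wins. Compare against None so that
    # rank 0 is still treated as "set".
    if data_parallel_rank is not None:
        return data_parallel_rank

    # Priority 2: derive a deterministic offset from the replica tag.
    # A cryptographic digest is used here (an assumption) because Python's
    # built-in hash() is salted per process for strings and would not give
    # the same value across runs.
    if replica_tag:
        digest = hashlib.sha256(replica_tag.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % max_offset

    # Fallback: no rank and no tag available.
    return 0
```

Note that a hash taken modulo a fixed range makes collisions between two replica tags unlikely rather than impossible, so a sketch like this reduces the chance of two workers sharing a port but does not strictly guarantee uniqueness.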
Fixes port collision errors in TP/PP deployments:
- Multiple workers no longer bind to same port
- Prevents NIXL_ERR_BACKEND and ZMQ errors
- Enables successful deployment with pipeline_parallel_size > 1
Reproduction:
Deployed Ray Serve with pipeline_parallel_size=2 and NIXL on Ray
3.0.0.dev0 (8 x L4 GPU cluster). Before fix, all workers used identical
port (e.g., 52910), causing NIXL_ERR_BACKEND. Logs showed:
'Creating v1 connector with engine_id: ...-52910 [repeated 3x]'
After fix, each worker receives unique port via replica tag hashing,
eliminating collisions.
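The before/after behaviour described in the reproduction can be checked with a small helper like the one below. It is a sketch only: the worker names, the number of workers, and the "after" offsets are made up for illustration, and only the base port 52910 comes from the log line quoted above.

```python
from collections import defaultdict
from typing import Dict, List


def find_port_collisions(assignments: Dict[str, int]) -> Dict[int, List[str]]:
    """Group workers by port and return only ports claimed by more than one worker."""
    by_port: Dict[int, List[str]] = defaultdict(list)
    for worker, port in assignments.items():
        by_port[port].append(worker)
    return {port: workers for port, workers in by_port.items() if len(workers) > 1}


BASE_PORT = 52910  # port seen in the quoted log line

# Before the fix: every worker computes the same port (no per-worker offset).
before = {f"worker-{i}": BASE_PORT for i in range(3)}

# After the fix: each worker adds its own offset. These offsets are
# hypothetical placeholders, not values produced by the real hashing logic.
hypothetical_offsets = [0, 17, 42]
after = {
    f"worker-{i}": BASE_PORT + offset
    for i, offset in enumerate(hypothetical_offsets)
}

print(find_port_collisions(before))
print(find_port_collisions(after))
```

In the "before" case the helper reports a single port shared by all workers, which corresponds to the situation that produced NIXL_ERR_BACKEND; in the "after" case it reports nothing.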
Related: ray-project#55775
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
1 parent 6f9ef13 commit c74a53f
3 files changed: +63 -19 lines changed
- python/ray/llm/_internal/serve/deployments/llm/vllm/kv_transfer_backends
Lines changed: 32 additions & 0 deletions
Lines changed: 29 additions & 14 deletions
Lines changed: 2 additions & 5 deletions