
[serve][llm] Generalize DP Fix for LMCache Port Conflicts #57757

@nrghosh

Description

[Serve.LLM] Port collisions with TP/PP when using NIXL/LMCache KV transfer backends

What happened + What you expected to happen

Reference: PR #55802 partially addressed port collisions for Data Parallelism by setting the NIXL side-channel port to base_port + data_parallel_rank.

Issue: With TP/PP and multi-replica deployments, multiple vLLM processes on the same node concurrently probe for open ports via get_open_port() and can select the same one, which leads to binding conflicts (a minimal sketch of this race follows the list below).

This manifests as:

  • Flaky startups and deployment failures
  • Address already in use errors from ZMQ
  • NIXL_ERR_BACKEND errors
  • Stuck initialization when scaling replicas or increasing TP/PP on shared nodes
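A minimal sketch of that race using plain Python sockets (illustrative only; it mimics the probe-then-bind pattern that get_open_port() relies on, not the actual vLLM code):

import socket

def probe_open_port() -> int:
    # Bind to port 0 so the OS picks a free port, read it back, then close
    # the socket. Nothing reserves the port after this function returns.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]

# If two workers on the same node probe before either one re-binds the port
# (e.g. via ZMQ/NIXL), both can be handed the same number and the second
# bind fails with "Address already in use".
port_a = probe_open_port()
port_b = probe_open_port()
print(port_a, port_b)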

Expected: Each worker should receive a unique port to avoid collisions across all parallelism strategies (DP/TP/PP).

Symptoms

  1. Port collisions: Multiple workers use identical ports

    Creating v1 connector with engine_id: Pkjr3b-10.0.235.241-52910 [repeated 3x across cluster]
    
  2. NIXL backend errors:

    nixl._bindings.nixlBackendError: NIXL_ERR_BACKEND
    nixl_agent.cpp:481] registerMem: registration failed
    
  3. Deployment failures: Replicas fail to initialize and continuously restart

Root Cause

PR #55802 partially fixed port collisions for DP by adding logic to use data_parallel_rank:

dp_rank = self.llm_config.engine_kwargs.get("data_parallel_rank", 0)
port = base_port + dp_rank

However, this only works when data_parallel_rank is explicitly set by DPServer, which only occurs in DP deployments.

For TP/PP deployments:

  • data_parallel_rank is not set (or defaults to 0)
  • All workers use offset 0 → same port for all workers
  • Port collision occurs when multiple workers initialize on the same node

Current code (nixl_connector.py):

dp_rank = self.llm_config.engine_kwargs.get("data_parallel_rank", 0)
port = base_port + dp_rank  # Always 0 for TP/PP!
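As a hypothetical sketch (not the actual connector code), a generalized fix would derive the offset from every parallelism dimension rather than the DP rank alone; the helper name and rank plumbing below are assumptions for illustration:

def side_channel_port(base_port: int,
                      dp_rank: int,
                      tp_size: int,
                      pp_size: int,
                      worker_index: int) -> int:
    # Hypothetical helper: each engine replica owns tp_size * pp_size workers,
    # so reserve a contiguous block of ports per DP rank and give every
    # TP/PP worker its own slot inside that block.
    workers_per_replica = tp_size * pp_size
    return base_port + dp_rank * workers_per_replica + worker_index

# Example: with tp_size=1 and pp_size=2, the two PP workers receive
# base_port and base_port + 1 instead of both landing on base_port + 0.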

Reproduction

Environment

  • Ray: 3.0.0.dev0 (nightly)
  • Python: 3.11.11
  • Cluster: 8-GPU head node

Minimal Config (serve_config.yaml)

applications:
  - name: test-pp2-nixl
    import_path: ray.serve.llm:build_openai_app
    route_prefix: /
    args:
      llm_configs:
        - model_loading_config:
            model_id: facebook/opt-125m
          engine_kwargs:
            pipeline_parallel_size: 2
            tensor_parallel_size: 1
            max_num_seqs: 4
            enforce_eager: true
            kv_transfer_config:
              kv_connector: NixlConnector
              kv_role: kv_both
          deployment_config:
            autoscaling_config:
              min_replicas: 1
              max_replicas: 1
            ray_actor_options:
              num_cpus: 4
              num_gpus: 0

Steps to Reproduce

  1. Deploy the application:

    serve run serve_config.yaml
  2. Check logs for port collision:

    ray logs --grep "Creating v1 connector"
  3. Observe that all workers use the same port:

    Creating v1 connector with engine_id: Pkjr3b-10.0.235.241-52910
    Creating v1 connector with engine_id: Pkjr3b-10.0.235.241-52910 [repeated 3x]
    
  4. Check for NIXL errors:

    ray logs --grep "NIXL_ERR_BACKEND"

Versions / Dependencies

  • Ray: 2.47+ (nightly 3.0.0.dev0 tested)
  • vLLM: 0.10+
  • Python: 3.11+

Ways To Reproduce Issue

Option 1 - NIXL with Pipeline Parallelism:
Set num_replicas=2 and pipeline_parallel_size>=2 (or tensor_parallel_size>=2) with kv_transfer_config={'kv_connector': 'NixlConnector', 'kv_role': 'kv_both'}; without port disambiguation, workers on the same node hit bind conflicts.

Option 2 - LMCache with numeric port:
Set kv_connector_extra_config={'lmcache_rpc_port': 5555} with num_replicas>=2; without port disambiguation, the second replica fails with ZMQ EADDRINUSE.
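
A hedged Python equivalent of Option 2 is sketched below, assuming the ray.serve.llm LLMConfig / build_openai_app API and vLLM's LMCacheConnectorV1 connector name; adjust field and connector names to your Ray/vLLM versions:

from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

# Two replicas sharing one node, both told to use the same fixed LMCache
# RPC port -> the second replica hits ZMQ EADDRINUSE.
llm_config = LLMConfig(
    model_loading_config=dict(model_id="facebook/opt-125m"),
    engine_kwargs=dict(
        tensor_parallel_size=1,
        enforce_eager=True,
        kv_transfer_config=dict(
            kv_connector="LMCacheConnectorV1",  # connector name assumed; check your vLLM version
            kv_role="kv_both",
            kv_connector_extra_config={"lmcache_rpc_port": 5555},
        ),
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=2, max_replicas=2),
    ),
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)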

Impact

Without fix:

  • TP/PP deployments with NIXL/LMCache fail to start
  • Flaky deployments with intermittent port collisions
  • Impossible to scale replicas reliably

With fix:

  • All parallelism strategies (DP/TP/PP) work correctly with unique ports per worker
  • Reliable scaling and deployment
  • No manual port management required

Workaround

Currently, users must manually specify unique ports per worker using NIXL_SIDE_CHANNEL_PORT_BASE in experimental_configs, which is cumbersome and error-prone for multi-worker deployments.
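For reference, a rough sketch of that workaround (the experimental_configs key comes from the description above; the port value is arbitrary and must be kept distinct per replica by hand):

from ray.serve.llm import LLMConfig

# Manually reserve a non-overlapping port range for this deployment; with
# multiple replicas or workers this bookkeeping has to be done by hand.
llm_config = LLMConfig(
    model_loading_config=dict(model_id="facebook/opt-125m"),
    engine_kwargs=dict(
        pipeline_parallel_size=2,
        kv_transfer_config=dict(kv_connector="NixlConnector", kv_role="kv_both"),
    ),
    experimental_configs={"NIXL_SIDE_CHANNEL_PORT_BASE": 6000},  # key name taken from the issue text
)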

Issue Severity

High - Blocks TP/PP deployments with KV transfer backends

Labels

bug · llm · serve · stability · triage
