@kouroshHakha kouroshHakha commented Oct 15, 2025

Unify and Extend Builder Configuration for LLM Deployments

Overview

This PR transforms Ray Serve LLM builders by providing:

  1. Unified configuration experience across standard LLM and Prefill/Decode deployments
  2. Extensible plugin points for customizing components without writing custom builders

Previously, users who needed to customize ingress behavior or P/D proxy logic had to fork and maintain their own builder functions. Now, the builder configuration supports pluggable components out of the box.

Problem Statement

Inconsistent Configuration Patterns

Standard LLM and P/D builders used different configuration approaches, forcing users to learn multiple patterns.

Limited Extensibility

Users couldn't easily customize core components:

  • Want a custom ingress with authentication? → Write your own builder
  • Need modified P/D proxy behavior? → Copy and maintain builder code
  • Want different routing logic? → Fork the builder setup

Solution

1. Unified Configuration Pattern

Both builders now share the same intuitive structure:

# Standard LLM Builder
build_openai_app({
    "llm_configs": [llm_config],
    "ingress_cls_config": {...},         # Configure ingress
    "ingress_deployment_config": {...}    # Configure deployment
})

# P/D Builder - Same patterns!
build_pd_openai_app({
    "prefill_config": prefill_config,
    "decode_config": decode_config,
    "proxy_cls_config": {...},            # Configure proxy (P/D-specific)
    "proxy_deployment_config": {...},     # Configure deployment
    "ingress_cls_config": {...},          # Configure ingress (same as above!)
    "ingress_deployment_config": {...}    # Configure deployment (same as above!)
})

2. Pluggable Component System

Inject custom components without custom builders:

# Use custom ingress with authentication
build_openai_app({
    "llm_configs": [llm_config],
    "ingress_cls_config": {
        "ingress_cls": "mycompany.auth.AuthenticatedIngress",
        "ingress_extra_kwargs": {
            "auth_provider": "okta",
            "required_scopes": ["llm:read"]
        }
    }
})

# Use custom P/D proxy with logging
build_pd_openai_app({
    "prefill_config": p_config,
    "decode_config": d_config,
    "proxy_cls_config": {
        "proxy_cls": "mycompany.observability.LoggingPDProxy",
        "proxy_extra_kwargs": {
            "log_level": "DEBUG",
            "trace_all_requests": True
        }
    }
})

Key Features

Component Extensibility

Users can now customize any component:

# Example: Custom ingress with rate limiting
"ingress_cls_config": {
    "ingress_cls": "my.custom.RateLimitedIngress",  # Your class
    "ingress_extra_kwargs": {                       # Your params
        "rate_limit": 100,
        "burst_size": 10
    }
}

# Example: Custom P/D proxy with caching
"proxy_cls_config": {
    "proxy_cls": "my.custom.CachingPDProxy",
    "proxy_extra_kwargs": {
        "cache_backend": "redis",
        "ttl_seconds": 300
    }
}

No custom builder needed! Just implement your component class with the expected signature.
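To make the plugin point concrete, here is a hedged sketch of what such a component class might look like. The constructor signature is an assumption (the real contract is defined by OpenAiIngress in Ray Serve LLM); `llm_deployments`, `rate_limit`, and `burst_size` are illustrative names wired up via `ingress_extra_kwargs`.

```python
# Hypothetical pluggable ingress class. The exact constructor signature Ray
# Serve LLM expects is an assumption here; consult the OpenAiIngress source
# for the real interface.

class RateLimitedIngress:
    """Example custom ingress: stock behavior plus a toy rate limit."""

    def __init__(self, llm_deployments, rate_limit=100, burst_size=10):
        # `llm_deployments` stands in for whatever handles the builder
        # injects; the other kwargs arrive via `ingress_extra_kwargs`.
        self.llm_deployments = llm_deployments
        self.rate_limit = rate_limit
        self.burst_size = burst_size

    def allow(self, requests_in_window: int) -> bool:
        # Toy admission check standing in for real rate limiting.
        return requests_in_window < self.rate_limit + self.burst_size
```

The builder would then instantiate this class for you once it is referenced from `ingress_cls_config`.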

Flexible Input Types

Accept dicts, objects, or file paths everywhere:

# Mix and match as needed
"llm_configs": [
    {...},                  # Inline dict
    LLMConfig(...),         # Pydantic object
    "configs/model.yaml",   # File path
]

"ingress_cls_config": {...} or IngressClsConfig(...)  # Dict or object
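Under the hood, the builder presumably normalizes each entry to an `LLMConfig`. A minimal sketch of that normalization, using a stand-in dataclass instead of the real pydantic `LLMConfig` (all names here are illustrative):

```python
from dataclasses import dataclass
from pathlib import Path
import json

@dataclass
class LLMConfigStub:
    """Stand-in for the real pydantic LLMConfig."""
    model_id: str

def normalize(entry):
    """Coerce a dict, config object, or file path into a config object."""
    if isinstance(entry, LLMConfigStub):
        return entry
    if isinstance(entry, dict):
        return LLMConfigStub(**entry)
    if isinstance(entry, (str, Path)):
        # The real builder parses YAML files; JSON keeps this sketch
        # stdlib-only.
        return LLMConfigStub(**json.loads(Path(entry).read_text()))
    raise TypeError(f"Invalid LLMConfig: {entry!r}")
```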

Sensible Defaults with Override Ability

# Minimal config - uses OpenAiIngress and PDProxyServer
build_pd_openai_app({
    "prefill_config": p,
    "decode_config": d
})

# Override only what you need
build_pd_openai_app({
    "prefill_config": p,
    "decode_config": d,
    "proxy_cls_config": {
        "proxy_cls": "my.custom.Proxy"  # Just change the class
        # proxy_extra_kwargs defaults to {}
    }
    # ingress_cls_config defaults to OpenAiIngress
})
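The defaults-with-override behavior can be pictured with a stand-in model. The field names mirror the PR, but this dataclass is illustrative, not the actual pydantic `ProxyClsConfig`, and the default class path is a placeholder:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Union

@dataclass
class ProxyClsConfigStub:
    # Default points at the stock proxy (placeholder path); override with a
    # class object or a dotted string path.
    proxy_cls: Union[str, type] = "ray.serve.llm.PDProxyServer"
    # Extra kwargs forwarded to the proxy constructor; defaults to {}.
    proxy_extra_kwargs: Dict[str, Any] = field(default_factory=dict)
```

Overriding only `proxy_cls` leaves `proxy_extra_kwargs` at its empty default, matching the comment in the example above.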

Type-Safe with String Paths

Load custom classes from strings for YAML/config file compatibility:

# In Python code
"ingress_cls": MyCustomIngress

# In YAML config
ingress_cls: "mymodule.components:MyCustomIngress"  # Colon or dot notation
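The string form can be resolved to a class with a few lines of `importlib`; Ray Serve has its own resolver internally, so this sketch only illustrates the idea behind supporting both notations:

```python
import importlib

def load_class(path: str):
    """Resolve 'pkg.mod:ClassName' or 'pkg.mod.ClassName' to the class."""
    if ":" in path:
        module_name, cls_name = path.split(":", 1)
    else:
        module_name, _, cls_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, cls_name)
```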

Use Cases Enabled

Before: Custom Builder Required

# User had to copy and modify the entire builder
def my_custom_pd_builder(prefill, decode):
    # 50+ lines of boilerplate copied from Ray
    prefill_deployment = build_llm_deployment(prefill)
    decode_deployment = build_llm_deployment(decode)
    
    # The one line they actually wanted to customize:
    proxy = MyCustomProxy(prefill_deployment, decode_deployment, my_custom_arg=True)
    
    # More boilerplate...
    ingress = serve.deployment(OpenAiIngress).bind([proxy])
    return ingress

After: Configuration Only

# Just configure it!
app = build_pd_openai_app({
    "prefill_config": prefill,
    "decode_config": decode,
    "proxy_cls_config": {
        "proxy_cls": MyCustomProxy,
        "proxy_extra_kwargs": {"my_custom_arg": True}
    }
})

Benefits

  • No more custom builders for simple customizations
  • Consistent patterns across deployment types
  • Easy component swapping without code changes (e.g., via YAML configs)
  • Type-safe configuration with validation

Testing

Release tests: https://buildkite.com/ray-project/release/builds/63841

Future work

  • One thing we should do in a follow-up PR is to make PDProxyServer inherit from LLMServerProtocol rather than LLMServer. Otherwise it will become confusing over time when developers look at PDProxyServer's implementation and use it as the example of how to implement other servers.
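The suggested direction can be sketched with `typing.Protocol`: the proxy satisfies the server interface structurally instead of inheriting a concrete server's implementation. The method name and classes below are assumptions, not the real Ray Serve interfaces:

```python
import asyncio
from typing import Any, Protocol

class LLMServerProtocolStub(Protocol):
    """Structural interface a server must satisfy (illustrative)."""
    async def chat(self, request: Any) -> Any: ...

class PDProxyServerStub:
    """Satisfies the protocol without subclassing a concrete server."""
    async def chat(self, request: Any) -> Any:
        # A real proxy would route between prefill and decode deployments.
        return {"routed": True}

def accepts_server(server: LLMServerProtocolStub) -> bool:
    # Type checkers verify structural conformance; at runtime we just use it.
    return hasattr(server, "chat")
```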

- Removed the `cu121` build argument from `rayllm.depsets.yaml`.
- Updated the `nixl` package version to `0.6.1` in multiple lock files to ensure compatibility.
- Adjusted the Dockerfile to comment out unused build arguments related to `ROOT_DIR`, `GDR_HOME`, `UCX_HOME`, and `NIXL_HOME`.
- Cleaned up several lock files by deleting outdated versions for `cu121` and ensuring consistency across `cu128` and `cpu` configurations.

These changes aim to streamline the dependency management and improve the build process for the rayllm project.

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
@kouroshHakha kouroshHakha added the `go` label (add ONLY when ready to merge, run all tests) Oct 15, 2025
@kouroshHakha kouroshHakha marked this pull request as ready for review October 15, 2025 17:42
@kouroshHakha kouroshHakha requested review from a team as code owners October 15, 2025 17:42
@kouroshHakha
Contributor Author

/gemini review

@kouroshHakha kouroshHakha changed the title [serve][llm] revamp pd APIs [serve][llm] Unify and Extend Builder Configuration for LLM Deployments Oct 15, 2025


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant and well-executed revamp of the Prefill/Decode (P/D) and standard LLM deployment APIs. The move to a unified, Pydantic-based configuration model is a major improvement, making the APIs more intuitive, type-safe, and extensible. The ability to use pluggable components for ingress and proxy via configuration is a fantastic feature that will greatly simplify customizations for users. The code is cleaner and more maintainable as a result of this refactoring.

My review includes a couple of points for consideration. One is a behavioral change regarding the kv_transfer_config in P/D deployments, which is now mandatory. The other is a minor regression in parsing LLM configurations from raw YAML strings.

Overall, this is an excellent contribution that significantly enhances the usability and flexibility of Ray Serve's LLM capabilities.

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

@ruisearch42 ruisearch42 left a comment


LGTM, just some nitpicks

Args:
    pd_serving_args: The dictionary containing prefill and decode configs.

update this as well?

pd_serving_args: A dict that conforms to the PDServingArgs pydantic model.

class ProxyClsConfig(BaseModelExtended):
    proxy_cls: Union[str, type[PDProxyServer]] = Field(
        default=PDProxyServer,
        description="The class name of the proxy class.",

nit: The proxy class or class name

elif isinstance(value, LLMConfig):
    return value
else:
    raise ValueError(f"Invalid LLMConfig: {value}")

nit: TypeError?

Comment on lines 97 to 105
@field_validator("prefill_config")
@classmethod
def _validate_prefill_config(cls, value: Any) -> LLMConfig:
    return cls._validate_llm_config(value)

@field_validator("decode_config")
@classmethod
def _validate_decode_config(cls, value: Any) -> LLMConfig:
    return cls._validate_llm_config(value)

can do this?

    @field_validator("prefill_config", "decode_config")
    @classmethod
    def _validate_llm_configs(cls, value: Any) -> LLMConfig:
        return cls._validate_llm_config(value)

elif isinstance(config, LLMConfig):
    llm_configs.append(config)
else:
    raise ValueError(f"Invalid LLMConfig: {config}")

TypeError?

@kouroshHakha
Contributor Author

kouroshHakha commented Oct 16, 2025

release tests failing here: https://buildkite.com/ray-project/release/builds/63841

@kouroshHakha kouroshHakha enabled auto-merge (squash) October 16, 2025 04:47
@kouroshHakha kouroshHakha merged commit 4706bf1 into ray-project:master Oct 16, 2025
7 checks passed
edoakes pushed a commit to edoakes/ray that referenced this pull request Oct 16, 2025
…ts (ray-project#57724)

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
…ts (ray-project#57724)

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Oct 22, 2025
…ts (ray-project#57724)

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: xgui <xgui@anyscale.com>
elliot-barn pushed a commit that referenced this pull request Oct 23, 2025
…ts (#57724)

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
…ts (ray-project#57724)

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…ts (ray-project#57724)

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>