@kouroshHakha kouroshHakha commented Oct 15, 2025

Unify and Extend Builder Configuration for LLM Deployments

Overview

This PR transforms Ray Serve LLM builders by providing:

  1. Unified configuration experience across standard LLM and Prefill/Decode deployments
  2. Extensible plugin points for customizing components without writing custom builders

Previously, users who needed to customize ingress behavior or P/D proxy logic had to fork and maintain their own builder functions. Now, the builder configuration supports pluggable components out of the box.

Problem Statement

Inconsistent Configuration Patterns

Standard LLM and P/D builders used different configuration approaches, forcing users to learn multiple patterns.

Limited Extensibility

Users couldn't easily customize core components:

  • Want a custom ingress with authentication? → Write your own builder
  • Need modified P/D proxy behavior? → Copy and maintain builder code
  • Want different routing logic? → Fork the builder setup

Solution

1. Unified Configuration Pattern

Both builders now share the same intuitive structure:

# Standard LLM Builder
build_openai_app({
    "llm_configs": [llm_config],
    "ingress_cls_config": {...},         # Configure ingress
    "ingress_deployment_config": {...}    # Configure deployment
})

# P/D Builder - Same patterns!
build_pd_openai_app({
    "prefill_config": prefill_config,
    "decode_config": decode_config,
    "proxy_cls_config": {...},            # Configure proxy (P/D-specific)
    "proxy_deployment_config": {...},     # Configure deployment
    "ingress_cls_config": {...},          # Configure ingress (same as above!)
    "ingress_deployment_config": {...}    # Configure deployment (same as above!)
})

2. Pluggable Component System

Inject custom components without custom builders:

# Use custom ingress with authentication
build_openai_app({
    "llm_configs": [llm_config],
    "ingress_cls_config": {
        "ingress_cls": "mycompany.auth.AuthenticatedIngress",
        "ingress_extra_kwargs": {
            "auth_provider": "okta",
            "required_scopes": ["llm:read"]
        }
    }
})

# Use custom P/D proxy with logging
build_pd_openai_app({
    "prefill_config": p_config,
    "decode_config": d_config,
    "proxy_cls_config": {
        "proxy_cls": "mycompany.observability.LoggingPDProxy",
        "proxy_extra_kwargs": {
            "log_level": "DEBUG",
            "trace_all_requests": True
        }
    }
})

Key Features

Component Extensibility

Users can now customize any component:

# Example: Custom ingress with rate limiting
"ingress_cls_config": {
    "ingress_cls": "my.custom.RateLimitedIngress",  # Your class
    "ingress_extra_kwargs": {                       # Your params
        "rate_limit": 100,
        "burst_size": 10
    }
}

# Example: Custom P/D proxy with caching
"proxy_cls_config": {
    "proxy_cls": "my.custom.CachingPDProxy",
    "proxy_extra_kwargs": {
        "cache_backend": "redis",
        "ttl_seconds": 300
    }
}

No custom builder needed! Just implement your component class with the expected signature.
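To make the plugin point concrete, here is a hedged sketch of what such a component class might look like. The constructor signature is an assumption (the real contract is defined by OpenAiIngress in Ray Serve LLM); `llm_deployments`, `rate_limit`, and `burst_size` are illustrative names wired up via `ingress_extra_kwargs`.

```python
# Hypothetical pluggable ingress class. The exact constructor signature Ray
# Serve LLM expects is an assumption here; consult the OpenAiIngress source
# for the real interface.

class RateLimitedIngress:
    """Example custom ingress: stock behavior plus a toy rate limit."""

    def __init__(self, llm_deployments, rate_limit=100, burst_size=10):
        # `llm_deployments` stands in for whatever handles the builder
        # injects; the other kwargs arrive via `ingress_extra_kwargs`.
        self.llm_deployments = llm_deployments
        self.rate_limit = rate_limit
        self.burst_size = burst_size

    def allow(self, requests_in_window: int) -> bool:
        # Toy admission check standing in for real rate limiting.
        return requests_in_window < self.rate_limit + self.burst_size
```

The builder would then instantiate this class for you once it is referenced from `ingress_cls_config`.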

Flexible Input Types

Accept dicts, objects, or file paths everywhere:

# Mix and match as needed
"llm_configs": [
    {...},                  # Inline dict
    LLMConfig(...),         # Pydantic object
    "configs/model.yaml",   # File path
]

"ingress_cls_config": {...} or IngressClsConfig(...)  # Dict or object
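Under the hood, the builder presumably normalizes each entry to an `LLMConfig`. A minimal sketch of that normalization, using a stand-in dataclass instead of the real pydantic `LLMConfig` (all names here are illustrative):

```python
from dataclasses import dataclass
from pathlib import Path
import json

@dataclass
class LLMConfigStub:
    """Stand-in for the real pydantic LLMConfig."""
    model_id: str

def normalize(entry):
    """Coerce a dict, config object, or file path into a config object."""
    if isinstance(entry, LLMConfigStub):
        return entry
    if isinstance(entry, dict):
        return LLMConfigStub(**entry)
    if isinstance(entry, (str, Path)):
        # The real builder parses YAML files; JSON keeps this sketch
        # stdlib-only.
        return LLMConfigStub(**json.loads(Path(entry).read_text()))
    raise TypeError(f"Invalid LLMConfig: {entry!r}")
```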

Sensible Defaults with Override Ability

# Minimal config - uses OpenAiIngress and PDProxyServer
build_pd_openai_app({
    "prefill_config": p,
    "decode_config": d
})

# Override only what you need
build_pd_openai_app({
    "prefill_config": p,
    "decode_config": d,
    "proxy_cls_config": {
        "proxy_cls": "my.custom.Proxy"  # Just change the class
        # proxy_extra_kwargs defaults to {}
    }
    # ingress_cls_config defaults to OpenAiIngress
})
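The defaults-with-override behavior can be pictured with a stand-in model. The field names mirror the PR, but this dataclass is illustrative, not the actual pydantic `ProxyClsConfig`, and the default class path is a placeholder:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Union

@dataclass
class ProxyClsConfigStub:
    # Default points at the stock proxy (placeholder path); override with a
    # class object or a dotted string path.
    proxy_cls: Union[str, type] = "ray.serve.llm.PDProxyServer"
    # Extra kwargs forwarded to the proxy constructor; defaults to {}.
    proxy_extra_kwargs: Dict[str, Any] = field(default_factory=dict)
```

Overriding only `proxy_cls` leaves `proxy_extra_kwargs` at its empty default, matching the comment in the example above.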

Type-Safe with String Paths

Load custom classes from strings for YAML/config file compatibility:

# In Python code
"ingress_cls": MyCustomIngress

# In YAML config
ingress_cls: "mymodule.components:MyCustomIngress"  # Colon or dot notation
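The string form can be resolved to a class with a few lines of `importlib`; Ray Serve has its own resolver internally, so this sketch only illustrates the idea behind supporting both notations:

```python
import importlib

def load_class(path: str):
    """Resolve 'pkg.mod:ClassName' or 'pkg.mod.ClassName' to the class."""
    if ":" in path:
        module_name, cls_name = path.split(":", 1)
    else:
        module_name, _, cls_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, cls_name)
```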

Use Cases Enabled

Before: Custom Builder Required

# User had to copy and modify the entire builder
def my_custom_pd_builder(prefill, decode):
    # 50+ lines of boilerplate copied from Ray
    prefill_deployment = build_llm_deployment(prefill)
    decode_deployment = build_llm_deployment(decode)
    
    # The one line they actually wanted to customize:
    proxy = MyCustomProxy(prefill_deployment, decode_deployment, my_custom_arg=True)
    
    # More boilerplate...
    ingress = serve.deployment(OpenAiIngress).bind([proxy])
    return ingress

After: Configuration Only

# Just configure it!
app = build_pd_openai_app({
    "prefill_config": prefill,
    "decode_config": decode,
    "proxy_cls_config": {
        "proxy_cls": MyCustomProxy,
        "proxy_extra_kwargs": {"my_custom_arg": True}
    }
})

Benefits

  • No more custom builders for simple customizations
  • Consistent patterns across deployment types
  • Easy component swapping without code changes (e.g., via YAML configs)
  • Type-safe configuration with validation

Testing

Release tests: https://buildkite.com/ray-project/release/builds/63841

Future work

  • One thing we should do in a follow-up PR is to make PDProxyServer inherit from LLMServerProtocol rather than LLMServer. Otherwise it will become confusing over time when developers look at PDProxyServer's implementation and use it as the example of how to implement other servers.
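The suggested direction can be sketched with `typing.Protocol`: the proxy satisfies the server interface structurally instead of inheriting a concrete server's implementation. The method name and classes below are assumptions, not the real Ray Serve interfaces:

```python
import asyncio
from typing import Any, Protocol

class LLMServerProtocolStub(Protocol):
    """Structural interface a server must satisfy (illustrative)."""
    async def chat(self, request: Any) -> Any: ...

class PDProxyServerStub:
    """Satisfies the protocol without subclassing a concrete server."""
    async def chat(self, request: Any) -> Any:
        # A real proxy would route between prefill and decode deployments.
        return {"routed": True}

def accepts_server(server: LLMServerProtocolStub) -> bool:
    # Type checkers verify structural conformance; at runtime we just use it.
    return hasattr(server, "chat")
```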

- Removed the `cu121` build argument from `rayllm.depsets.yaml`.
- Updated the `nixl` package version to `0.6.1` in multiple lock files to ensure compatibility.
- Adjusted the Dockerfile to comment out unused build arguments related to `ROOT_DIR`, `GDR_HOME`, `UCX_HOME`, and `NIXL_HOME`.
- Cleaned up several lock files by deleting outdated versions for `cu121` and ensuring consistency across `cu128` and `cpu` configurations.

These changes aim to streamline the dependency management and improve the build process for the rayllm project.

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
@kouroshHakha kouroshHakha added the `go` label (add ONLY when ready to merge, run all tests) Oct 15, 2025
@kouroshHakha kouroshHakha marked this pull request as ready for review October 15, 2025 17:42
@kouroshHakha kouroshHakha requested review from a team as code owners October 15, 2025 17:42
@kouroshHakha
Contributor Author

/gemini review

@kouroshHakha kouroshHakha changed the title [serve][llm] revamp pd APIs [serve][llm] Unify and Extend Builder Configuration for LLM Deployments Oct 15, 2025


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant and well-executed revamp of the Prefill/Decode (P/D) and standard LLM deployment APIs. The move to a unified, Pydantic-based configuration model is a major improvement, making the APIs more intuitive, type-safe, and extensible. The ability to use pluggable components for ingress and proxy via configuration is a fantastic feature that will greatly simplify customizations for users. The code is cleaner and more maintainable as a result of this refactoring.

My review includes a couple of points for consideration. One is a behavioral change regarding the kv_transfer_config in P/D deployments, which is now mandatory. The other is a minor regression in parsing LLM configurations from raw YAML strings.

Overall, this is an excellent contribution that significantly enhances the usability and flexibility of Ray Serve's LLM capabilities.

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

@ruisearch42 ruisearch42 left a comment


LGTM, just some nitpicks

Args:
    pd_serving_args: The dictionary containing prefill and decode configs.

update this as well?

pd_serving_args: A dict that conforms to the PDServingArgs pydantic model.

class ProxyClsConfig(BaseModelExtended):
    proxy_cls: Union[str, type[PDProxyServer]] = Field(
        default=PDProxyServer,
        description="The class name of the proxy class.",

nit: The proxy class or class name

elif isinstance(value, LLMConfig):
    return value
else:
    raise ValueError(f"Invalid LLMConfig: {value}")

nit: TypeError?

Comment on lines 97 to 105
@field_validator("prefill_config")
@classmethod
def _validate_prefill_config(cls, value: Any) -> LLMConfig:
    return cls._validate_llm_config(value)

@field_validator("decode_config")
@classmethod
def _validate_decode_config(cls, value: Any) -> LLMConfig:
    return cls._validate_llm_config(value)

can do this?

    @field_validator("prefill_config", "decode_config")
    @classmethod
    def _validate_llm_configs(cls, value: Any) -> LLMConfig:
        return cls._validate_llm_config(value)

elif isinstance(config, LLMConfig):
    llm_configs.append(config)
else:
    raise ValueError(f"Invalid LLMConfig: {config}")

TypeError?

@kouroshHakha
Contributor Author

kouroshHakha commented Oct 16, 2025

release tests failing here: https://buildkite.com/ray-project/release/builds/63841

@kouroshHakha kouroshHakha enabled auto-merge (squash) October 16, 2025 04:47
@kouroshHakha kouroshHakha merged commit 4706bf1 into ray-project:master Oct 16, 2025
7 checks passed
edoakes pushed a commit to edoakes/ray that referenced this pull request Oct 16, 2025
…ts (ray-project#57724)

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
…ts (ray-project#57724)

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Oct 22, 2025
…ts (ray-project#57724)

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: xgui <xgui@anyscale.com>
elliot-barn pushed a commit that referenced this pull request Oct 23, 2025
…ts (#57724)

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
…ts (ray-project#57724)

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…ts (ray-project#57724)

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>