@nrghosh nrghosh commented Oct 30, 2025

Refactor data.llm processor configs to support nested stage configuration

Summary

Refactors Ray Data LLM processor configuration from flat boolean flags (apply_chat_template, tokenize, detokenize, has_image) to nested, typed stage configs. This enables per-stage control over batch_size, concurrency, runtime_env, num_cpus, and memory while maintaining backward compatibility with legacy boolean flags.

What's Changing

Before: All stages (chat template, tokenization, engine, detokenization) inherit the same processor-level defaults (batch_size, concurrency, runtime_env). Configuration is done via flat boolean flags with no per-stage customization.

After: Each stage can be configured independently via nested StageConfig objects. Processor-level defaults still apply, but stages can override them individually. Legacy boolean flags continue to work (with deprecation warnings).

How It Works

  1. Stage Config Models: New Pydantic models (ChatTemplateStageConfig, TokenizerStageConfig, etc.) define per-stage settings with optional overrides.

  2. Flexible Input: Processor configs accept stage configs in three forms:

    • bool: tokenize_stage=True → enabled with processor defaults
    • dict: tokenize_stage={"batch_size": 128} → enabled with overrides
    • StageConfig: tokenize_stage=TokenizerStageConfig(batch_size=128) → typed config
  3. Resolution & Merging: The resolve_stage_config() function:

    • Converts bool/dict/StageConfig → typed StageConfig instance
    • Merges processor-level defaults (batch_size, concurrency, runtime_env, model_source) into stage config
    • Stage-specific overrides take precedence over processor defaults
    • Creates a copy to prevent mutation when reusing configs across processors
  4. Backward Compatibility: A root_validator automatically coerces legacy boolean flags (apply_chat_template, tokenize, etc.) into the new nested format and emits deprecation warnings.
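The resolution-and-merge steps above can be sketched with a stdlib-only stand-in. The real implementation uses Pydantic models inside Ray Data; the names here mirror the PR description but the code is illustrative, not the shipped class:

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Optional, Union

@dataclass
class StageConfig:
    # Simplified stand-in for the Pydantic _StageConfigBase.
    enabled: bool = True
    batch_size: Optional[int] = None
    concurrency: Optional[int] = None
    runtime_env: Optional[Dict[str, Any]] = None

def resolve_stage_config(
    value: Union[bool, dict, StageConfig], defaults: Dict[str, Any]
) -> StageConfig:
    # 1. Coerce bool / dict / StageConfig into a typed instance.
    if isinstance(value, bool):
        cfg = StageConfig(enabled=value)
    elif isinstance(value, dict):
        cfg = StageConfig(**value)
    elif isinstance(value, StageConfig):
        cfg = replace(value)  # copy so reused configs are never mutated
    else:
        raise TypeError(f"Unsupported stage config type: {type(value)}")
    # 2. Merge processor-level defaults; stage-specific overrides win.
    for key, default in defaults.items():
        if getattr(cfg, key, None) is None:
            setattr(cfg, key, default)
    return cfg
```

For example, `resolve_stage_config({"batch_size": 128}, {"batch_size": 64, "concurrency": 4})` keeps the stage's `batch_size=128` but fills `concurrency=4` from the processor defaults.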

Problem

Current config uses flat boolean flags (apply_chat_template, tokenize, detokenize, has_image) with processor-level defaults. This prevents:

  • Per-stage resource tuning (e.g., different batch_size for tokenization vs engine)
  • Clear ownership of stage-specific parameters
  • Extensibility for new stages without modifying shared config classes
  • Alignment with Ray Serve's nested Pydantic config pattern

Solution

Introduce typed StageConfig models per stage, extend OfflineProcessorConfig to accept nested configs (bool | dict | StageConfig), and update builders to resolve and merge stage configs with processor defaults.

Architecture Diagram

BEFORE (Flat Config):
┌──────────────────────────────┐
│ vLLMEngineProcessorConfig    │
│ ├─ apply_chat_template: bool │
│ ├─ tokenize: bool            │
│ ├─ detokenize: bool          │
│ ├─ batch_size: int (shared)  │
│ └─ concurrency: int (shared) │
└──────────────────────────────┘
              │
              ▼
    All stages inherit same values

AFTER (Nested Config):
┌─────────────────────────────────────────┐
│ vLLMEngineProcessorConfig               │
│ ├─ chat_template_stage:                 │
│ │   ├─ enabled: bool                    │
│ │   ├─ batch_size: Optional[int]        │
│ │   ├─ concurrency: Optional[int]       │
│ │   └─ chat_template: Optional[str]     │
│ ├─ tokenize_stage:                      │
│ │   ├─ enabled: bool                    │
│ │   ├─ batch_size: Optional[int]        │
│ │   └─ concurrency: Optional[int]       │
│ ├─ batch_size: int (processor default)  │
│ └─ concurrency: int (processor default) │
└─────────────────────────────────────────┘
              │
              ▼
    resolve_stage_config() merges:
    stage override OR processor default

Changes

1. New StageConfig Models (stages/configs.py)

  • ChatTemplateStageConfig, TokenizerStageConfig, DetokenizeStageConfig, PrepareImageStageConfig
  • Base class with enabled, batch_size, concurrency, runtime_env
  • resolve_stage_config() function converts bool|dict|StageConfig → typed config with merged defaults
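A simplified view of that model hierarchy, using stdlib dataclasses in place of the actual Pydantic classes (field names follow the PR description; treat this as a sketch, not the shipped definitions):

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

@dataclass
class _StageConfigBase:
    # Knobs shared by every stage; None means "inherit the processor default".
    enabled: bool = True
    batch_size: Optional[int] = None
    concurrency: Optional[int] = None
    runtime_env: Optional[Dict[str, Any]] = None
    num_cpus: Optional[float] = None
    memory: Optional[float] = None

@dataclass
class ChatTemplateStageConfig(_StageConfigBase):
    # Stage-specific overrides; fall back to processor-level values when None.
    model: Optional[str] = None
    chat_template: Optional[str] = None
    chat_template_kwargs: Optional[Dict[str, Any]] = None

@dataclass
class TokenizerStageConfig(_StageConfigBase):
    model: Optional[str] = None

@dataclass
class DetokenizeStageConfig(_StageConfigBase):
    model: Optional[str] = None

@dataclass
class PrepareImageStageConfig(_StageConfigBase):
    pass
```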

2. Extended Processor Config (processor/base.py)

  • Add nested fields: chat_template_stage, tokenize_stage, detokenize_stage, prepare_image_stage
  • root_validator coerces legacy booleans → stage configs
  • Emit deprecation warnings when legacy fields used
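The coercion can be pictured as a mapping from legacy flag to nested field. This is a stdlib sketch of the idea; the shipped code does this inside a Pydantic root_validator on OfflineProcessorConfig and also folds `chat_template` into `chat_template_stage`:

```python
import warnings
from typing import Any, Dict

# Legacy boolean flag -> nested stage field, per the PR description.
_LEGACY_TO_STAGE = {
    "apply_chat_template": "chat_template_stage",
    "tokenize": "tokenize_stage",
    "detokenize": "detokenize_stage",
    "has_image": "prepare_image_stage",
}

def coerce_legacy_flags(values: Dict[str, Any]) -> Dict[str, Any]:
    for legacy, stage in _LEGACY_TO_STAGE.items():
        # Only coerce (and warn) when the legacy flag was explicitly set
        # and the user did not already supply the nested config.
        if legacy in values and stage not in values:
            warnings.warn(
                f"`{legacy}` is deprecated; use `{stage}` instead, e.g. "
                f'{stage}={{"enabled": True}}.',
                DeprecationWarning,
            )
            values[stage] = {"enabled": bool(values.pop(legacy))}
    return values
```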

3. Builder Updates (processor/vllm_engine_proc.py, processor/sglang_engine_proc.py)

  • Use resolve_stage_config() for all stages
  • Merge stage-specific overrides with processor defaults
  • Normalize concurrency (int → tuple) per stage
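The int-to-tuple normalization described above (including the None fallback a later commit in this PR adds) looks roughly like the following; the helper name mirrors the one mentioned in the commit messages, but this is a sketch:

```python
from typing import Optional, Tuple, Union

def normalize_cpu_stage_concurrency(
    concurrency: Union[int, Tuple[int, int], None],
) -> Tuple[int, int]:
    """Normalize a stage's concurrency to an autoscaling (min, max) tuple."""
    if concurrency is None:
        # Neither a stage override nor a processor default: one worker.
        return (1, 1)
    if isinstance(concurrency, int):
        # CPU stages autoscale between 1 and n workers for a bare int.
        return (1, concurrency)
    return concurrency
```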

Migration

Legacy code (still works, emits warnings):

```python
config = vLLMEngineProcessorConfig(
    model_source="...",
    apply_chat_template=True,  # Deprecated
    tokenize=True,             # Deprecated
)
```

New code (nested configs):

```python
config = vLLMEngineProcessorConfig(
    model_source="...",
    chat_template_stage=ChatTemplateStageConfig(batch_size=128),
    tokenize_stage={"enabled": True, "concurrency": 2},
)
```

Benefits

  1. Per-stage control: Tune batch_size, concurrency, runtime_env independently per stage
  2. Type safety: Pydantic validation catches config errors early
  3. YAML-friendly: Nested configs serialize cleanly
  4. Backward compatible: Legacy flags work with deprecation warnings
  5. Extensible: Easy to add new stages without modifying processor config
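On the YAML-friendly point: because each stage is plain nested data, a whole config can round-trip through serialization without custom handling. A minimal illustration with JSON (the model name and field values here are hypothetical):

```python
import json

# Hypothetical nested config as plain data; a mapping like this lines up
# 1:1 with the stage-config fields, which is what makes YAML/JSON configs
# straightforward compared to flat boolean flags.
config = {
    "model_source": "some-org/some-model",
    "chat_template_stage": {"enabled": True, "batch_size": 128},
    "tokenize_stage": {"enabled": True, "concurrency": 2},
}

restored = json.loads(json.dumps(config))
assert restored == config  # lossless round trip
```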

Implementation Status

  • Stage 1: StageConfig scaffolding + OfflineProcessorConfig extension
  • Stage 2: Resolver function + vLLM builder updates
  • Stage 3: SGLang processor updates
  • Stage 4: Deprecation warnings
  • Stage 5: Docstring updates
  • Stage 6: Tests

Related

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Stage config resolver and merging
- resolver function
- update vLLM builder to merge stage configs with processor defaults

Changes:
- Add resolve_stage_config() function in stages/configs.py to convert
  bool|dict|StageConfig -> typed StageConfig with processor defaults merged
- Update build_vllm_engine_processor() to use resolver for all stages:
  - PrepareImageStage, ChatTemplateStage, TokenizeStage, DetokenizeStage
- Each stage now respects per-stage overrides for:
  - batch_size: stage-specific override, falls back to processor default
  - concurrency: stage-specific override (normalized int -> tuple), falls back
  - runtime_env: stage-specific override, falls back to processor default
  - model: stage-specific model override for tokenizer/chat template stages
- Keep backward compatibility

This unlocks per-stage resource tuning while preserving the processor-first UX/API.

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
… config resolver

Apply resolver and merge logic to SGLang processor for parity with vLLM.

Changes:
- Update build_sglang_engine_processor() to use resolve_stage_config() for:
  - ChatTemplateStage, TokenizeStage, DetokenizeStage
- Each stage respects per-stage overrides (batch_size, concurrency, runtime_env, model)
- Maintains backward compatibility with legacy boolean flags
- Consistent behavior with vLLM processor

Note: ServeDeployment and HttpRequest processors don't use OfflineProcessorConfig
and only have single stages, so they don't require stage config refactoring.

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
…y boolean flags

Emit deprecation warnings when legacy boolean flags are used, guiding users
to migrate to nested stage configs.

Changes:
- Update _coerce_legacy_to_stage_config() root_validator in OfflineProcessorConfig
  to emit logger.warning() when legacy fields are detected:
  - `apply_chat_template` / `chat_template` -> `chat_template_stage`
  - `tokenize` -> `tokenize_stage`
  - `detokenize` -> `detokenize_stage`
  - `has_image` -> `prepare_image_stage`
- Warnings include examples showing how to use the new nested config API
- Warnings only emitted when legacy fields are explicitly set (not on defaults)
- Maintains backward compatibility - legacy flags still work

This provides clear migration guidance while preserving existing functionality.
Users will see helpful warnings pointing them to the new API without breaking
their code.

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
@nrghosh nrghosh force-pushed the nrghosh/data-llm-config-refactor branch from 2c3086a to e97df77 Compare November 12, 2025 19:07
Update docstrings in ray.data.llm to document nested stage configs and
backward compatibility with legacy boolean flags.

Changes:
- Update vLLMEngineProcessorConfig docstring:
  - Replace legacy field docs (apply_chat_template, tokenize, etc.) with
    nested stage config fields (chat_template_stage, tokenize_stage, etc.)
  - Note that legacy fields are deprecated but still supported
  - Mention per-stage control over batch_size, concurrency, runtime_env
- Update SGLangEngineProcessorConfig docstring:
  - Same updates as vLLM config
- Update build_llm_processor docstring:
  - Mention nested stage config support in config parameter
  - Note backward compatibility with legacy flags

Docstrings remain concise and focus on essential information for users.

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
@nrghosh nrghosh force-pushed the nrghosh/data-llm-config-refactor branch from e97df77 to 275a93d Compare November 12, 2025 19:29
@nrghosh nrghosh marked this pull request as ready for review November 12, 2025 20:49
@nrghosh nrghosh requested a review from a team as a code owner November 12, 2025 20:49
chat_template=config.chat_template,
model=chat_template_stage_cfg.model or config.model_source,
chat_template=chat_template_stage_cfg.chat_template
or config.chat_template,

Bug: Chat Template Kwargs Silently Ignored

The builder ignores chat_template_stage_cfg.chat_template_kwargs and only uses the chat_template_kwargs parameter passed to the builder function. When users configure chat_template_stage with chat_template_kwargs, those settings are silently ignored, causing the stage to use incorrect or missing template kwargs.


@nrghosh nrghosh requested review from a team and richardliaw November 12, 2025 20:55
nrghosh commented Nov 12, 2025

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request is a great refactoring of the LLM processor configuration, moving from flat boolean flags to nested, typed stage configs. This significantly improves per-stage configurability and aligns with modern configuration patterns. The backward compatibility is handled well with deprecation warnings.

My review focuses on a few areas to further improve the implementation:

  • Improving the resolve_stage_config helper to be more complete.
  • Reducing code duplication in the processor builder functions for better maintainability.

Comment on lines 101 to 134
# Resolve and build ChatTemplateStage if enabled
chat_template_stage_cfg = resolve_stage_config(
getattr(config, "chat_template_stage", config.apply_chat_template),
ChatTemplateStageConfig,
processor_defaults,
)
if chat_template_stage_cfg.enabled:
# Use stage-specific concurrency if set, otherwise processor default
stage_concurrency = (
chat_template_stage_cfg.concurrency
if chat_template_stage_cfg.concurrency is not None
else config.get_concurrency()
)
# Normalize concurrency to tuple if needed
if isinstance(stage_concurrency, int):
stage_concurrency = (stage_concurrency, stage_concurrency)

stages.append(
ChatTemplateStage(
fn_constructor_kwargs=dict(
model=config.model_source,
chat_template=config.chat_template,
model=chat_template_stage_cfg.model or config.model_source,
chat_template=chat_template_stage_cfg.chat_template
or config.chat_template,
chat_template_kwargs=chat_template_kwargs,
),
map_batches_kwargs=dict(
zero_copy_batch=True,
concurrency=config.get_concurrency(),
batch_size=config.batch_size,
runtime_env=config.runtime_env,
concurrency=stage_concurrency,
batch_size=chat_template_stage_cfg.batch_size or config.batch_size,
runtime_env=chat_template_stage_cfg.runtime_env
or config.runtime_env,
),
)
)

medium

Similar to vllm_engine_proc.py, there is a lot of repeated code for building each stage (ChatTemplateStage, TokenizeStage, DetokenizeStage). The logic for resolving configuration, handling concurrency, and creating the stage object is duplicated.

To improve code maintainability, consider refactoring this repeated logic into a helper function. This would make the build_sglang_engine_processor function more concise and easier to manage.

Comment on lines 150 to 176
# Resolve and build PrepareImageStage if enabled
image_stage_cfg = resolve_stage_config(
getattr(config, "prepare_image_stage", config.has_image),
PrepareImageStageConfig,
processor_defaults,
)
if image_stage_cfg.enabled:
# Use stage-specific concurrency if set, otherwise processor default
stage_concurrency = (
image_stage_cfg.concurrency
if image_stage_cfg.concurrency is not None
else config.get_concurrency()
)
# Normalize concurrency to tuple if needed
if isinstance(stage_concurrency, int):
stage_concurrency = (stage_concurrency, stage_concurrency)

stages.append(
PrepareImageStage(
map_batches_kwargs=dict(
zero_copy_batch=True,
concurrency=config.get_concurrency(),
batch_size=config.batch_size,
concurrency=stage_concurrency,
batch_size=image_stage_cfg.batch_size or config.batch_size,
runtime_env=image_stage_cfg.runtime_env or config.runtime_env,
),
)
)

medium

There's a significant amount of duplicated code for building each stage (PrepareImageStage, ChatTemplateStage, TokenizeStage, DetokenizeStage). The logic for resolving the stage config, determining concurrency, and constructing the stage is nearly identical for each.

To improve maintainability and reduce redundancy, you could extract this logic into a helper function. This would make the build_vllm_engine_processor function much cleaner and easier to read and would centralize the stage creation logic.

Comment on lines 72 to 75
if resolved.batch_size is None and "batch_size" in processor_defaults:
resolved.batch_size = processor_defaults["batch_size"]
if resolved.runtime_env is None and "runtime_env" in processor_defaults:
resolved.runtime_env = processor_defaults["runtime_env"]

medium

The resolve_stage_config function is very helpful for merging processor-level defaults into stage-specific configurations. I noticed that it handles batch_size and runtime_env, but concurrency is missing. The docstring for processor_defaults even mentions concurrency as an expected key.

To make this function more complete and to reduce repetitive code in the processor builders, consider also merging the concurrency default here.

This would require updating the processor_defaults dictionary in vllm_engine_proc.py and sglang_engine_proc.py to include concurrency, and would allow simplifying the concurrency handling logic in those files.

Suggested change
if resolved.batch_size is None and "batch_size" in processor_defaults:
resolved.batch_size = processor_defaults["batch_size"]
if resolved.runtime_env is None and "runtime_env" in processor_defaults:
resolved.runtime_env = processor_defaults["runtime_env"]
if resolved.batch_size is None and "batch_size" in processor_defaults:
resolved.batch_size = processor_defaults["batch_size"]
if resolved.concurrency is None and "concurrency" in processor_defaults:
resolved.concurrency = processor_defaults["concurrency"]
if resolved.runtime_env is None and "runtime_env" in processor_defaults:
resolved.runtime_env = processor_defaults["runtime_env"]

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
…tage autoscaling

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
…source control

Expose num_cpus and memory as optional fields in _StageConfigBase to enable
per-stage resource control (ray remote args). These fields are extracted
from stage configs and passed to map_batches_kwargs for all CPU stages.

Example usage:
```
config = vLLMEngineProcessorConfig(
    model_id="...",
    tokenize_stage=TokenizerStageConfig(
        num_cpus=2.0,      # Per-stage CPU control
        memory=1000000,    # Per-stage memory control
    ),
    chat_template_stage=ChatTemplateStageConfig(
        num_cpus=1.0,      # Different resources for different stages
    ),
)
```

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
memory: Optional[float] = Field(
default=None,
description="Heap memory in bytes to reserve for each map worker in this stage.",
)

Bug: Stage Config: Missing Validation, Hidden Errors

The _StageConfigBase class lacks validation for concurrency and batch_size fields. Users can pass invalid values like negative integers or invalid tuple ranges (e.g., {"concurrency": -1} or {"concurrency": (5, 2)}), which bypass validation and cause cryptic errors later when Ray Data attempts to use them. The processor-level validate_concurrency validator exists but doesn't apply to stage-specific configs.
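One way to close the gap this comment describes would be a validator run against each stage config's concurrency. This is a hypothetical helper sketching the checks, not code from the PR:

```python
from typing import Tuple, Union

def validate_stage_concurrency(
    concurrency: Union[int, Tuple[int, int], None],
) -> None:
    """Reject negative concurrency and inverted (min, max) ranges early."""
    if concurrency is None:
        return  # None means "inherit the processor default"
    if isinstance(concurrency, int):
        if concurrency < 1:
            raise ValueError(f"concurrency must be >= 1, got {concurrency}")
        return
    lo, hi = concurrency
    if lo < 1 or hi < lo:
        raise ValueError(f"invalid concurrency range ({lo}, {hi})")
```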


Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
nrghosh commented Nov 13, 2025

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request is a well-executed refactoring of the LLM processor configuration, moving from flat boolean flags to nested, typed stage configs. This significantly improves per-stage control and extensibility while maintaining backward compatibility. My review focuses on a few areas where the implementation can be made more robust and maintainable by reducing code duplication and handling edge cases in configuration resolution more explicitly.

Comment on lines 75 to 77
else:
# Fallback: create enabled=True config
resolved = stage_config_cls(enabled=True)

high

The fallback else block in this function can lead to unexpected behavior. If stage_cfg_value is None or another unexpected type, it is silently converted to a stage config with enabled=True. This is likely not the user's intent, especially if they explicitly pass None to disable a stage.

For robustness, it would be better to raise a TypeError for unsupported types. If None is a supported value, it should probably be handled explicitly to mean enabled=False.

Suggested change
else:
# Fallback: create enabled=True config
resolved = stage_config_cls(enabled=True)
else:
raise TypeError(
f"Unsupported type for stage config: {type(stage_cfg_value)}. "
"Expected bool, dict, or a _StageConfigBase object."
)

Comment on lines 102 to 194
# Resolve and build ChatTemplateStage if enabled
chat_template_stage_cfg = resolve_stage_config(
getattr(config, "chat_template_stage", config.apply_chat_template),
ChatTemplateStageConfig,
processor_defaults,
)
if chat_template_stage_cfg.enabled:
# Use stage-specific concurrency if set, otherwise processor default
stage_concurrency = (
chat_template_stage_cfg.concurrency
if chat_template_stage_cfg.concurrency is not None
else config.get_concurrency()
)
# Normalize concurrency to tuple if needed
# CPU stages use autoscaling (1, n) for int concurrency
if isinstance(stage_concurrency, int):
stage_concurrency = (1, stage_concurrency)

stages.append(
ChatTemplateStage(
fn_constructor_kwargs=dict(
model=config.model_source,
chat_template=config.chat_template,
chat_template_kwargs=chat_template_kwargs,
model=chat_template_stage_cfg.model
if chat_template_stage_cfg.model is not None
else config.model_source,
chat_template=chat_template_stage_cfg.chat_template
if chat_template_stage_cfg.chat_template is not None
else config.chat_template,
chat_template_kwargs=chat_template_stage_cfg.chat_template_kwargs
if chat_template_stage_cfg.chat_template_kwargs is not None
else chat_template_kwargs,
),
map_batches_kwargs=dict(
zero_copy_batch=True,
concurrency=config.get_concurrency(),
batch_size=config.batch_size,
runtime_env=config.runtime_env,
concurrency=stage_concurrency,
batch_size=chat_template_stage_cfg.batch_size
if chat_template_stage_cfg.batch_size is not None
else config.batch_size,
**{
k: v
for k, v in {
"runtime_env": chat_template_stage_cfg.runtime_env,
"num_cpus": chat_template_stage_cfg.num_cpus,
"memory": chat_template_stage_cfg.memory,
}.items()
if v is not None
},
),
)
)

if config.tokenize:
# Resolve and build TokenizeStage if enabled
tokenize_stage_cfg = resolve_stage_config(
getattr(config, "tokenize_stage", config.tokenize),
TokenizerStageConfig,
processor_defaults,
)
if tokenize_stage_cfg.enabled:
# Use stage-specific concurrency if set, otherwise processor default
stage_concurrency = (
tokenize_stage_cfg.concurrency
if tokenize_stage_cfg.concurrency is not None
else config.get_concurrency()
)
# Normalize concurrency to tuple if needed
# CPU stages use autoscaling (1, n) for int concurrency
if isinstance(stage_concurrency, int):
stage_concurrency = (1, stage_concurrency)

stages.append(
TokenizeStage(
fn_constructor_kwargs=dict(
model=config.model_source,
model=tokenize_stage_cfg.model
if tokenize_stage_cfg.model is not None
else config.model_source,
),
map_batches_kwargs=dict(
zero_copy_batch=True,
concurrency=config.get_concurrency(),
batch_size=config.batch_size,
runtime_env=config.runtime_env,
concurrency=stage_concurrency,
batch_size=tokenize_stage_cfg.batch_size
if tokenize_stage_cfg.batch_size is not None
else config.batch_size,
**{
k: v
for k, v in {
"runtime_env": tokenize_stage_cfg.runtime_env,
"num_cpus": tokenize_stage_cfg.num_cpus,
"memory": tokenize_stage_cfg.memory,
}.items()
if v is not None
},
),
)
)

medium

There is significant code duplication in building the different stages (ChatTemplateStage, TokenizeStage). The logic for resolving configuration, determining concurrency, and constructing map_batches_kwargs is repeated for each stage. This makes the code hard to read and maintain.

This repeated logic could be extracted into a helper function.

Additionally, there are several redundant checks. For example:

batch_size=chat_template_stage_cfg.batch_size
if chat_template_stage_cfg.batch_size is not None
else config.batch_size

The resolve_stage_config function already merges the processor-level defaults, so chat_template_stage_cfg.batch_size should not be None at this point. These checks can be simplified (e.g., to batch_size=chat_template_stage_cfg.batch_size), which would make the code cleaner. This applies to concurrency and model parameters as well.


+1

Comment on lines 151 to 282
# Resolve and build PrepareImageStage if enabled
image_stage_cfg = resolve_stage_config(
getattr(config, "prepare_image_stage", config.has_image),
PrepareImageStageConfig,
processor_defaults,
)
if image_stage_cfg.enabled:
# Use stage-specific concurrency if set, otherwise processor default
stage_concurrency = (
image_stage_cfg.concurrency
if image_stage_cfg.concurrency is not None
else config.get_concurrency()
)
# Normalize concurrency to tuple if needed
# CPU stages use autoscaling (1, n) for int concurrency
if isinstance(stage_concurrency, int):
stage_concurrency = (1, stage_concurrency)

stages.append(
PrepareImageStage(
map_batches_kwargs=dict(
zero_copy_batch=True,
concurrency=config.get_concurrency(),
batch_size=config.batch_size,
concurrency=stage_concurrency,
batch_size=image_stage_cfg.batch_size
if image_stage_cfg.batch_size is not None
else config.batch_size,
**{
k: v
for k, v in {
"runtime_env": image_stage_cfg.runtime_env,
"num_cpus": image_stage_cfg.num_cpus,
"memory": image_stage_cfg.memory,
}.items()
if v is not None
},
),
)
)
if config.apply_chat_template:

# Resolve and build ChatTemplateStage if enabled
chat_template_stage_cfg = resolve_stage_config(
getattr(config, "chat_template_stage", config.apply_chat_template),
ChatTemplateStageConfig,
processor_defaults,
)
if chat_template_stage_cfg.enabled:
# Use stage-specific concurrency if set, otherwise processor default
stage_concurrency = (
chat_template_stage_cfg.concurrency
if chat_template_stage_cfg.concurrency is not None
else config.get_concurrency()
)
# Normalize concurrency to tuple if needed
# CPU stages use autoscaling (1, n) for int concurrency
if isinstance(stage_concurrency, int):
stage_concurrency = (1, stage_concurrency)

stages.append(
ChatTemplateStage(
fn_constructor_kwargs=dict(
model=config.model_source,
chat_template=config.chat_template,
chat_template_kwargs=chat_template_kwargs,
model=chat_template_stage_cfg.model
if chat_template_stage_cfg.model is not None
else config.model_source,
chat_template=chat_template_stage_cfg.chat_template
if chat_template_stage_cfg.chat_template is not None
else config.chat_template,
chat_template_kwargs=chat_template_stage_cfg.chat_template_kwargs
if chat_template_stage_cfg.chat_template_kwargs is not None
else chat_template_kwargs,
),
map_batches_kwargs=dict(
zero_copy_batch=True,
concurrency=config.get_concurrency(),
batch_size=config.batch_size,
runtime_env=config.runtime_env,
concurrency=stage_concurrency,
batch_size=chat_template_stage_cfg.batch_size
if chat_template_stage_cfg.batch_size is not None
else config.batch_size,
**{
k: v
for k, v in {
"runtime_env": chat_template_stage_cfg.runtime_env,
"num_cpus": chat_template_stage_cfg.num_cpus,
"memory": chat_template_stage_cfg.memory,
}.items()
if v is not None
},
),
)
)

if config.tokenize:
# Resolve and build TokenizeStage if enabled
tokenize_stage_cfg = resolve_stage_config(
getattr(config, "tokenize_stage", config.tokenize),
TokenizerStageConfig,
processor_defaults,
)
if tokenize_stage_cfg.enabled:
# Use stage-specific concurrency if set, otherwise processor default
stage_concurrency = (
tokenize_stage_cfg.concurrency
if tokenize_stage_cfg.concurrency is not None
else config.get_concurrency()
)
# Normalize concurrency to tuple if needed
# CPU stages use autoscaling (1, n) for int concurrency
if isinstance(stage_concurrency, int):
stage_concurrency = (1, stage_concurrency)

stages.append(
TokenizeStage(
fn_constructor_kwargs=dict(
model=config.model_source,
model=tokenize_stage_cfg.model
if tokenize_stage_cfg.model is not None
else config.model_source,
),
map_batches_kwargs=dict(
zero_copy_batch=True,
concurrency=config.get_concurrency(),
batch_size=config.batch_size,
runtime_env=config.runtime_env,
concurrency=stage_concurrency,
batch_size=tokenize_stage_cfg.batch_size
if tokenize_stage_cfg.batch_size is not None
else config.batch_size,
**{
k: v
for k, v in {
"runtime_env": tokenize_stage_cfg.runtime_env,
"num_cpus": tokenize_stage_cfg.num_cpus,
"memory": tokenize_stage_cfg.memory,
}.items()
if v is not None
},
),
)
)

medium

There is significant code duplication in building the different stages (PrepareImageStage, ChatTemplateStage, TokenizeStage). The logic for resolving configuration, determining concurrency, and constructing map_batches_kwargs is repeated for each stage. This makes the code hard to read and maintain.

This repeated logic could be extracted into a helper function.

Additionally, there are several redundant checks. For example:

batch_size=image_stage_cfg.batch_size
if image_stage_cfg.batch_size is not None
else config.batch_size

The resolve_stage_config function already merges the processor-level defaults, so image_stage_cfg.batch_size should not be None at this point. These checks can be simplified (e.g., to batch_size=image_stage_cfg.batch_size), which would make the code cleaner. This applies to concurrency and model parameters as well.

…ported types

Replace silent fallback that converted unexpected types to enabled=True
with explicit TypeError. This prevents bugs where None or invalid types
are silently treated as enabled stages.

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
@ray-gardener ray-gardener bot added the data Ray Data-related issues label Nov 13, 2025
@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant and well-executed refactoring of the LLM processor configurations. By moving from flat boolean flags to nested, typed StageConfig objects, it provides much-needed per-stage control over resources and improves type safety and extensibility. The implementation maintains backward compatibility through a root_validator, which is a great approach for a smooth transition.

My review focuses on a small area for simplification in the processor builder functions, where the backward-compatibility logic appears to be duplicated. Overall, the changes are excellent and significantly improve the configuration API for Ray Data LLMs.

nrghosh and others added 4 commits November 17, 2025 16:44
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com>
…solution and backward compatibility

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
normalize_cpu_stage_concurrency can receive None when stage_cfg.concurrency
is None (e.g., when resolve_stage_config doesn't merge defaults).
Previously returned None, violating Tuple[int, int] return type contract.

Fix: explicitly handle None by defaulting to (1, 1)

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
@nrghosh nrghosh changed the title [wip] [data.llm] Ray Data LLM Config Refactor [data.llm] Ray Data LLM Config Refactor Nov 18, 2025
stage: Dict[str, Any] = {"enabled": enabled}
if values.get("chat_template") is not None:
stage["chat_template"] = values["chat_template"]
values["chat_template_stage"] = stage

Bug: Stage Field Type Inconsistency

The chat_template_stage field is unconditionally coerced to a dict when not explicitly provided, while other stage fields (tokenize_stage, detokenize_stage, prepare_image_stage) are only coerced when their legacy counterparts are provided. This creates inconsistent behavior: when no stage config is provided, chat_template_stage becomes a dict {"enabled": True} but tokenize_stage remains a boolean True (from the Field default). This inconsistency could confuse users who access these fields directly and expect uniform types.


@jeffreyjeffreywang

Shall we migrate existing tests (e.g. test_vllm_engine_proc.py) to adopt the new schema or do you think we can leave them as is until we begin raising errors for the legacy schema?

@kouroshHakha

Shall we migrate existing tests (e.g. test_vllm_engine_proc.py) to adopt the new schema or do you think we can leave them as is until we begin raising errors for the legacy schema?

+1

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
nrghosh commented Nov 18, 2025

Shall we migrate existing tests (e.g. test_vllm_engine_proc.py) to adopt the new schema or do you think we can leave them as is until we begin raising errors for the legacy schema?

Yes, updated. Whether/when we fully deprecate the old schema is a roadmap decision. cc @richardliaw

- Fix: conditional chat_template_stage construction in legacy coercion
- The logic unconditionally set `chat_template_stage` to a default dict
  even when no legacy fields were present, bypassing the Field default.
- Now the stage config is only constructed when legacy fields are actually detected.

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
@nrghosh nrghosh added the go add ONLY when ready to merge, run all tests label Nov 19, 2025
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
@kouroshHakha kouroshHakha changed the title [data.llm] Ray Data LLM Config Refactor [data][llm] Ray Data LLM Config Refactor Nov 19, 2025
@kouroshHakha kouroshHakha merged commit 367c7fe into ray-project:master Nov 19, 2025
6 checks passed
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
400Ping pushed a commit to 400Ping/ray that referenced this pull request Nov 21, 2025
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: YK <1811651+ykdojo@users.noreply.github.com>
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
