@@ -113,20 +113,24 @@ class vLLMEngineProcessorConfig(_vLLMEngineProcessorConfig):
             each batch. The default value may not be optimal when the batch size
             or the batch processing latency is too small, but it should be good
             enough for batch size >= 64.
-        apply_chat_template: Whether to apply chat template.
-        chat_template: The chat template to use. This is usually not needed if the
-            model checkpoint already contains the chat template.
-        tokenize: Whether to tokenize the input before passing it to the vLLM engine.
-            If not, vLLM will tokenize the prompt in the engine.
-        detokenize: Whether to detokenize the output.
-        has_image: Whether the input messages have images.
+        chat_template_stage: Chat templating stage config (bool | dict | ChatTemplateStageConfig).
+            Defaults to True. Use a nested config for per-stage control over batch_size,
+            concurrency, and runtime_env. The legacy ``apply_chat_template`` and
+            ``chat_template`` fields are deprecated but still supported.
+        tokenize_stage: Tokenizer stage config (bool | dict | TokenizerStageConfig).
+            Defaults to True. The legacy ``tokenize`` field is deprecated but still supported.
+        detokenize_stage: Detokenizer stage config (bool | dict | DetokenizeStageConfig).
+            Defaults to True. The legacy ``detokenize`` field is deprecated but still supported.
+        prepare_image_stage: Prepare image stage config (bool | dict | PrepareImageStageConfig).
+            Defaults to False. The legacy ``has_image`` field is deprecated but still supported.
         accelerator_type: The accelerator type used by the LLM stage in a processor.
             Default to None, meaning that only the CPU will be used.
         concurrency: The number of workers for data parallelism. Default to 1.
             If ``concurrency`` is a tuple ``(m, n)``, Ray creates an autoscaling
             actor pool that scales between ``m`` and ``n`` workers (``1 <= m <= n``).
             If ``concurrency`` is an ``int`` ``n``, CPU stages use an autoscaling
             pool from ``(1, n)``, while GPU stages use a fixed pool of ``n`` workers.
+            Stage-specific concurrency can be set via nested stage configs.

     Examples:

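For context between hunks: a minimal usage sketch of the new vLLM surface. Only the ``*_stage`` fields and their ``bool | dict | ...StageConfig`` forms come from the docstring above; the ``ray.data.llm`` import path, ``model_source``, and the placeholder model are assumptions.

```python
# Sketch only: import path and constructor arguments other than the
# *_stage fields documented above are assumptions.
from ray.data.llm import vLLMEngineProcessorConfig

config = vLLMEngineProcessorConfig(
    model_source="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model
    # Dict form of the nested stage config: per-stage batch size and
    # autoscaling concurrency, replacing apply_chat_template=True.
    chat_template_stage={"batch_size": 128, "concurrency": (1, 4)},
    tokenize_stage=True,        # plain bool still works (defaults to True)
    detokenize_stage=True,
    prepare_image_stage=False,  # replaces the deprecated has_image flag
    concurrency=1,              # processor-wide default; stages can override
)
```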
@@ -205,19 +209,21 @@ class SGLangEngineProcessorConfig(_SGLangEngineProcessorConfig):
             each batch. The default value may not be optimal when the batch size
             or the batch processing latency is too small, but it should be good
             enough for batch size >= 64.
-        apply_chat_template: Whether to apply chat template.
-        chat_template: The chat template to use. This is usually not needed if the
-            model checkpoint already contains the chat template.
-        tokenize: Whether to tokenize the input before passing it to the SGLang engine.
-            If not, SGLang will tokenize the prompt in the engine.
-        detokenize: Whether to detokenize the output.
+        chat_template_stage: Chat templating stage config (bool | dict | ChatTemplateStageConfig).
+            Defaults to True. The legacy ``apply_chat_template`` and ``chat_template``
+            fields are deprecated but still supported.
+        tokenize_stage: Tokenizer stage config (bool | dict | TokenizerStageConfig).
+            Defaults to True. The legacy ``tokenize`` field is deprecated but still supported.
+        detokenize_stage: Detokenizer stage config (bool | dict | DetokenizeStageConfig).
+            Defaults to True. The legacy ``detokenize`` field is deprecated but still supported.
         accelerator_type: The accelerator type used by the LLM stage in a processor.
             Default to None, meaning that only the CPU will be used.
         concurrency: The number of workers for data parallelism. Default to 1.
             If ``concurrency`` is a tuple ``(m, n)``, Ray creates an autoscaling
             actor pool that scales between ``m`` and ``n`` workers (``1 <= m <= n``).
             If ``concurrency`` is an ``int`` ``n``, CPU stages use an autoscaling
             pool from ``(1, n)``, while GPU stages use a fixed pool of ``n`` workers.
+            Stage-specific concurrency can be set via nested stage configs.

     Examples:
         .. testcode::
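The SGLang config follows the same pattern, minus the image stage. A hedged sketch of the typed form: ``TokenizerStageConfig`` is named in the docstring, but its import path, ``model_source``, and the placeholder model are assumptions.

```python
# Sketch only: the import path is an assumption; the *_stage field
# names and the bool | dict | StageConfig union come from the
# docstring above.
from ray.data.llm import SGLangEngineProcessorConfig, TokenizerStageConfig

config = SGLangEngineProcessorConfig(
    model_source="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    # Typed form: stage-specific concurrency, overriding the
    # processor-wide default below.
    tokenize_stage=TokenizerStageConfig(concurrency=(1, 8)),
    detokenize_stage={"concurrency": (1, 8)},  # dict form is equivalent
    concurrency=2,  # fixed pool of 2 workers for the GPU LLM stage
)
```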
@@ -375,7 +381,10 @@ def build_llm_processor(
     """Build a LLM processor using the given config.

     Args:
-        config: The processor config.
+        config: The processor config. Supports nested stage configs for per-stage
+            control (e.g., ``chat_template_stage=ChatTemplateStageConfig(batch_size=128)``).
+            Legacy boolean flags (``apply_chat_template``, ``tokenize``, etc.) are
+            deprecated but still supported.
         preprocess: An optional lambda function that takes a row (dict) as input
             and returns a preprocessed row (dict). The output row must contain the
             required fields for the following processing stages. Each row
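Finally, an end-to-end sketch of ``build_llm_processor`` with a ``preprocess`` lambda, closing the loop on the last hunk. The ``postprocess`` argument and the ``messages``/``sampling_params``/``generated_text`` names follow Ray Data LLM's usual processor contract rather than this diff, so treat them as assumptions.

```python
# Sketch only: column names and the postprocess argument are assumed
# from the usual Ray Data LLM processor contract, not from this diff.
import ray
from ray.data.llm import build_llm_processor

processor = build_llm_processor(
    config,  # e.g., the vLLMEngineProcessorConfig sketched earlier
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params={"temperature": 0.0, "max_tokens": 64},
    ),
    postprocess=lambda row: {"answer": row["generated_text"]},
)

ds = ray.data.from_items([{"prompt": "What is 2 + 2?"}])
ds = processor(ds)  # runs all enabled stages, including the LLM stage
print(ds.take_all())
```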