@@ -113,20 +113,24 @@ class vLLMEngineProcessorConfig(_vLLMEngineProcessorConfig):
             each batch. The default value may not be optimal when the batch size
             or the batch processing latency is too small, but it should be good
             enough for batch size >= 64.
-        apply_chat_template: Whether to apply chat template.
-        chat_template: The chat template to use. This is usually not needed if the
-            model checkpoint already contains the chat template.
-        tokenize: Whether to tokenize the input before passing it to the vLLM engine.
-            If not, vLLM will tokenize the prompt in the engine.
-        detokenize: Whether to detokenize the output.
-        has_image: Whether the input messages have images.
+        chat_template_stage: Chat templating stage config (bool | dict | ChatTemplateStageConfig).
+            Defaults to True. Use a nested config for per-stage control over batch_size,
+            concurrency, and runtime_env. The legacy ``apply_chat_template`` and
+            ``chat_template`` fields are deprecated but still supported.
+        tokenize_stage: Tokenizer stage config (bool | dict | TokenizerStageConfig).
+            Defaults to True. The legacy ``tokenize`` field is deprecated but still supported.
+        detokenize_stage: Detokenizer stage config (bool | dict | DetokenizeStageConfig).
+            Defaults to True. The legacy ``detokenize`` field is deprecated but still supported.
+        prepare_image_stage: Prepare image stage config (bool | dict | PrepareImageStageConfig).
+            Defaults to False. The legacy ``has_image`` field is deprecated but still supported.
         accelerator_type: The accelerator type used by the LLM stage in a processor.
             Default to None, meaning that only the CPU will be used.
         concurrency: The number of workers for data parallelism. Default to 1.
             If ``concurrency`` is a tuple ``(m, n)``, Ray creates an autoscaling
             actor pool that scales between ``m`` and ``n`` workers (``1 <= m <= n``).
             If ``concurrency`` is an ``int`` ``n``, CPU stages use an autoscaling
             pool from ``(1, n)``, while GPU stages use a fixed pool of ``n`` workers.
+            Stage-specific concurrency can be set via nested stage configs.

     Examples:

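For context between hunks: a minimal usage sketch of the new vLLM surface. Only the ``*_stage`` fields and their ``bool | dict | ...StageConfig`` forms come from the docstring above; the ``ray.data.llm`` import path, ``model_source``, and the placeholder model are assumptions.

```python
# Sketch only: import path and constructor arguments other than the
# *_stage fields documented above are assumptions.
from ray.data.llm import vLLMEngineProcessorConfig

config = vLLMEngineProcessorConfig(
    model_source="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model
    # Dict form of the nested stage config: per-stage batch size and
    # autoscaling concurrency, replacing apply_chat_template=True.
    chat_template_stage={"batch_size": 128, "concurrency": (1, 4)},
    tokenize_stage=True,        # plain bool still works (defaults to True)
    detokenize_stage=True,
    prepare_image_stage=False,  # replaces the deprecated has_image flag
    concurrency=1,              # processor-wide default; stages can override
)
```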
@@ -205,19 +209,21 @@ class SGLangEngineProcessorConfig(_SGLangEngineProcessorConfig):
             each batch. The default value may not be optimal when the batch size
             or the batch processing latency is too small, but it should be good
             enough for batch size >= 64.
-        apply_chat_template: Whether to apply chat template.
-        chat_template: The chat template to use. This is usually not needed if the
-            model checkpoint already contains the chat template.
-        tokenize: Whether to tokenize the input before passing it to the SGLang engine.
-            If not, SGLang will tokenize the prompt in the engine.
-        detokenize: Whether to detokenize the output.
+        chat_template_stage: Chat templating stage config (bool | dict | ChatTemplateStageConfig).
+            Defaults to True. The legacy ``apply_chat_template`` and ``chat_template``
+            fields are deprecated but still supported.
+        tokenize_stage: Tokenizer stage config (bool | dict | TokenizerStageConfig).
+            Defaults to True. The legacy ``tokenize`` field is deprecated but still supported.
+        detokenize_stage: Detokenizer stage config (bool | dict | DetokenizeStageConfig).
+            Defaults to True. The legacy ``detokenize`` field is deprecated but still supported.
         accelerator_type: The accelerator type used by the LLM stage in a processor.
             Default to None, meaning that only the CPU will be used.
         concurrency: The number of workers for data parallelism. Default to 1.
             If ``concurrency`` is a tuple ``(m, n)``, Ray creates an autoscaling
             actor pool that scales between ``m`` and ``n`` workers (``1 <= m <= n``).
             If ``concurrency`` is an ``int`` ``n``, CPU stages use an autoscaling
             pool from ``(1, n)``, while GPU stages use a fixed pool of ``n`` workers.
+            Stage-specific concurrency can be set via nested stage configs.

     Examples:
         .. testcode::
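The SGLang config follows the same pattern, minus the image stage. A hedged sketch of the typed form: ``TokenizerStageConfig`` is named in the docstring, but its import path, ``model_source``, and the placeholder model are assumptions.

```python
# Sketch only: the import path is an assumption; the *_stage field
# names and the bool | dict | StageConfig union come from the
# docstring above.
from ray.data.llm import SGLangEngineProcessorConfig, TokenizerStageConfig

config = SGLangEngineProcessorConfig(
    model_source="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    # Typed form: stage-specific concurrency, overriding the
    # processor-wide default below.
    tokenize_stage=TokenizerStageConfig(concurrency=(1, 8)),
    detokenize_stage={"concurrency": (1, 8)},  # dict form is equivalent
    concurrency=2,  # fixed pool of 2 workers for the GPU LLM stage
)
```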
@@ -375,7 +381,10 @@ def build_llm_processor(
     """Build a LLM processor using the given config.

     Args:
-        config: The processor config.
+        config: The processor config. Supports nested stage configs for per-stage
+            control (e.g., ``chat_template_stage=ChatTemplateStageConfig(batch_size=128)``).
+            Legacy boolean flags (``apply_chat_template``, ``tokenize``, etc.) are
+            deprecated but still supported.
         preprocess: An optional lambda function that takes a row (dict) as input
             and returns a preprocessed row (dict). The output row must contain the
             required fields for the following processing stages. Each row
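Finally, an end-to-end sketch of ``build_llm_processor`` with a ``preprocess`` lambda, closing the loop on the last hunk. The ``postprocess`` argument and the ``messages``/``sampling_params``/``generated_text`` names follow Ray Data LLM's usual processor contract rather than this diff, so treat them as assumptions.

```python
# Sketch only: column names and the postprocess argument are assumed
# from the usual Ray Data LLM processor contract, not from this diff.
import ray
from ray.data.llm import build_llm_processor

processor = build_llm_processor(
    config,  # e.g., the vLLMEngineProcessorConfig sketched earlier
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params={"temperature": 0.0, "max_tokens": 64},
    ),
    postprocess=lambda row: {"answer": row["generated_text"]},
)

ds = ray.data.from_items([{"prompt": "What is 2 + 2?"}])
ds = processor(ds)  # runs all enabled stages, including the LLM stage
print(ds.take_all())
```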