Commit 275a93d

Data LLM Config Refactor - Part 5: Update public API docstrings
Update docstrings in ray.data.llm to document nested stage configs and
backward compatibility with legacy boolean flags.

Changes:
- Update vLLMEngineProcessorConfig docstring:
  - Replace legacy field docs (apply_chat_template, tokenize, etc.) with
    nested stage config fields (chat_template_stage, tokenize_stage, etc.)
  - Note that legacy fields are deprecated but still supported
  - Mention per-stage control over batch_size, concurrency, runtime_env
- Update SGLangEngineProcessorConfig docstring:
  - Same updates as vLLM config
- Update build_llm_processor docstring:
  - Mention nested stage config support in config parameter
  - Note backward compatibility with legacy flags

Docstrings remain concise and focus on essential information for users.

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
1 parent 399ef58 commit 275a93d

File tree

1 file changed: +23 additions, -14 deletions

python/ray/data/llm.py

Lines changed: 23 additions & 14 deletions
@@ -113,20 +113,24 @@ class vLLMEngineProcessorConfig(_vLLMEngineProcessorConfig):
             each batch. The default value may not be optimal when the batch size
             or the batch processing latency is too small, but it should be good
             enough for batch size >= 64.
-        apply_chat_template: Whether to apply chat template.
-        chat_template: The chat template to use. This is usually not needed if the
-            model checkpoint already contains the chat template.
-        tokenize: Whether to tokenize the input before passing it to the vLLM engine.
-            If not, vLLM will tokenize the prompt in the engine.
-        detokenize: Whether to detokenize the output.
-        has_image: Whether the input messages have images.
+        chat_template_stage: Chat templating stage config (bool | dict | ChatTemplateStageConfig).
+            Defaults to True. Use nested config for per-stage control over batch_size,
+            concurrency, and runtime_env. Legacy ``apply_chat_template`` and ``chat_template``
+            fields are deprecated but still supported.
+        tokenize_stage: Tokenizer stage config (bool | dict | TokenizerStageConfig).
+            Defaults to True. Legacy ``tokenize`` field is deprecated but still supported.
+        detokenize_stage: Detokenizer stage config (bool | dict | DetokenizeStageConfig).
+            Defaults to True. Legacy ``detokenize`` field is deprecated but still supported.
+        prepare_image_stage: Prepare image stage config (bool | dict | PrepareImageStageConfig).
+            Defaults to False. Legacy ``has_image`` field is deprecated but still supported.
         accelerator_type: The accelerator type used by the LLM stage in a processor.
             Default to None, meaning that only the CPU will be used.
         concurrency: The number of workers for data parallelism. Default to 1.
             If ``concurrency`` is a tuple ``(m, n)``, Ray creates an autoscaling
             actor pool that scales between ``m`` and ``n`` workers (``1 <= m <= n``).
             If ``concurrency`` is an ``int`` ``n``, CPU stages use an autoscaling
             pool from ``(1, n)``, while GPU stages use a fixed pool of ``n`` workers.
+            Stage-specific concurrency can be set via nested stage configs.

     Examples:
         .. testcode::
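As a hedged sketch of the nested stage configs documented in the hunk above: the docstring says each ``*_stage`` field accepts a bool, a dict, or a stage config object, with per-stage control over batch_size, concurrency, and runtime_env. The ``model_source`` field name and the model id below are illustrative assumptions, not part of this diff.

    from ray.data.llm import vLLMEngineProcessorConfig

    config = vLLMEngineProcessorConfig(
        model_source="Qwen/Qwen2.5-7B-Instruct",  # assumed field name; any model id
        # Dict form: per-stage overrides without importing the stage config classes.
        # Keys assumed to mirror the controls named in the docstring.
        chat_template_stage={"batch_size": 128, "concurrency": 2},
        tokenize_stage=True,        # bool form: enable the stage with defaults
        detokenize_stage=True,
        prepare_image_stage=False,  # replaces the legacy ``has_image`` flag
        concurrency=1,              # processor-wide default; stages above override it
    )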
@@ -205,19 +209,21 @@ class SGLangEngineProcessorConfig(_SGLangEngineProcessorConfig):
             each batch. The default value may not be optimal when the batch size
             or the batch processing latency is too small, but it should be good
             enough for batch size >= 64.
-        apply_chat_template: Whether to apply chat template.
-        chat_template: The chat template to use. This is usually not needed if the
-            model checkpoint already contains the chat template.
-        tokenize: Whether to tokenize the input before passing it to the SGLang engine.
-            If not, SGLang will tokenize the prompt in the engine.
-        detokenize: Whether to detokenize the output.
+        chat_template_stage: Chat templating stage config (bool | dict | ChatTemplateStageConfig).
+            Defaults to True. Legacy ``apply_chat_template`` and ``chat_template``
+            fields are deprecated but still supported.
+        tokenize_stage: Tokenizer stage config (bool | dict | TokenizerStageConfig).
+            Defaults to True. Legacy ``tokenize`` field is deprecated but still supported.
+        detokenize_stage: Detokenizer stage config (bool | dict | DetokenizeStageConfig).
+            Defaults to True. Legacy ``detokenize`` field is deprecated but still supported.
         accelerator_type: The accelerator type used by the LLM stage in a processor.
             Default to None, meaning that only the CPU will be used.
         concurrency: The number of workers for data parallelism. Default to 1.
             If ``concurrency`` is a tuple ``(m, n)``, Ray creates an autoscaling
             actor pool that scales between ``m`` and ``n`` workers (``1 <= m <= n``).
             If ``concurrency`` is an ``int`` ``n``, CPU stages use an autoscaling
             pool from ``(1, n)``, while GPU stages use a fixed pool of ``n`` workers.
+            Stage-specific concurrency can be set via nested stage configs.

     Examples:
         .. testcode::
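To illustrate the backward compatibility this hunk documents, the two configs below should be equivalent: the legacy boolean flags remain accepted but deprecated, while the nested-stage fields are the forward path. Again, ``model_source`` and the model id are assumptions for the sketch.

    from ray.data.llm import SGLangEngineProcessorConfig

    # Legacy boolean flags (deprecated per the updated docstring, still accepted).
    legacy = SGLangEngineProcessorConfig(
        model_source="meta-llama/Llama-3.1-8B-Instruct",
        apply_chat_template=True,
        tokenize=True,
        detokenize=True,
    )

    # Equivalent nested-stage form.
    nested = SGLangEngineProcessorConfig(
        model_source="meta-llama/Llama-3.1-8B-Instruct",
        chat_template_stage=True,
        tokenize_stage=True,
        detokenize_stage=True,
    )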
@@ -375,7 +381,10 @@ def build_llm_processor(
     """Build a LLM processor using the given config.

     Args:
-        config: The processor config.
+        config: The processor config. Supports nested stage configs for per-stage
+            control (e.g., ``chat_template_stage=ChatTemplateStageConfig(batch_size=128)``).
+            Legacy boolean flags (``apply_chat_template``, ``tokenize``, etc.) are
+            deprecated but still supported.
         preprocess: An optional lambda function that takes a row (dict) as input
             and returns a preprocessed row (dict). The output row must contain the
             required fields for the following processing stages. Each row
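A sketch of the end-to-end call, combining a nested stage config with the row-in, row-out ``preprocess`` hook described in the docstring above. The ``postprocess`` argument, the ``generated_text`` output column, calling the processor on a dataset, and the ``model_source`` field are assumptions not shown in this diff.

    import ray
    from ray.data.llm import build_llm_processor, vLLMEngineProcessorConfig

    config = vLLMEngineProcessorConfig(
        model_source="Qwen/Qwen2.5-7B-Instruct",  # assumed field name
        tokenize_stage={"batch_size": 256},       # per-stage override, dict form
    )

    processor = build_llm_processor(
        config,
        # preprocess: row (dict) in -> preprocessed row (dict) out, per the docstring.
        preprocess=lambda row: dict(
            messages=[{"role": "user", "content": row["prompt"]}],
            sampling_params=dict(temperature=0.3, max_tokens=250),
        ),
        # postprocess and the ``generated_text`` column are assumed here.
        postprocess=lambda row: dict(answer=row["generated_text"]),
    )

    ds = processor(ray.data.from_items([{"prompt": "What is Ray Data?"}]))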
