[data][llm] Add per-stage map kwargs for build_llm_processor preprocess/postprocess #57826
…ss/postprocess

Addresses ray-project#57812. Enable users to control resources and concurrency of preprocess/postprocess stages independently from the main LLM stage by passing Dataset.map() kwargs.

- Add preprocess_map_kwargs and postprocess_map_kwargs parameters to build_llm_processor() and all builder functions
- Update Processor class to store and apply map kwargs to dataset.map() calls
- Add validation for map kwargs with warnings on unknown keys
- Add unit tests and update docstrings

Users can now provision fractional CPU resources and tune parameters per stage without workarounds, improving utilization.

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
    @classmethod
    def validate_map_kwargs(cls, map_kwargs: Optional[Dict[str, Any]]) -> None:
        """Validate map kwargs contain only supported Dataset.map parameters.

        Args:
            map_kwargs: Optional kwargs to pass to Dataset.map().

        Note:
            Unknown keys will trigger a warning as they'll be passed as ray_remote_args.
        """
        if map_kwargs is None:
            return

        # Supported Dataset.map parameters
        supported_keys = {
            "compute",
            "fn_args",
            "fn_kwargs",
            "fn_constructor_args",
            "fn_constructor_kwargs",
            "num_cpus",
            "num_gpus",
            "memory",
            "concurrency",
            "ray_remote_args_fn",
        }
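The hunk above is cut off before the check itself. As a rough, standalone sketch of the behavior the docstring describes (warn on unknown keys rather than reject them), something like the following would do. This is not the code from the PR: the function name `warn_on_unknown_map_kwargs`, the constant `SUPPORTED_MAP_KEYS`, and the warning text are all assumptions made for illustration; only the key set is copied from the hunk.

```python
import warnings
from typing import Any, Dict, List, Optional

# Key set copied from the diff hunk. It is a static snapshot, so it can drift
# as Dataset.map() gains or loses parameters.
SUPPORTED_MAP_KEYS = frozenset(
    {
        "compute",
        "fn_args",
        "fn_kwargs",
        "fn_constructor_args",
        "fn_constructor_kwargs",
        "num_cpus",
        "num_gpus",
        "memory",
        "concurrency",
        "ray_remote_args_fn",
    }
)


def warn_on_unknown_map_kwargs(map_kwargs: Optional[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: warn about keys outside the static set.

    Returns the unknown keys in sorted order so callers can inspect them.
    """
    if map_kwargs is None:
        return []
    unknown = sorted(set(map_kwargs) - SUPPORTED_MAP_KEYS)
    if unknown:
        warnings.warn(
            f"Unknown map kwargs {unknown} will be passed as ray_remote_args.",
            UserWarning,
            stacklevel=2,
        )
    return unknown
```

Because the key set is hard-coded, any new Dataset.map() parameter would trigger a spurious warning until the set is updated by hand.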
This list is dynamic, and I don't think it's easy to maintain. We should just leave it up to the user to read the docs and provide the right map kwargs.
good point, resolved
@nrghosh tests are failing.
dc0b178 to ca429d4
- parameters change over time, don't enforce statically
- remove validation from llm.py::build_llm_processor

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
ca429d4 to 21b95e4
…ss/postprocess (ray-project#57826) Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
…ss/postprocess (ray-project#57826) Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com> Signed-off-by: xgui <xgui@anyscale.com>
…ss/postprocess (#57826) Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
…ss/postprocess (ray-project#57826) Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com> Signed-off-by: Aydin Abiar <aydin@anyscale.com>
…ss/postprocess (ray-project#57826) Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com> Signed-off-by: Future-Outlier <eric901201@gmail.com>
Description
Enable users to control resources and concurrency of preprocess/postprocess stages independently from the main LLM stage by passing Dataset.map() kwargs.
Users can now provision fractional CPU resources and tune parameters per stage without workarounds, which improves utilization.
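To make the forwarding concrete, here is a minimal, dependency-free sketch of the idea. It does not use Ray: `FakeDataset` and `SketchProcessor` are hypothetical stand-ins invented for this illustration, and only the parameter names `preprocess_map_kwargs` and `postprocess_map_kwargs` come from this PR. The actual `Processor` class may be structured differently.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


class FakeDataset:
    """Stand-in for a dataset; records the kwargs each map() call receives."""

    def __init__(self, rows: List[Row], calls: Optional[List[Dict[str, Any]]] = None):
        self.rows = rows
        self.calls = calls if calls is not None else []

    def map(self, fn: Callable[[Row], Row], **map_kwargs: Any) -> "FakeDataset":
        # Apply fn eagerly and remember which kwargs this stage was given.
        return FakeDataset([fn(r) for r in self.rows], self.calls + [dict(map_kwargs)])


class SketchProcessor:
    """Hypothetical processor that forwards per-stage kwargs to map()."""

    def __init__(
        self,
        preprocess: Callable[[Row], Row],
        postprocess: Callable[[Row], Row],
        preprocess_map_kwargs: Optional[Dict[str, Any]] = None,
        postprocess_map_kwargs: Optional[Dict[str, Any]] = None,
    ):
        self.preprocess = preprocess
        self.postprocess = postprocess
        self.preprocess_map_kwargs = preprocess_map_kwargs or {}
        self.postprocess_map_kwargs = postprocess_map_kwargs or {}

    def __call__(self, ds: FakeDataset) -> FakeDataset:
        ds = ds.map(self.preprocess, **self.preprocess_map_kwargs)
        # The main LLM stage would run here with its own resources.
        return ds.map(self.postprocess, **self.postprocess_map_kwargs)


processor = SketchProcessor(
    preprocess=lambda row: {"prompt": row["text"].strip()},
    postprocess=lambda row: {"length": len(row["prompt"])},
    preprocess_map_kwargs={"num_cpus": 0.5},
    postprocess_map_kwargs={"num_cpus": 0.25, "memory": 256 * 1024 * 1024},
)
out = processor(FakeDataset([{"text": " hi "}]))
```

Each stage receives only its own kwargs, so the lightweight pre- and postprocessing steps can request fractional CPUs without affecting the LLM stage.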
Related issues
Addresses #57812
Additional information
Example usage:
instead of