-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Datasets] Refactor map-like functions into planner package #32021
Conversation
Signed-off-by: Cheng Su <scnju13@gmail.com>
fn_args: Optional[Iterable[Any]] = None, | ||
fn_kwargs: Optional[Dict[str, Any]] = None, | ||
fn_constructor_args: Optional[Iterable[Any]] = None, | ||
fn_constructor_kwargs: Optional[Dict[str, Any]] = None, | ||
target_block_size: Optional[int] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just reordering arguments, to make all fn
-related arguments to be put together, and look more coherent.
from ray.data.context import DatasetContext | ||
|
||
|
||
def generate_filter_fn() -> Callable[[Iterator[Block]], Iterator[Block]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copied from Dataset.filter()
from ray.data.context import DatasetContext | ||
|
||
|
||
def generate_flat_map_fn() -> Callable[[Iterator[Block]], Iterator[Block]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copied from Dataset.flat_map()
from typing_extensions import Literal | ||
|
||
|
||
def generate_map_batches_fn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copied from Dataset.map_batches()
from ray.data.context import DatasetContext | ||
|
||
|
||
def generate_map_rows_fn() -> Callable[[Iterator[Block]], Iterator[Block]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copied from Dataset.map()
Signed-off-by: Cheng Su <scnju13@gmail.com> Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: Cheng Su scnju13@gmail.com
Why are these changes needed?
This PR is followup of #31977 (comment), to remove
block_fn
from logical operator, and make planner to generate functions formap_batches/map/flat_map/filter
. The change includesblock_fn
fromAbstactMap
, and all subclasses.transform
functions defined inDataset.map_batches/map/flat_map/filter
intoPlanner.generate_xxx_fn
(e.g.generate_map_batches_fn
)Decide to doing refactoring from
Dataset
here, because we update those code paths very frequently (e.g.Dataset.map_batches()
), so don't want to maintain two diverged code paths. Also different fromplan.py
andstage_impl.py
, we won't deletedataset.py
.Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.