[Datasets] [Operator Fusion - 1/2] Add operator fusion to new execution planner. #32095
Conversation
# Build a map logical operator to be used as a reference for further fusion.
# TODO(Clark): This is hacky, remove this once we push fusion to be purely based
# on a lower-level operator spec.
Given that we need information at both the LogicalOperator and PhysicalOperator level to perform this fusion, I'm not sure what else we can do here other than:
1. Push the remaining requisite information into the physical MapOperator (constructor args, lift target_block_size out of the bundler, etc.).
2. Push the block transform function into the logical AbstractMap operator and do the fusion at the logical operator level.

Each has its complications/cons: e.g. for (1) we'd need to clear the constructor args at op.start() in order to keep us from needlessly hanging on to object references, and for (2) we'd be muddying the logical layer with an execution-level concept (block transformations).
@@ -18,7 +18,7 @@ def generate_map_batches_fn(
     batch_format: Literal["default", "pandas", "pyarrow", "numpy"] = "default",
     prefetch_batches: int = 0,
     zero_copy_batch: bool = False,
-) -> Callable[[Iterator[Block]], Iterator[Block]]:
+) -> Callable[[Iterator[Block], BatchUDF], Iterator[Block]]:
@c21 This returned callable is technically def transform(blocks: Iterator[Block], fn: BatchUDF, *args, **kwargs) -> Iterator[Block]; thinking about updating this and others with a typing.Protocol to capture this. https://docs.python.org/3/library/typing.html#protocols
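For reference, a minimal sketch of such a Protocol. The class name MapTransformCallable is illustrative, and BatchUDF is aliased locally here so the snippet stands alone; the real alias lives in Ray's codebase:

```python
from typing import Any, Callable, Iterator, Protocol

from ray.data.block import Block

# Local stand-in for Ray's BatchUDF alias, just to keep the sketch self-contained.
BatchUDF = Callable[..., Any]


class MapTransformCallable(Protocol):
    """Structural type for the transform callables returned by generate_map_batches_fn and friends."""

    def __call__(
        self, blocks: Iterator[Block], fn: BatchUDF, *args: Any, **kwargs: Any
    ) -> Iterator[Block]:
        ...
```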
Yeah I noticed it today, let's do a followup PR to fix these types. Seems to be minor.
Could we introduce a logical fused node (with multiple logical nodes as children) that would be generated by the logical fusion rule? Then, the planner just needs to know how to generate code for this type of node.
# not the other way around. The latter (downstream op) will be used as the
# compute if fused.
if (
    is_task_compute(down_logical_op._compute)
Shouldn't we check isinstance(down_logical_op, AbstractMap) before is_task_compute(down_logical_op._compute)?
Hmm, I think that's a good defensive check to add. We already have an isinstance(down_op, MapOperator) check above, which limits the logical op to Read() or AbstractMap(), and you currently can't have a Read() as the downstream op (it's always a source op), but that's a lot of assumptions. And there's a good chance that we'll introduce more logical operations that result in a MapOperator physical operator, so that's a good defensive check for guarding against future failures.

Actually, how about we add an upfront logical op check like this right after fetching the logical ops?

if not isinstance(down_logical_op, AbstractMap) or not isinstance(up_logical_op, (Read, AbstractMap)):
    return False

Then we can assume that constraint for the rest of the function and tweak that condition as we add more logical ops that result in physical MapOperators.
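For placement context, a rough sketch of how that upfront check could sit at the top of the fusion-eligibility helper; the helper name, the _op_map attribute, and the bare references to MapOperator, AbstractMap, and Read are assumptions about the surrounding code, not exact Ray internals:

```python
def _can_fuse(self, down_op, up_op) -> bool:
    # Physical-level gate (already present): both ops must be MapOperators.
    if not isinstance(down_op, MapOperator) or not isinstance(up_op, MapOperator):
        return False
    # Fetch the originating logical ops, then allow-list them immediately so
    # the rest of the function can assume these types.
    down_logical_op = self._op_map[down_op]
    up_logical_op = self._op_map[up_op]
    if not isinstance(down_logical_op, AbstractMap) or not isinstance(
        up_logical_op, (Read, AbstractMap)
    ):
        return False
    # ...compute-strategy and remote-args compatibility checks would follow here.
    return True
```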
yeah SGTM.
if isinstance(up_logical_op, AbstractMap)
else None
We should check that down_logical_op is an AbstractMap before accessing down_logical_op._target_block_size, right?
Yep, see my above comment about allow-listing downstream and upstream logical ops!
input_op = up_logical_op
fn = down_logical_op._fn
fn_args = down_logical_op._fn_args
fn_kwargs = down_logical_op._fn_kwargs
fn_constructor_args = down_logical_op._fn_constructor_args
fn_constructor_kwargs = down_logical_op._fn_constructor_kwargs
Just for my understanding, when can this case happen? Is it for Read? And why do we set all the fn-related args from down_logical_op instead of from up_logical_op?
@c21 Yep, it's for when the upstream logical op is a Read, in which case it won't have any of those fn-related args. But I also just realized that this isn't quite right: we should be adopting the downstream op's UDF and associated args, not the upstream op's, and we should only be taking the upstream op's input dependency as the input op (where we bottom out at the source/read op, which isn't quite correct but should be fine for providing info to physical optimization rules).
from ray.data.block import Block
from ray.data._internal.compute import is_task_compute, CallableClass, get_compute
It seems we have to depend on ray.data._internal.compute here. Let's add a TODO to refactor the needed methods out of ray.data._internal.compute, so we don't take a dependency on it in the future - i.e. we plan to delete compute.py eventually.
Yep, I decided to accept this dependency for now and thought that we could remove it when we delete compute.py; otherwise we start to accumulate repeated definitions and run the risk of drift. Some of these, like CallableClass, should be moved to a user-facing interfaces file (since the type is user-facing), while is_task_compute() and get_compute() could probably be moved into this module, but I held off on that for now to try to keep this PR's size from getting too large.
"""Convert logical to physical operators recursively in post-order.""" | ||
physical_dag = self._plan(logical_plan.dag) | ||
return PhysicalPlan(physical_dag, self._physical_op_to_logical_op) |
Let's make a defensive copy of self._physical_op_to_logical_op before passing it to PhysicalPlan?
Hmm, this map is specifically generated for this PhysicalPlan, it shouldn't be used anywhere else, and we are defensively copying it in the optimization rule, which I believe is the correct place? https://github.com/ray-project/ray/blob/9afbdcaffdcfa560f08bf6938d9f9fa80d711b44/python/ray/data/_internal/logical/optimizers.py#L53
@ericl - I thought about it and don't think it can work well. E.g. currently a …

Another example is aggregate: currently we would have a sort-based aggregate physical operator (as …

@clarkzinzow will soon add a PR to fuse batch functions together, and the current approach (as a physical rule) plays well with the …

As evidence from other systems, Spark does operator fusion (combining multiple physical operators into a code-gen operator) after planning, in the physical optimization phase. Historically Spark does not have a formal …
return self._dag


class PhysicalPlan(Plan):
The file path is "logical/interfaces.py" but we are now introducing physical pieces. Shall we rename the path to "optimizer"? It'd be consistent with the 3 components of query processing (planner, optimizer, execution).
I am fine with the renaming if others have no objection, but I wanted to make sure we address the renaming in a separate PR, for easier review.
There is already a "planner/" directory; why don't "Plan" and related classes belong in "planner/"?
I guess it's fine to put Plan/LogicalPlan/PhysicalPlan here, as Rule depends on Plan, and Optimizer depends on Rule. In the future, we may generalize some graph traversal logic into Plan.
LGTM, thanks @clarkzinzow!
down_transform_fn = down_op.get_transformation_fn()
up_transform_fn = up_op.get_transformation_fn()
Let's assert isinstance(down_transform_fn, MapTransformFn) and isinstance(up_transform_fn, MapTransformFn)?
Python doesn't support isinstance checks with subscripted generics.
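A quick self-contained illustration of why; MapTransformFn is a subscripted Callable alias, so the stand-in alias below shows the same failure:

```python
from typing import Callable, Iterator

# Stand-in alias analogous to a subscripted generic like MapTransformFn.
TransformFn = Callable[[Iterator[int]], Iterator[int]]

def double(blocks):
    for b in blocks:
        yield b * 2

# isinstance() works against the bare Callable origin...
print(isinstance(double, Callable))  # True

# ...but raises TypeError for subscripted generics.
try:
    isinstance(double, TransformFn)
except TypeError as exc:
    print(exc)  # "Subscripted generics cannot be used with class and instance checks"
```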
Failures are unrelated (tensor extension break in master), merging!
This PR adds operator fusion to the new execution planner. A further optimization for zero-copy batching between the block transforms of fused operators will be added in a follow-up PR (this is a new optimization, not status quo).
In an effort to not expand the PhysicalOperator API and to not further complicate the physical MapOperator while still expressing this optimization as a physical plan optimization (as I believe it should be), this PR introduces a PhysicalPlan that the execution layer is ignorant of, which holds auxiliary data enabling physical plan optimizations such as stage fusion (in this case, a PhysicalOperator -> LogicalOperator map).

Long-term, we should probably have one last planning layer consisting of stateless "physical operators" that we can perform these last-mile optimizations on without muddying the operator execution API (with the typical "what" vs. "how" distinction).
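A minimal sketch of that idea (class and property names here are illustrative stand-ins, not the exact Ray internals): the PhysicalPlan carries the auxiliary op map for the optimizer, while execution only ever consumes the DAG.

```python
from typing import Dict


class LogicalOperator:  # stand-in for the logical operator base class
    pass


class PhysicalOperator:  # stand-in for the physical operator base class
    pass


class PhysicalPlan:
    """Physical DAG plus auxiliary data used only by physical optimization rules."""

    def __init__(
        self,
        dag: PhysicalOperator,
        op_map: Dict[PhysicalOperator, LogicalOperator],
    ):
        self._dag = dag
        # PhysicalOperator -> LogicalOperator map; lets rules such as operator
        # fusion look up the originating logical op. The execution layer never
        # reads this.
        self._physical_op_to_logical_op = op_map

    @property
    def dag(self) -> PhysicalOperator:
        return self._dag

    @property
    def op_map(self) -> Dict[PhysicalOperator, LogicalOperator]:
        return self._physical_op_to_logical_op
```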
Related issue number
Closes #31893
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.