-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pr/9813 experimentation #1
Pr/9813 experimentation #1
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mustafasrepo and @suremarc -- I am a little confused about what is going on here
@@ -167,6 +168,11 @@ impl PhysicalExpr for CastExpr { | |||
fn get_ordering(&self, children: &[SortProperties]) -> SortProperties { | |||
children[0] | |||
} | |||
|
|||
/// If it is known that [`CastExpr`] have an ordering. Its input should have same ordering. | |||
fn get_children_ordering(&self, ordering: SortOptions) -> Vec<SortProperties> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to add an implementation for function calls that used the declared monotonicity of the function too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an example for this use case.
@@ -226,6 +227,15 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + PartialEq<dyn Any> { | |||
fn get_ordering(&self, _children: &[SortProperties]) -> SortProperties { | |||
SortProperties::Unordered | |||
} | |||
|
|||
/// The order information of the children expressions based the ordering of the expression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a little confused about the need for this API -- why isn't the calculation of orderings from children enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this implementation is the reverse of the deduction from children orderings. The problem in the original issue is that, when table has ordering: [a ASC, b ASC, c ASC]
. Assume we apply following predicate a=round(c)
in the filter. In this case, after filtering we know that a
and round(c)
would be same from now on. This enables us to deduce round(c)
is itself ordered. Which in turn, enables us to deduce c
would be ordered.
In this case constraint (e.g ordering of the expression round(c)
) comes from the analysis. This newly introduced API enables us to deduce given that output ordering, its children (e.g c
) would have corresponding ordering.
With this analysis ORDER BY c ASC
wouldn't add new SortExec
to the plan. Since it is known that after filter, this requirement is already satisfied. Unfortunately I couldn't come up with solution without introducing a new API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem correct to me to say that round(c)
being ordered implies c
is ordered. For runs that have the same value for round(c)
, the ordering of c
can be arbitrary. For example, if the data is ordered by c
and we project round(c)
, followed by a SortPreservingMergeStream
on round(c)
, we have no guarantees that after merging the data will still be ordered by c
, even though it was originally.
It seems only true if the data is sorted by [round(c), c
], as you need c
to be the tiebreaker. But for [round(c), b, c]
it will not be true that the output is ordered by c
unless you know b
is constant when round(c)
is. So it seems to me one needs to take into account that round(c)
and c
are next to each other in the sort order.
@@ -167,6 +168,11 @@ impl PhysicalExpr for CastExpr { | |||
fn get_ordering(&self, children: &[SortProperties]) -> SortProperties { | |||
children[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if a cast really preserves the ordering? I would have thought that casting numbers to strings would change the ordering for example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually you are right, we should add a check something like whether cast is between similar types
]; | ||
// If round(c) is a monotonic expression of c, then `order by c` is satisfied given `round(c)=a` (where `a` is leading ordering). | ||
for (expr, (expected_expr, expected_options)) in test_exprs { | ||
properties.add_equal_conditions(&expr, &col("a", schema.as_ref())?); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to do properties.clone()
here? Otherwise the test cases might affect one another since we're mutating the properties.
@@ -226,6 +227,15 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + PartialEq<dyn Any> { | |||
fn get_ordering(&self, _children: &[SortProperties]) -> SortProperties { | |||
SortProperties::Unordered | |||
} | |||
|
|||
/// The order information of the children expressions based the ordering of the expression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem correct to me to say that round(c)
being ordered implies c
is ordered. For runs that have the same value for round(c)
, the ordering of c
can be arbitrary. For example, if the data is ordered by c
and we project round(c)
, followed by a SortPreservingMergeStream
on round(c)
, we have no guarantees that after merging the data will still be ordered by c
, even though it was originally.
It seems only true if the data is sorted by [round(c), c
], as you need c
to be the tiebreaker. But for [round(c), b, c]
it will not be true that the output is ordered by c
unless you know b
is constant when round(c)
is. So it seems to me one needs to take into account that round(c)
and c
are next to each other in the sort order.
You are right. I think, with this finding it better to move on with your original PR9813 |
…aTypes) (apache#8985) * ScalarValue return types from argument values * change file name * try using ?Sized * use Ok * move method default impl outside trait * Use type trait for ExprSchemable * fix nit * Proposed Return Type from Expr suggestions (#1) * Improve return_type_from_args * Rework example * Update datafusion/core/tests/user_defined/user_defined_scalar_functions.rs --------- Co-authored-by: Junhao Liu <junhaoliu2023@gmail.com> * Apply suggestions from code review Co-authored-by: Alex Huang <huangweijun1001@gmail.com> * Fix tests + clippy * rework types to use dyn trait * fmt * docs * Apply suggestions from code review Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com> * Add docs explaining what happens when both `return_type` and `return_type_from_exprs` are called * clippy * fix doc -- comedy of errors --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Alex Huang <huangweijun1001@gmail.com> Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
* refactor `TreeNode::rewrite()` * use handle_tree_recursion in `Expr` * use macro for transform recursions * fix api * minor fixes * fix * don't trust `t.transformed` coming from transformation closures, keep the old way of detecting if changes were made * rephrase todo comment, always propagate up `t.transformed` from the transformation closure, fix projection pushdown closure * Fix `TreeNodeRecursion` docs * extend Skip (Prune) functionality to Jump as it is defined in https://synnada.notion.site/synnada/TreeNode-Design-Proposal-bceac27d18504a2085145550e267c4c1 * fix Jump and add tests * jump test fixes * fix clippy * unify "transform" traversals using macros, fix "visit" traversal jumps, add visit jump tests, ensure consistent naming `f` instead of `op`, `f_down` instead of `pre_visit` and `f_up` instead of `post_visit` * fix macro rewrite * minor fixes * minor fix * refactor tests * add transform tests * add apply, transform_down and transform_up tests * refactor tests * test jump on both a and e nodes in both top-down and bottom-up traversals * better transform/rewrite tests * minor fix * simplify tests * add stop tests, reorganize tests * fix previous merges and remove leftover file * Review TreeNode Refactor (#1) * Minor changes * Jump doesn't ignore f_up * update test * Update rewriter * LogicalPlan visit update and propagate from children flags * Update tree_node.rs * Update map_children's --------- Co-authored-by: Mustafa Akur <mustafa.akur@synnada.ai> * fix * minor fixes * fix f_up call when f_down returns jump * simplify code * minor fix * revert unnecessary changes * fix `DynTreeNode` and `ConcreteTreeNode` `transformed` and `tnr` propagation * introduce TransformedResult helper * fix docs * restore transform as alias to trassform_up * restore transform as alias to trassform_up 2 * Simplifications and comment improvements (#2) --------- Co-authored-by: Berkay Şahin <124376117+berkaysynnada@users.noreply.github.com> Co-authored-by: Mustafa Akur <mustafa.akur@synnada.ai> Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
Thanks @suremarc for collaboration, closing this PR. |
Which issue does this PR close?
Closes #.
Rationale for this change
@suremarc I reviewed your PR9812. I devised an alternative solution, which I think more general and support multi children. If you are Ok with these changes, we can continue with this solution in the upstream.
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?