[SPARK-35439][SQL][FOLLOWUP] ExpressionContainmentOrdering should not sort unrelated expressions #32870
 * will be considered as e1 < e2 and e2 < e1 by this ordering. But for the usage here,
 * the order of irrelevant expressions does not matter.
 */
class ExpressionContainmentOrdering extends Ordering[Expression] {
Just curious, is there a reason for this move?
Oh, as it was a nested class, I could not instantiate it on its own; I had to write
val equivalence = new EquivalentExpressions
val exprOrdering = new equivalence.ExpressionContainmentOrdering
I can revert to the nested class if you think it's an unnecessary change.
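For readers unfamiliar with Scala inner classes, the difference discussed here can be sketched as follows. `Registry`, `InnerOrdering` and `StandaloneOrdering` are hypothetical stand-ins, not Spark classes; the point is only that an inner class needs an outer instance before it can be constructed, while a top-level class does not.

```scala
// Hypothetical stand-ins for illustration; these are not Spark's real classes.
object NestedVsTopLevel {
  // Before the move: the ordering is an inner class, so it is tied to an outer instance.
  class Registry {
    class InnerOrdering extends Ordering[String] {
      def compare(x: String, y: String): Int = Integer.compare(x.length, y.length)
    }
  }

  // After the move: a standalone class can be constructed directly.
  class StandaloneOrdering extends Ordering[String] {
    def compare(x: String, y: String): Int = Integer.compare(x.length, y.length)
  }

  val registry = new Registry
  val inner = new registry.InnerOrdering   // requires `registry` to exist first
  val standalone = new StandaloneOrdering  // no outer instance required
}
```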
Never mind. New one also looks good~
 * we want the child expressions come first than parent expressions, so we can replace
 * child expressions in parent expressions with subexpression evaluation. Note that
 * this is not for general expression ordering. For example, two irrelevant expressions
 * will be considered as e1 < e2 and e2 < e1 by this ordering. But for the usage here,
Shall we change this to 0 according to the new logic?
Right, missing the doc. Fixed.
Kubernetes integration test starting
Kubernetes integration test status success
Kubernetes integration test starting
Kubernetes integration test status success
Could you rebase to the master branch? The linter failure was fixed on the master branch.
Rebased. Thanks!
 * child expressions in parent expressions with subexpression evaluation. Note that
 * this is not for general expression ordering. For example, two irrelevant expressions
 * will be considered as equal by this ordering. But for the usage here, the order of
 * irrelevant expressions does not matter.
To be complete, could you add some description about the semantically-equal expressions?
Sure. Added.
      // `x` is child expression of `y`.
      -1
    } else {
      // Irrelevant expressions
ditto. We should mention the semantically-equal expression here.
added. thanks.
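To make the reviewed logic easier to follow, here is a minimal sketch of a containment-style ordering. It is not Spark's implementation: `Expr`, `Leaf`, `Add` and `contains` are hypothetical, and plain structural equality stands in for Spark's notion of semantic equality. Both equal and unrelated expressions compare as 0, which is the behaviour the review comments ask to have documented.

```scala
// A sketch under stated assumptions; the tiny expression tree below is hypothetical.
object ContainmentOrderingSketch {
  sealed trait Expr { def children: Seq[Expr] }
  case class Leaf(name: String) extends Expr { def children: Seq[Expr] = Nil }
  case class Add(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }

  // True if `needle` occurs anywhere inside `tree`, including `tree` itself.
  def contains(tree: Expr, needle: Expr): Boolean =
    tree == needle || tree.children.exists(contains(_, needle))

  class ContainmentOrdering extends Ordering[Expr] {
    def compare(x: Expr, y: Expr): Int =
      if (x == y) 0                 // equal expressions (stand-in for semantic equality)
      else if (contains(x, y)) 1    // `y` is a child of `x`, so `x` sorts after `y`
      else if (contains(y, x)) -1   // `x` is a child of `y`, so `x` sorts before `y`
      else 0                        // unrelated expressions
  }

  val a = Leaf("a")
  val b = Leaf("b")
  val sum = Add(a, b)
  val ordering = new ContainmentOrdering
}
```

With these definitions, `ordering.compare(a, sum)` is negative because `a` is a child of `sum`, while `ordering.compare(a, b)` is 0 because neither contains the other.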
dongjoon-hyun
left a comment
+1, LGTM (only one minor comment, https://github.com/apache/spark/pull/32870/files#r649667440)
Kubernetes integration test unable to build dist. exiting with code: 1
Test build #139661 has finished for PR 32870 at commit
Kubernetes integration test starting
Test build #139683 has finished for PR 32870 at commit
Kubernetes integration test status success
Test build #139665 has finished for PR 32870 at commit
Thank you. Merged to master.
Test build #139673 has finished for PR 32870 at commit
I think this could theoretically still cause some issues because it doesn't follow the last rule for comparators:
A simple example I found that doesn't sort correctly: The result remains
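The commenter's own example did not survive on this page, so the following is only an illustration of the kind of problem being described. It assumes the rule in question is the one from Java's `Comparator` contract: if `compare(x, y) == 0`, then `compare(x, z)` and `compare(y, z)` must have the same sign for every `z`. The expression types are hypothetical, and structural equality stands in for semantic equality.

```scala
// A sketch under stated assumptions; not the commenter's example and not Spark code.
object EqualityRuleSketch {
  sealed trait Expr { def children: Seq[Expr] }
  case class Leaf(name: String) extends Expr { def children: Seq[Expr] = Nil }
  case class Add(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }

  def contains(tree: Expr, needle: Expr): Boolean =
    tree == needle || tree.children.exists(contains(_, needle))

  // Containment comparison that returns 0 for unrelated expressions.
  def compare(x: Expr, y: Expr): Int =
    if (x == y) 0
    else if (contains(x, y)) 1
    else if (contains(y, x)) -1
    else 0

  val a = Leaf("a")
  val c = Leaf("c")
  val parent = Add(a, Leaf("b"))

  val aVsC = compare(a, c)            // unrelated, so the two count as "equal"
  val aVsParent = compare(a, parent)  // `a` is inside `parent`
  val cVsParent = compare(c, parent)  // unrelated
  // The rule would require these two results to have the same sign.
  val ruleHolds = math.signum(aVsParent) == math.signum(cVsParent)
}
```

Here `a` and `c` compare as equal, yet only `a` is ordered before `parent`, so `ruleHolds` is false. A sort that relies on the rule may therefore leave a parent ahead of its child when unrelated expressions sit between them.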
I noticed that, but currently I have no better idea for sorting the expressions. For irrelevant expressions, there seems to be no good rule to order them in a deterministic way. Right now I can only make it meet the transitivity contract so that it avoids the exception. As mentioned in its doc, this is not for general expression ordering but just for this specific usage. I think it should be rare for it to produce a suboptimal sort. I'll think about whether there is a better way to sort it.
In my fork I just changed it to so it basically just sorts by the number of expressions in the tree (not sure if there's an easier way to get that count than how I did it). Haven't done exhaustive testing on it but I feel like that makes sense to do. Not sure how one expression could contain another if it doesn't have more total expressions.
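The code from the fork is not shown on this page, so the following is a hypothetical reconstruction of the idea only: order expressions by the total number of nodes in their trees. `Expr`, `Leaf`, `Add` and `size` are illustrative names, not Spark APIs.

```scala
// A sketch under stated assumptions; this is not the commenter's actual change.
object SizeOrderingSketch {
  sealed trait Expr { def children: Seq[Expr] }
  case class Leaf(name: String) extends Expr { def children: Seq[Expr] = Nil }
  case class Add(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }

  // Total number of nodes in the tree rooted at `e`.
  def size(e: Expr): Int = 1 + e.children.map(size).sum

  // Comparing by an Int key satisfies the comparator contract by construction.
  val bySize: Ordering[Expr] = Ordering.by[Expr, Int](e => size(e))

  val a = Leaf("a")
  val ab = Add(a, Leaf("b"))
  val abc = Add(ab, Leaf("c"))
  val sorted = Seq(abc, a, ab).sorted(bySize)
}
```

Because a strict subexpression always has fewer nodes than the expression containing it, children sort before their parents, which is the property the original ordering is after. The trade-off is that unrelated expressions are also ordered by size, which is harmless for this use but carries no meaning.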
What changes were proposed in this pull request?
This is a followup of #32586. We introduced ExpressionContainmentOrdering to sort common expressions according to their parent-child relations. For unrelated expressions, the ordering previously returned -1, which is not correct and can lead to transitivity issues.
Why are the changes needed?
To fix the possible transitivity issue of ExpressionContainmentOrdering.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit test.
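As a rough illustration of why returning -1 for unrelated inputs is a problem, the sketch below uses strings in place of expressions and substring containment in place of the parent-child relation. Both functions are hypothetical and only mimic the behaviour described in the summary; they are not the code from either PR.

```scala
// A sketch under stated assumptions; strings stand in for expressions.
object UnrelatedComparisonSketch {
  // Mimics the earlier behaviour: anything that is not a parent falls through to -1.
  def oldCompare(x: String, y: String): Int =
    if (x == y) 0
    else if (x.contains(y)) 1
    else -1

  // Mimics the behaviour after this change: unrelated inputs compare as 0.
  def newCompare(x: String, y: String): Int =
    if (x == y) 0
    else if (x.contains(y)) 1
    else if (y.contains(x)) -1
    else 0

  // "a" and "b" are unrelated: neither contains the other.
  val oldForward = oldCompare("a", "b")
  val oldBackward = oldCompare("b", "a")
  val newForward = newCompare("a", "b")
  val newBackward = newCompare("b", "a")
}
```

With the old rule both `oldForward` and `oldBackward` are -1, so each value claims to be smaller than the other, which a comparator must never do. With the new rule both results are 0.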