-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-16134][SQL] optimizer rules for typed filter #13846
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #61035 has finished for PR 13846 at commit
|
|
retest this please |
|
Test build #61083 has finished for PR 13846 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can also push down if TypedFilter( Filter( SerializeFromObject(child) ) ) into Filter( SerializeFromObject( TypedFilter(child) ) ).
e.g. ds.map(...).filter(byExpr).filter(byFunc).
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's true, and Filter can be any other unary operators whose output is derived from its child, e.g. Sort.
However, I don't think ds.map(...).filter(byExpr).filter(byFunc) is a common case, i.e. mixing typed and untyped operations interlaced. If there is an easy and general way to optimize it, I'm happy to have it, or I'd like to leave it.
what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I don't think mixing typed and untyped is not a common case, but I don't have any idea to optimize easy and general way so I think we can leave it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we prepend rather than append found filters here? Otherwise filter predicates will be evaluated in reverse order after being combined. Also would be nice to comment about this.
|
Test build #61463 has finished for PR 13846 at commit
|
|
Test build #61466 has finished for PR 13846 at commit
|
## What changes were proposed in this pull request? This PR adds 3 optimizer rules for typed filter: 1. push typed filter down through `SerializeFromObject` and eliminate the deserialization in filter condition. 2. pull typed filter up through `SerializeFromObject` and eliminate the deserialization in filter condition. 3. combine adjacent typed filters and share the deserialized object among all the condition expressions. This PR also adds `TypedFilter` logical plan, to separate it from normal filter, so that the concept is more clear and it's easier to write optimizer rules. ## How was this patch tested? `TypedFilterOptimizationSuite` Author: Wenchen Fan <wenchen@databricks.com> Closes #13846 from cloud-fan/filter. (cherry picked from commit d063898) Signed-off-by: Cheng Lian <lian@databricks.com>
|
LGTM, merged to master. (Also merged to branch-2.0 by mistake, will revert it ASAP. Sorry for the trouble.) |
|
Reverted the commit on branch-2.0. |
What changes were proposed in this pull request?
This PR adds 3 optimizer rules for typed filter:
SerializeFromObjectand eliminate the deserialization in filter condition.SerializeFromObjectand eliminate the deserialization in filter condition.This PR also adds
TypedFilterlogical plan, to separate it from normal filter, so that the concept is more clear and it's easier to write optimizer rules.How was this patch tested?
TypedFilterOptimizationSuite