[SPARK-24983][Catalyst] Add configuration for maximum number of leaf expressions in collapsed projects #21993
Conversation
@HyukjinKwon can you help with finding reviewers for this PR?
Usually @gatorsmile and @cloud-fan.
Let us blacklist CASE WHEN in CollapseProject, instead of introducing this new conf.
@dvogelbacher Currently, in the master branch (2.4 release), you have a workaround. Add CollapseProject to
@gatorsmile yes, I found that workaround. Very useful :) Not sure if blacklisting case-when statements outright is the right way to go. That could have negative perf impacts as well? And it wouldn't handle the case in the unit test where we have exponential growth when adding/subtracting columns (though that example might be somewhat contrived). Maybe we should just not collapse if the number of leaf expressions for changed expressions in the collapsed project is higher than the sum of the number of corresponding leaf expressions in the original project lists?
} else {
  p2.copy(projectList = buildCleanedProjectList(p1.projectList, p2.projectList))
}
case p1@Project(_, p2: Project) =>
nit: Do we need to change this line? We can keep this line as is.
Blacklisting case-when statements looks good to me.
Can one of the admins verify this patch?
Closing due to inactivity over a year.
What changes were proposed in this pull request?
Add a configuration option for the maximum number of leaf expressions in collapsed project nodes. If a collapsed project node (the result of the CollapseProject optimizer rule) would have more leaf expressions than the configured maximum, we don't collapse. This protects against an exponentially exploding number of leaf expressions when collapsing many (binary) expressions that refer to the same columns (see https://issues.apache.org/jira/browse/SPARK-24983).
How was this patch tested?
Add a new unit test.
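To make the exponential growth described in the proposal concrete, here is a minimal, self-contained sketch. It does not use Catalyst: `Expr`, `Col`, `Add`, `substitute`, `collapsedChain`, and `shouldCollapse` are hypothetical names invented for illustration, and the guard only approximates the idea of a configurable leaf limit rather than reproducing the patch.

```scala
// Toy expression tree. These types are illustrative and are NOT Catalyst classes.
sealed trait Expr
case class Col(name: String) extends Expr
case class Add(left: Expr, right: Expr) extends Expr

object LeafExplosion {
  // Count the leaf nodes (column references) of an expression.
  def leafCount(e: Expr): Int = e match {
    case Col(_)    => 1
    case Add(l, r) => leafCount(l) + leafCount(r)
  }

  // Inline the lower project's alias definitions into an upper expression.
  // This mimics what collapsing two adjacent projects does to the upper project list.
  def substitute(e: Expr, aliases: Map[String, Expr]): Expr = e match {
    case Col(n)    => aliases.getOrElse(n, e)
    case Add(l, r) => Add(substitute(l, aliases), substitute(r, aliases))
  }

  // Collapse a chain of `depth` projects, each of which redefines x as x + x.
  // Every step references the previous definition twice, so the leaves double.
  def collapsedChain(depth: Int): Expr =
    (1 to depth).foldLeft[Expr](Col("x")) { (lower, _) =>
      substitute(Add(Col("x"), Col("x")), Map("x" -> lower))
    }

  // Hypothetical guard: keep the collapsed result only if it stays under the limit.
  def shouldCollapse(collapsed: Expr, maxLeaves: Int): Boolean =
    leafCount(collapsed) <= maxLeaves
}
```

In this toy model a chain of depth n collapses into an expression with 2^n leaves, so `LeafExplosion.collapsedChain(10)` has 1024 leaves and a limit of 1000 would reject it, while the individual projects in the chain each hold only two leaves.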