j-esse commented Jan 29, 2019

https://issues.apache.org/jira/browse/SPARK-26626
apache#23556

What changes were proposed in this pull request?

This adds a spark.sql.maxRepeatedAliasSize config option, which caps the size of an aliased expression that may be substituted into its references (in CollapseProject and PhysicalOperation). Without the cap, a large aliased expression can be substituted many times, exploding the size of the expression tree until the driver eventually runs out of memory.
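
To illustrate the blow-up, here is a minimal toy sketch in plain Python. It is not Spark's implementation: the Ref and Add classes, the collapse helper, and the max_size parameter are all hypothetical stand-ins, and the cap is assumed to mean "skip the substitution when the aliased expression has more than max_size nodes". Each level defines c_i = c_(i-1) + c_(i-1), so inlining every alias roughly doubles the tree at each level.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class Ref:
    """Reference to a column or alias by name (hypothetical toy type)."""
    name: str


@dataclass(frozen=True)
class Add:
    """Binary expression node (hypothetical toy type)."""
    left: "Expr"
    right: "Expr"


Expr = Union[Ref, Add]


def size(e: Expr) -> int:
    """Count the nodes in an expression tree."""
    if isinstance(e, Ref):
        return 1
    return 1 + size(e.left) + size(e.right)


def substitute(e: Expr, aliases: Dict[str, Expr], max_size: Optional[int]) -> Expr:
    """Inline aliases into e, skipping any alias larger than max_size (assumed semantics)."""
    if isinstance(e, Ref):
        target = aliases.get(e.name)
        if target is None:
            return e  # base column, nothing to inline
        if max_size is not None and size(target) > max_size:
            return e  # too large: keep the reference instead of copying the tree
        return target
    return Add(substitute(e.left, aliases, max_size),
               substitute(e.right, aliases, max_size))


def collapse(levels: int, max_size: Optional[int] = None) -> List[int]:
    """Stack `levels` projections c_i = c_(i-1) + c_(i-1) and return each level's size."""
    aliases: Dict[str, Expr] = {}
    sizes: List[int] = []
    for i in range(1, levels + 1):
        prev = Ref(f"c{i - 1}")
        expr = substitute(Add(prev, prev), aliases, max_size)
        aliases[f"c{i}"] = expr
        sizes.append(size(expr))
    return sizes


if __name__ == "__main__":
    print(collapse(10)[-1])               # uncapped: 2047 nodes after 10 levels
    print(max(collapse(10, max_size=5)))  # capped: no expression exceeds 7 nodes
```

In this toy model the uncapped size after n levels is 2^(n+1) - 1, while the capped run stays bounded because large aliases are left as references rather than copied.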

The default value of 100 was chosen through testing to find the best-performing setting (the results were attached to the PR as an image).

How was this patch tested?

Added unit tests and did manual testing.

vinooganesh commented Jan 29, 2019

@j-esse will approve, but can you add this to FORK.md as well?

j-esse commented Jan 30, 2019

@vinooganesh done!

bulldozer-bot (bot) merged commit a51fa9c into master on Jan 30, 2019
bulldozer-bot (bot) deleted the feature/cap-alias-substitution-palantir branch on January 30, 2019 21:24
@robert3005

For the future: please use the same PR title as upstream.

j-esse commented Feb 1, 2019

@robert3005 ah sorry!
