j-esse commented Jan 24, 2019

https://issues.apache.org/jira/browse/SPARK-26626
apache#23556

What changes were proposed in this pull request?

This adds a spark.sql.maxRepeatedAliasSize config option that sets the maximum size of an aliased expression that may be substituted in CollapseProject and PhysicalOperation. Capping the size prevents large aliased expressions from being substituted multiple times, which would otherwise explode the size of the expression tree and eventually OOM the driver.

The default config value of 100 was chosen by testing a range of values and picking the best-performing one:

image
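
To make the blow-up concrete, here is a minimal toy model in Python. It is not Spark code and not the logic of this patch: the Expr class, substitute, and max_expr_size are hypothetical names invented for illustration, and the rule "inline an alias only if its definition has at most max_size nodes" is an assumed simplification of what the config controls. It stacks projections of the form a_i = a_{i-1} + a_{i-1} and reports the largest expression produced while collapsing them.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Expr:
    """Toy expression node (hypothetical, not a Catalyst class)."""
    name: str
    children: Tuple["Expr", ...] = ()

    def size(self) -> int:
        # Number of nodes in the tree rooted at this expression.
        return 1 + sum(child.size() for child in self.children)


def substitute(expr: Expr, aliases: Dict[str, Expr], max_size: int) -> Expr:
    """Inline alias references whose definitions have at most max_size nodes.

    Assumed simplification: larger definitions are left as plain references,
    which stands in for keeping the lower projection as a separate operator.
    """
    if not expr.children and expr.name in aliases:
        target = aliases[expr.name]
        return target if target.size() <= max_size else expr
    return Expr(expr.name, tuple(substitute(c, aliases, max_size) for c in expr.children))


def max_expr_size(depth: int, max_size: int) -> int:
    """Collapse `depth` stacked projections a_i = a_{i-1} + a_{i-1}.

    Returns the size of the largest expression seen along the way.
    """
    definition = Expr("x")  # a_0 is a plain column reference
    largest = definition.size()
    for i in range(1, depth + 1):
        prev = f"a{i - 1}"
        expr = Expr("add", (Expr(prev), Expr(prev)))  # references a_{i-1} twice
        definition = substitute(expr, {prev: definition}, max_size)
        largest = max(largest, definition.size())
    return largest


print(max_expr_size(12, max_size=10**9))  # 8191: size roughly doubles per layer
print(max_expr_size(12, max_size=100))    # 127: inlining stops once an alias exceeds 100 nodes
```

With an effectively unlimited threshold, each layer doubles the tree, so 12 layers already yield 8191 nodes. With the threshold at 100, the largest expression stays at 127 nodes no matter how many layers are stacked, because an alias larger than the threshold is referenced rather than copied.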

How was this patch tested?

Added unit tests and did manual testing.

j-esse commented Jan 24, 2019

@vinooganesh FYSA

@vinooganesh

looks like tests are failing - want to fix?

j-esse commented Jan 28, 2019

@vinooganesh yeah it's a circle/GH issue:

fatal: Couldn't find remote ref pull/472
Exited with code 128

Any ideas?

j-esse commented Jan 28, 2019

@vinooganesh or can you retrigger the build? I don't have permissions

@robert3005

builds on forks aren't enabled

j-esse commented Jan 28, 2019

Ok, can you either enable builds on forks, or give me write access to palantir/spark so I can make a branch :)

j-esse changed the title from "Maximum repeatedly substituted alias size" to "[DEPRECATED] Maximum repeatedly substituted alias size" Jan 29, 2019
j-esse closed this Jan 29, 2019

j-esse commented Jan 29, 2019

Moved to #475
