[SPARK-29937][SQL] Make FileSourceScanExec class fields lazy #26565
ok to test
Test build #113975 has finished for PR 26565 at commit
OK, so plan copy is more expensive than we expected. How about we reduce the number of copies? What we really need is to copy the logical plan between the analyzer and the optimizer. The other copies are not really needed and are just there for consistency.
Yeah, it is really expensive; many plan nodes do initialization work when they are instantiated.
dongjoon-hyun
left a comment
+1, LGTM. Merged to master.
Thank you, @ulysses-you, @cloud-fan, @viirya.
Thanks for merging!
What changes were proposed in this pull request?
Since SPARK-28346 (PR 25111), QueryExecution copies every plan node stage by stage, so almost every node is instantiated twice. We should therefore make the FileSourceScanExec class fields lazy to avoid creating unnecessary objects on each copy.
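To illustrate why laziness helps here, below is a minimal sketch that does not use Spark at all. `EagerScan`, `LazyScan`, `InitCounter`, and the `metadata` field are hypothetical stand-ins, not the actual `FileSourceScanExec` members; the assumption is only that a plan node is a case class whose fields are computed at construction time and that copying a node runs its constructor again.

```scala
// Hypothetical stand-ins for a plan node; NOT Spark's FileSourceScanExec.
object InitCounter {
  var eager = 0     // how many times the eager field was computed
  var deferred = 0  // how many times the lazy field was computed
}

case class EagerScan(path: String) {
  // Evaluated in the constructor, so every copy() pays for it again.
  val metadata: Map[String, String] = {
    InitCounter.eager += 1
    Map("Location" -> path)
  }
}

case class LazyScan(path: String) {
  // Evaluated on first access only; a copy that is never inspected costs nothing.
  lazy val metadata: Map[String, String] = {
    InitCounter.deferred += 1
    Map("Location" -> path)
  }
}

object LazyFieldsDemo {
  def main(args: Array[String]): Unit = {
    val eager = EagerScan("/data/t")
    val eagerCopy = eager.copy()   // runs the constructor body a second time

    val lzy = LazyScan("/data/t")
    val lazyCopy = lzy.copy()      // no field is computed yet
    lazyCopy.metadata              // computed once, on the instance that needs it

    println(s"eager inits = ${InitCounter.eager}, lazy inits = ${InitCounter.deferred}")
  }
}
```

In this sketch the eager field is computed twice (once per instance) while the lazy field is computed once, which mirrors the "almost every node is instantiated twice" cost described above. The trade-off is that a `lazy val` adds a small synchronization cost on first access.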
Why are the changes needed?
To avoid creating unnecessary objects when plan nodes are copied.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing unit tests.