[SPARK-29937][SQL] Make FileSourceScanExec class fields lazy #26565

ulysses-you · 2019-11-18T01:23:52Z

What changes were proposed in this pull request?

Since SPARK-28346 (PR 25111), QueryExecution copies every plan node stage by stage, so almost every node is instantiated twice. We should make the class fields lazy so that copying a node does not create objects that are never used.
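To illustrate the effect, here is a minimal sketch that does not use any Spark classes. `EagerScan`, `LazyScan`, `partitions` and the counters are hypothetical stand-ins, not the actual FileSourceScanExec fields. It only shows that a plain `val` is initialized again on every copy, while a `lazy val` is initialized only on the instance that actually reads it.

```scala
object LazyFieldSketch {
  // Counters recording how often each (hypothetical) expensive initializer runs.
  var eagerInits = 0
  var lazyInits = 0

  // Hypothetical node whose field is computed eagerly in the constructor.
  case class EagerScan(path: String) {
    val partitions: Seq[String] = { eagerInits += 1; Seq(path + "/part-0") }
  }

  // Same node, but the field is computed only on first access.
  case class LazyScan(path: String) {
    lazy val partitions: Seq[String] = { lazyInits += 1; Seq(path + "/part-0") }
  }

  // Builds each node once, copies it once, and reads the field only on the
  // lazy copy. Returns (eager initializations, lazy initializations).
  def run(): (Int, Int) = {
    eagerInits = 0
    lazyInits = 0

    val eager = EagerScan("/data")
    val eagerCopy = eager.copy()   // runs the eager initializer a second time

    val lzy = LazyScan("/data")
    val lazyCopy = lzy.copy()      // no initializer has run yet
    lazyCopy.partitions            // first and only lazy initialization

    (eagerInits, lazyInits)
  }

  def main(args: Array[String]): Unit = {
    val (e, l) = run()
    println(s"eager inits: $e, lazy inits: $l") // eager inits: 2, lazy inits: 1
  }
}
```

In this sketch the original lazy node is never read, so its initializer never runs at all; that is the saving this change is after when a plan is copied but only one of the copies is executed.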

Why are the changes needed?

To avoid creating unnecessary objects when plan nodes are copied.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing UTs.

ulysses-you · 2019-11-18T01:25:16Z

cc @cloud-fan @viirya @gatorsmile

sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala

dongjoon-hyun · 2019-11-18T01:48:09Z

ok to test

SparkQA · 2019-11-18T05:39:49Z

Test build #113975 has finished for PR 26565 at commit 0aa4a48.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2019-11-22T12:52:36Z

OK so plan copy is more expensive than we expect. How about we reduce the number of copies? What we really need is to copy the logical plan between analyzer and optimizer, other copies are not really needed, and are just there for consistency.

ulysses-you · 2019-11-23T00:52:19Z

Yeah, it is really expensive; many plans do initialization work when they are instantiated.
I think we only need to copy the analyzed plan, which is the base plan. For the other cases, we should check and copy only when needed, just like what AdaptiveExecution did before.

// Clone the optimized plan only when adaptive execution is enabled; otherwise reuse it as-is.
val logicalPlan = if (sparkSession.sessionState.conf.adaptiveExecutionEnabled) {
  optimizedPlan.clone()
} else {
  optimizedPlan
}

dongjoon-hyun

+1, LGTM. Merged to master.
Thank you, @ulysses-you, @cloud-fan, @viirya.

ulysses-you · 2019-11-25T00:50:28Z

Thanks for merging!

ulysses-you added 3 commits November 18, 2019 09:03

init (3904d16)
add private (7aa9d3d)
fix style (0aa4a48)

viirya reviewed Nov 18, 2019

dongjoon-hyun added the SQL label Nov 18, 2019

dongjoon-hyun approved these changes Nov 25, 2019

dongjoon-hyun closed this in a8d907c Nov 25, 2019

ulysses-you deleted the make-val-lazy branch September 17, 2021 11:44
