Conversation

@DennisJLi
What changes were proposed in this pull request?

https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-34-to-35 states that `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` is enabled by default, but that is currently not true, which causes confusion and lost debugging time.

Notably, when working with cached DataFrames in Spark 3.5.0, I saw AQE not kick in until I set the flag manually.
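For context, here is a minimal sketch of the workaround I used on 3.5.0 (the app name and workload are hypothetical, purely for illustration): setting the flag explicitly so AQE can re-plan the cached plan's output partitioning.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("aqe-cached-plan-repro")
  // On Spark 3.5.0 this had to be set manually, despite the migration
  // guide describing it as the default.
  .config("spark.sql.optimizer.canChangeCachedPlanOutputPartitioning", "true")
  .getOrCreate()
import spark.implicits._

val df = spark.range(1000000)
  .groupBy(($"id" % 10).as("bucket"))
  .count()

df.cache()
df.count() // materializes the cache; with the flag on, AQE may adjust the cached plan's output partitioning
```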

Why are the changes needed?

This aligns the code with the documentation, which prevents confusion.

Does this PR introduce any user-facing change?

Yes, but nothing the documentation doesn't already state.

How was this patch tested?

Simple config change.

Was this patch authored or co-authored using generative AI tooling?

NO

Fix spark.sql.optimizer.canChangeCachedPlanOutputPartitioning Configuration Default to Match Documentation

https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-34-to-35 states that `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` is set by default, but that's currently not true, and causes confusion along with lost debugging time.
@github-actions github-actions bot added the SQL label Aug 28, 2024
@DennisJLi DennisJLi changed the title Fix spark.sql.optimizer.canChangeCachedPlanOutputPartitioning Configuration Default to Match Documentation [SPARK-41262][SQL] Fix spark.sql.optimizer.canChangeCachedPlanOutputPartitioning Configuration Default to Match Documentation Aug 28, 2024
@dongjoon-hyun (Member)

Do you know what causes this mismatch, @DennisJLi ?

BTW, for mismatches like this, we usually fix the documentation rather than the code, because we cannot introduce a breaking behavior change in the released Apache Spark 3.5.x line. However, if this is a meaningful change, we can still make it for Apache Spark 4.0.0.

@DennisJLi (Author)

Hi @dongjoon-hyun, good question. I figured it got dropped on the floor, but after some git blame investigation (sketched below the list), it looks like:

  1. The change setting this configuration to true was made in commit 1569ab5 (committed 2023-08-22).
  2. That change was then undone in https://github.com/apache/spark/commit/becc04a, presumably by a bad rebase (committed 2024-02-07).
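For anyone wanting to retrace this, here is a sketch of the investigation; the SQLConf.scala path is my assumption about where the config default is defined.

```bash
# Find commits that added or removed the config key (git pickaxe search).
git log --oneline -S canChangeCachedPlanOutputPartitioning -- \
  sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

git show 1569ab5   # flips the default to true (2023-08-22)
git show becc04a   # undoes it, presumably via a bad rebase (2024-02-07)
```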

Ideally a patch version wouldn't have reverted this, and there was no accompanying documentation change, so I think if we just fix this value we'd be back to the expected state.

What do you think, Dongjoon? Also @ulysses-you, since you made the original change, do you have any input here?

@DennisJLi (Author)

Sorry, my bad. Reading that other PR more closely, it intentionally disabled the value. Let me update the documentation instead; apologies for the confusion, I rushed ahead on that.

@DennisJLi DennisJLi changed the base branch from master to branch-3.5 August 28, 2024 17:29
@DennisJLi DennisJLi changed the base branch from branch-3.5 to master August 28, 2024 17:29
@DennisJLi (Author)

The documentation change has been published as a separate PR: #47915.

@DennisJLi DennisJLi closed this Aug 28, 2024
@DennisJLi DennisJLi deleted the patch-1 branch August 28, 2024 17:33