[SPARK-46995][DOCS][FOLLOWUP] Update sql-migration-guide.md documentation
#47915
Conversation
spark.sql.optimizer.canChangeCachedPlanOutputPartitioning in 3.5.1
dongjoon-hyun left a comment:
As of now, it's true in branch-3.5. Do you mean it's enabled at 3.5.0 and disabled at 3.5.1 and re-enabled at 3.5.2?
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Lines 1535 to 1545 in dcfefd0
```scala
val CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING =
  buildConf("spark.sql.optimizer.canChangeCachedPlanOutputPartitioning")
    .internal()
    .doc("Whether to forcibly enable some optimization rules that can change the output " +
      "partitioning of a cached query when executing it for caching. If it is set to true, " +
      "queries may need an extra shuffle to read the cached data. This configuration is " +
      "enabled by default. The optimization rules enabled by this configuration " +
      s"are ${ADAPTIVE_EXECUTION_ENABLED.key} and ${AUTO_BUCKETED_SCAN_ENABLED.key}.")
    .version("3.2.0")
    .booleanConf
    .createWithDefault(true)
```
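For context, here is a minimal sketch (not part of this PR) of how the flag is typically exercised: setting it on a session before caching a DataFrame. The local master, app name, data, and column names below are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Illustrative sketch only: a hypothetical local session that toggles the internal conf.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("CachedPlanPartitioningSketch")
  // With the in-tree default (true), AQE and auto bucketed scan stay enabled while the
  // cached plan is built, so its output partitioning can change and an extra shuffle may
  // be needed when reading the cached data.
  .config("spark.sql.optimizer.canChangeCachedPlanOutputPartitioning", "true")
  .getOrCreate()

val df = spark.range(0L, 1000L)
  .groupBy((col("id") % 10).as("bucket"))
  .count()

df.cache()
df.count()  // materializes the cache using the plan produced under the conf above
```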
Thanks for the high quality bar, Dongjoon. You're right, all of the commits on …

After further investigation, the issue I was experiencing seems to be due to the JARs published by AWS on EMR differing from the Spark source. Running on EMR 7.2.0, AWS states that they vend Spark 3.5.1, so I pulled down …

I'm not sure why they changed it, but it's my bad for not first checking that the AWS-vended JAR matched; I got misled by the value on the …

I will change this PR to instead include a note in the 3.5-to-4.0.0 documentation that this flag has been re-disabled, since I don't see that present yet.
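As an aside, one way to check the value actually baked into a given distribution's JARs (rather than relying on release notes) is to query it from a `spark-shell` on that cluster. A hedged sketch, assuming a running shell where `spark` is in scope:

```scala
// Sketch only: print the default compiled into the running Spark build, then the
// effective value for this session (which reflects any overrides).
import org.apache.spark.sql.internal.SQLConf

println(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.defaultValueString)
println(spark.conf.get("spark.sql.optimizer.canChangeCachedPlanOutputPartitioning"))
```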
Force-pushed from d1001fb to 8a7797d.
Commit has been updated to include 4.0.0 documentation.
Changed the title from "spark.sql.optimizer.canChangeCachedPlanOutputPartitioning in 3.5.1" to "spark.sql.optimizer.canChangeCachedPlanOutputPartitioning in 4.0.0".
dongjoon-hyun left a comment:
cc @liuzqt, @cloud-fan, @yaooqinn from #45054
Changed the title from "spark.sql.optimizer.canChangeCachedPlanOutputPartitioning in 4.0.0" to "sql-migration-guide.md documentation".
Also, documenting whether it is true or false does not capture the underlying changes made by #45054. Those changes are unlikely to be noticed by users.
Oh, right. I missed that this is an internal conf. In that case, ya, we can ignore this.
What changes were proposed in this pull request?
Fixing the sql-migration-guide.md documentation.
Why are the changes needed?
The migration guide for 3.5.0 said this default was enabled, but upcoming changes for 4.0.0 will disable it, and there are no documentation updates indicating this.
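For illustration, the kind of entry this would add to `sql-migration-guide.md` might read roughly as follows; this is a hypothetical sketch of the wording based on the intent stated above, not the text merged by this PR:

```
- Since Spark 4.0, `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` is disabled
  by default again. To restore the Spark 3.5 behavior, set it to `true`.
```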
Does this PR introduce any user-facing change?
Yes, this fixes the documentation to align with actual Spark behavior introduced in becc04a.
How was this patch tested?
Documentation only change.
Was this patch authored or co-authored using generative AI tooling?
NO