[SPARK-51747][SQL] Data source cached plan should respect options #50538
Closed
szehon-ho reviewed Apr 9, 2025
szehon-ho reviewed Apr 9, 2025
gengliangwang approved these changes Apr 9, 2025
szehon-ho approved these changes Apr 9, 2025
Thanks @asl3 !
Thanks, merging to master/4.0
gengliangwang added a commit that referenced this pull request Apr 10, 2025

### What changes were proposed in this pull request?

Data source cached plan should respect options, such as CSV delimiter. Before this, DataSourceStrategy caches the first plan and reuses it in the future, ignoring updated options. This change returns a **new plan** if options are changed.

### Why are the changes needed?

For example:

```
spark.sql("CREATE TABLE t(a string, b string) USING CSV".stripMargin)
spark.sql("INSERT INTO TABLE t VALUES ('a;b', 'c')")
spark.sql("SELECT * FROM t").show()
spark.sql("SELECT * FROM t WITH ('delimiter' = ';')")
```

Expected output:

```
+----+----+
|col1|col2|
+----+----+
| a;b|   c|
+----+----+

+----+----+
|col1|col2|
+----+----+
|   a| b,c|
+----+----+
```

Output before this PR:

```
+----+----+
|col1|col2|
+----+----+
| a;b|   c|
+----+----+

+----+----+
|col1|col2|
+----+----+
| a;b|   c|
+----+----+
```

The PR is needed to get the expected result.

### Does this PR introduce _any_ user-facing change?

Yes, corrects the caching behavior from DataSourceStrategy

### How was this patch tested?

Added test in DDLSuite.scala

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #50538 from asl3/asl3/datasourcestrategycacheoptions.

Lead-authored-by: Amanda Liu <amanda.liu@databricks.com>
Co-authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit d2a864f)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
gengliangwang pushed a commit that referenced this pull request Apr 15, 2025

…ion guide

### What changes were proposed in this pull request?

Follow-up to #50538. Add a SQL legacy conf to enable/disable the change to allow users to restore the previous behavior. Also add a migration guide note.

### Why are the changes needed?

The original PR changes the behavior of reading from a data source file with options. The flag is needed to allow users a way to restore the former behavior, if desired.

### Does this PR introduce _any_ user-facing change?

No (original PR was a user-facing change, but this PR simply adds a config).

### How was this patch tested?

Added test for the config

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #50571 from asl3/asl3/filedatasourcecache-docsconf.

Authored-by: Amanda Liu <amanda.liu@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
gengliangwang pushed a commit that referenced this pull request Apr 15, 2025

…ion guide

### What changes were proposed in this pull request?

Follow-up to #50538. Add a SQL legacy conf to enable/disable the change to allow users to restore the previous behavior. Also add a migration guide note.

### Why are the changes needed?

The original PR changes the behavior of reading from a data source file with options. The flag is needed to allow users a way to restore the former behavior, if desired.

### Does this PR introduce _any_ user-facing change?

No (original PR was a user-facing change, but this PR simply adds a config).

### How was this patch tested?

Added test for the config

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #50571 from asl3/asl3/filedatasourcecache-docsconf.

Authored-by: Amanda Liu <amanda.liu@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 3998186)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
What changes were proposed in this pull request?
Data source cached plans should respect options, such as the CSV delimiter. Previously, DataSourceStrategy cached the first plan and reused it for later queries, ignoring updated options. This change returns a new plan if the options have changed.
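The difference between the two behaviors can be illustrated with a small, Spark-free sketch. Everything here is hypothetical: `Plan`, `PlanCache`, and both lookup methods are illustration-only names, not Spark classes or the code changed in DataSourceStrategy. Whether the real cache entry is replaced when options differ is not stated in this PR description, so the sketch assumes it is left untouched.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a resolved relation; this is not Spark's LogicalRelation.
final case class Plan(table: String, options: Map[String, String])

// Hypothetical cache keyed by table name only.
final class PlanCache {
  private val cached = mutable.Map.empty[String, Plan]

  // "Before" behavior: the first plan is reused and later options are ignored.
  def lookupIgnoringOptions(table: String, options: Map[String, String]): Plan =
    cached.getOrElseUpdate(table, Plan(table, options))

  // "After" behavior: reuse the cached plan only when the options match.
  def lookupRespectingOptions(table: String, options: Map[String, String]): Plan =
    cached.get(table) match {
      case Some(plan) if plan.options == options => plan
      // Assumption: the cached entry is left untouched and a fresh plan is returned.
      case Some(_) => Plan(table, options)
      case None =>
        val plan = Plan(table, options)
        cached.update(table, plan)
        plan
    }
}
```

With a cache that already holds a plan for `t` built without options, `lookupIgnoringOptions("t", Map("delimiter" -> ";"))` hands back the cached plan with empty options, while `lookupRespectingOptions` returns a plan that carries the `delimiter` option.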
Why are the changes needed?
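For example, create a CSV table `t(a string, b string)`, insert the row `('a;b', 'c')`, and query it twice: once with `SELECT * FROM t` and once with `SELECT * FROM t WITH ('delimiter' = ';')`.
Expected output: the first query returns `a;b` and `c`; the second returns `a` and `b,c`.
Output before this PR: both queries return `a;b` and `c`, because the second query reuses the plan cached by the first and ignores the `delimiter` option.
The PR is needed to get the expected result.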
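Why the delimiter option changes the result can be seen by splitting one text line under each delimiter. This is a minimal sketch in plain Scala, not Spark's CSV reader: `DelimiterDemo` and `splitLine` are hypothetical names, the splitter ignores quoting and escaping, and it assumes the inserted row `('a;b', 'c')` is stored as the single line `a;b,c`.

```scala
import java.util.regex.Pattern

object DelimiterDemo {
  // Naive splitter: no quoting or escaping, unlike a real CSV parser.
  def splitLine(line: String, delimiter: String): List[String] =
    line.split(Pattern.quote(delimiter), -1).toList

  // Assumption: the inserted row ('a;b', 'c') is stored as this single text line.
  val storedLine: String = "a;b,c"

  def main(args: Array[String]): Unit = {
    println(splitLine(storedLine, ",")) // List(a;b, c)
    println(splitLine(storedLine, ";")) // List(a, b,c)
  }
}
```

The first split corresponds to the default read, and the second to a read that honors `'delimiter' = ';'`.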
Does this PR introduce any user-facing change?
Yes. It corrects the caching behavior in DataSourceStrategy so that a cached plan is not reused when the options have changed.
How was this patch tested?
Added test in DDLSuite.scala
Was this patch authored or co-authored using generative AI tooling?
No