[SPARK-13656][SQL] Delete spark.sql.parquet.cacheMetadata from SQLConf and docs #19129
|
I roughly checked other options around parquet and I probably found |
|
Test build #81401 has finished for PR 19129 at commit
|
|
Thank you, @maropu ! |
|
oh, yea. I got you. Thanks! |
|
Thank you for your review and approval, @HyukjinKwon ! |
|
Could you check the change history and find when we removed the usage of this SQLConf? It sounds like we did not have a test case coverage for this in the past. We did not realize it when removing the usage. We also need to update the migration notes. |
|
Sure, I will. |
|
Wow! Thank you, @HyukjinKwon ! |
|
Please document it in the migration guides. Thanks! |
|
Sure, @gatorsmile . It's SPARK-15639 |
|
It's marked as 2.0.1 and 2.1.0 with the following commit logs. Which section is proper?
I think it's |
|
Or, should I make |
|
|
|
Thank you! |
|
The PR title resolved two issues under title |
docs/sql-programming-guide.md
- From Spark 2.0.1, `spark.sql.parquet.cacheMetadata` is no longer used. See
  [SPARK-16321](https://issues.apache.org/jira/browse/SPARK-16321) and
  [SPARK-15639](https://issues.apache.org/jira/browse/SPARK-15639) for details.
These two jiras are wrong.
#13701 is [SPARK-15639][SPARK-16321][SQL] Push down filter at RowGroups level for parquet reader.
It's removed here.
There is no caller for `initializeLocalJobFunc`. Thus, `initializeLocalJobFunc` is dead code.
Oh, then, it's another transitive search.
It sounds like https://issues.apache.org/jira/browse/SPARK-13664 is the one that removes the usage of this conf.
I will update like this.
- `spark.sql.parquet.cacheMetadata` is no longer used.
See [SPARK-13664](https://issues.apache.org/jira/browse/SPARK-13664) for details.
Thank you!
Hi, I'm new to Spark. I wonder how to disable metadata caching now that this conf has been deleted. I created an external table, and the Parquet files in the specified location are updated daily, so I want to disable metadata caching rather than executing 'refresh table xxx'.
Hi, @zzl1787 . This is Apache Spark 2.3. In Apache Spark 2.3, the metadata cache is not controlled by this parameter.
@dongjoon-hyun OK, got it, and thank you. I finally found the parameter that controls this:
spark.sql.filesourceTableRelationCacheSize = 0
This will disable the metadata cache.
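For readers who want to try the setting from the comment above, here is a minimal sketch of passing it at launch time. It assumes the key has to be supplied before the session starts (it appears to be a static SQL conf), and the helper names and `daily_job.py` are hypothetical placeholders, not part of Spark. Whether a size of 0 fully disables the cache is taken from the comment, not verified here.

```python
# Sketch only: helper names are hypothetical and not part of Spark.
CACHE_SIZE_KEY = "spark.sql.filesourceTableRelationCacheSize"


def relation_cache_conf(size=0):
    """Return the setting as a dict; 0 is the value suggested in the comment above."""
    if size < 0:
        raise ValueError("cache size must be non-negative")
    return {CACHE_SIZE_KEY: str(size)}


def spark_submit_args(conf, app):
    """Build a spark-submit argument list with one --conf flag per setting."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# "daily_job.py" is a placeholder application name.
print(" ".join(spark_submit_args(relation_cache_conf(0), "daily_job.py")))
```

In PySpark, the same key and value could presumably be passed through `SparkSession.builder.config(key, value)` before the session is created, under the same assumption that the setting cannot be changed afterwards.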
|
Test build #81523 has finished for PR 19129 at commit
|
|
Test build #81525 has finished for PR 19129 at commit
|
|
Thanks! Merged to master. |
|
Thank you for review, @gatorsmile , @HyukjinKwon , @maropu . |
What changes were proposed in this pull request?
Since SPARK-15639, `spark.sql.parquet.cacheMetadata` and `PARQUET_CACHE_METADATA` are not used. This PR removes them from SQLConf and docs.
How was this patch tested?
Pass the existing Jenkins.