-
Notifications
You must be signed in to change notification settings - Fork 1.7k
fix: Remove datafusion.execution.parquet.cache_metadata
config
#17062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
// Without filter will not read pageIndex. | ||
assert!(bytes_scanned_with_filter > bytes_scanned_without_filter); | ||
// Same amount of bytes are scanned when defaulting to cache parquet metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nuno-faria this change seems to be brought by CachedParquetFileReaderFactory
you added. I just changed the test to reflect the correct behaviour after using the factory as the default now. Just making sure if it looks correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right, when caching by default the page index is always read, even if the query does not require it. FYI @alamb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is best practice anyways and probably what people want
if let Some(metadata_cache) = | ||
state.runtime_env().cache_manager.get_file_metadata_cache() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After #17031 is merged the metadata_cache
won't be an Option
anymore, so this can be simplified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update after #17031 gets merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update after #17031 gets merged
I just merged it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be good to go!
Thanks @jonathanc-n, LGTM. Now we are able to get the performance of metadata caching right out of the box. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @nuno-faria and @jonathanc-n
For anyone else reviewing, this is not an API change because the config parameter that was removed was added in #16971, which we haven't released yet
Which issue does this PR close?
datafusion.execution.parquet.cache_metadata
config #17047 .Rationale for this change
Removes cache_metadata config and opt to always cache Parquet metadata
What changes are included in this PR?
Removal of all
cache_metadata
uses