Description
Is your feature request related to a problem or challenge?
Now that we have a limited parquet metadata cache for the built-in ListingTableProvider (thanks to @nuno-faria ❤️ in #17031), there are two configuration options that control the caching behavior:
set datafusion.execution.parquet.cache_metadata = true;
and
set datafusion.runtime.file_metadata_cache_limit = 100M;
Since the cache is now bounded by a limit, I think we should consider "always" trying to cache the parquet metadata.
Describe the solution you'd like
I suggest we remove options.cache_metadata and always try to save the metadata (which will be a no-op if the cache is too small).
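To illustrate the proposed "always try to cache" behavior, here is a minimal sketch, assuming a byte-bounded cache with oldest-first eviction. `MetadataCache`, `put`, and `get` are hypothetical names for illustration only, not DataFusion's actual cache API, and metadata is modeled as raw bytes. Every `put` is attempted unconditionally; an entry larger than the whole limit is simply skipped, which is the no-op case described above.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical byte-bounded metadata cache (NOT DataFusion's actual type).
struct MetadataCache {
    limit_bytes: usize,
    used_bytes: usize,
    entries: HashMap<String, Vec<u8>>,
    order: VecDeque<String>, // insertion order, oldest first
}

impl MetadataCache {
    fn new(limit_bytes: usize) -> Self {
        Self {
            limit_bytes,
            used_bytes: 0,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Always attempt to cache. Returns false (a no-op) if the entry can never fit.
    fn put(&mut self, path: &str, metadata: Vec<u8>) -> bool {
        let size = metadata.len();
        if size > self.limit_bytes {
            return false;
        }
        // Replace any existing entry for the same file.
        if let Some(old) = self.entries.remove(path) {
            self.used_bytes -= old.len();
            self.order.retain(|p| p != path);
        }
        // Evict the oldest entries until the new one fits under the limit.
        while self.used_bytes + size > self.limit_bytes {
            match self.order.pop_front() {
                Some(oldest) => {
                    if let Some(evicted) = self.entries.remove(&oldest) {
                        self.used_bytes -= evicted.len();
                    }
                }
                None => break,
            }
        }
        self.used_bytes += size;
        self.entries.insert(path.to_string(), metadata);
        self.order.push_back(path.to_string());
        true
    }

    fn get(&self, path: &str) -> Option<&[u8]> {
        self.entries.get(path).map(Vec::as_slice)
    }
}

fn main() {
    let mut cache = MetadataCache::new(100);
    cache.put("a.parquet", vec![0; 60]);
    cache.put("b.parquet", vec![0; 30]);
    // Does not fit next to a and b, so the oldest entry (a) is evicted first.
    cache.put("c.parquet", vec![0; 40]);
    // Larger than the entire limit: skipped without touching the cache.
    let stored = cache.put("huge.parquet", vec![0; 200]);
    println!("huge stored: {stored}, used bytes: {}", cache.used_bytes);
}
```

In this sketch, a limit of 0 makes every non-empty `put` a no-op, so a separate on/off flag becomes redundant.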
As @nuno-faria says on #17031 (comment):
I think caching by default would be good. The only situation where it wouldn't help would be one-time scans of parquet files that do not require the page index, but for large files the scan should largely outweigh the page index retrieval anyway.
Especially if we limit the memory used to 50 or 100MB, with the option for people to disable caching by turning off the cache, I think that would be the best "out of the box" experience for most users.
Describe alternatives you've considered
No response
Additional context
No response