Describe the solution you'd like
We already have a per file extension listing option implementation for the read_ dataframe APIs (e.g. CsvReadOptions, ParquetReadOptions) and they have sane defaults (like collect_stats is false for CSV and true for Parquet). I wonder whether we can just use them here and obtain the ListingOptions directly from them.
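A minimal sketch of the idea, assuming each per-format read-options type can produce its own listing options. The types below are simplified, hypothetical stand-ins written for illustration; they are not the actual DataFusion definitions, and the `ReadOptions` trait, `to_listing_options`, and the `listing_options_for` helper are assumptions rather than existing API.

```rust
// Hypothetical stand-in for the real listing options; only the fields
// needed to illustrate the proposal are modelled here.
#[derive(Debug, Clone, PartialEq)]
pub struct ListingOptions {
    pub file_extension: String,
    pub collect_stats: bool,
}

// Hypothetical trait: each per-format options type knows its own defaults.
pub trait ReadOptions {
    fn to_listing_options(&self) -> ListingOptions;
}

pub struct CsvReadOptions;
pub struct ParquetReadOptions;

impl ReadOptions for CsvReadOptions {
    fn to_listing_options(&self) -> ListingOptions {
        ListingOptions {
            file_extension: ".csv".to_string(),
            // Default described in the issue: no statistics for CSV.
            collect_stats: false,
        }
    }
}

impl ReadOptions for ParquetReadOptions {
    fn to_listing_options(&self) -> ListingOptions {
        ListingOptions {
            file_extension: ".parquet".to_string(),
            // Default described in the issue: statistics on for Parquet.
            collect_stats: true,
        }
    }
}

// What the CREATE EXTERNAL TABLE path could do instead of building
// ListingOptions by hand: pick the read options for the declared file
// type and reuse their defaults. Returns None for unknown file types.
pub fn listing_options_for(file_type: &str) -> Option<ListingOptions> {
    match file_type.to_ascii_lowercase().as_str() {
        "csv" => Some(CsvReadOptions.to_listing_options()),
        "parquet" => Some(ParquetReadOptions.to_listing_options()),
        _ => None,
    }
}
```

With this shape, `listing_options_for("PARQUET")` would yield options with `collect_stats` set to `true`, while `listing_options_for("csv")` would leave it `false`, so the per-format defaults live in one place.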
Describe alternatives you've considered
Leaving as is, or enabling them globally (instead of refactoring that part to use ReadOptions) by just setting the flag to true.
I wonder if it can ever be enabled by default for Parquet datasets.
The downside for Parquet is that, when using remote object storage, collecting statistics takes quite a bit of IO, which slows down simple queries.
I guess at some point we will have to switch to testing with Delta Lake or Apache Iceberg :)
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
#1347 enabled collection of statistics by default on the ListingOptions constructor, but tables created with CREATE EXTERNAL TABLE still can't use this feature, since their listing options are constructed manually.
https://github.com/apache/arrow-datafusion/blob/e54110fb592e03704da5f6ebd832b8fe1c51123b/datafusion/core/src/execution/context.rs#L486-L488