Consider resolving ClickBench files as Utf8 (rather than binary)
#12510
Labels
enhancement
New feature or request
Is your feature request related to a problem or challenge?
In the ClickBench benchmark queries, there are two datasets we use: a "single file" hits.parquet and "partitioned", which has 100 files in a directory. They hold the same data.
However, DataFusion resolves hits.parquet such that columns like URL are Utf8 or Utf8View, while the same columns in the partitioned files are resolved as Binary or BinaryView.
This has caused some small slowdowns while enabling StringView by default -- see #12509
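To make the mismatch concrete, here is a minimal Python sketch that compares two schemas and reports the columns whose types differ. The schemas are hand-written dictionaries, not read from the actual files: only the URL column and its Utf8 vs. Binary types come from the description above, and the WatchID entry and the `schema_mismatches` helper are hypothetical and exist only for illustration.

```python
# Hand-written stand-ins for the two resolved schemas.
# Only URL (Utf8 vs. Binary) is taken from the issue text;
# WatchID is an assumed extra column that resolves identically.
single_file = {"WatchID": "Int64", "URL": "Utf8"}
partitioned = {"WatchID": "Int64", "URL": "Binary"}


def schema_mismatches(left, right):
    """Return {column: (left_type, right_type)} for shared columns whose types differ."""
    return {
        name: (left[name], right[name])
        for name in left.keys() & right.keys()
        if left[name] != right[name]
    }


mismatches = schema_mismatches(single_file, partitioned)
for column, (left_type, right_type) in sorted(mismatches.items()):
    print(f"{column}: single file={left_type}, partitioned={right_type}")
```

A check along these lines over the real schemas would list every column affected, not just URL.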
You can see the schema resolution by:
Then run datafusion-cli:
It seems for some reason the individual files are all resolved to Binary:
Describe the solution you'd like
Ideally, I would like the ClickBench queries to resolve to the same schema for both datasets, in this case Utf8, given the contents of the files and the queries that treat these columns as strings.
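As a rough illustration of why reading these columns as Utf8 is reasonable "given the contents of the files", the following Python sketch checks whether every value in a binary column is valid UTF-8 and, if so, decodes it. This is not how DataFusion resolves schemas; the helper names and the sample values are hypothetical.

```python
def can_read_as_utf8(values):
    """Return True if every non-null bytes value decodes as UTF-8."""
    for value in values:
        if value is None:
            continue  # nulls carry no bytes to validate
        try:
            value.decode("utf-8")
        except UnicodeDecodeError:
            return False
    return True


def binary_column_as_strings(values):
    """Decode a binary column to strings, refusing columns with invalid UTF-8."""
    if not can_read_as_utf8(values):
        raise ValueError("column contains bytes that are not valid UTF-8")
    return [None if value is None else value.decode("utf-8") for value in values]


# Made-up sample values standing in for a URL column stored as binary.
url_column = [b"http://example.com/a", None, b"http://example.com/b"]
decoded = binary_column_as_strings(url_column)
```

A column that fails the check (for example, one holding arbitrary bytes) would have to stay Binary, which is why a blanket reinterpretation is only safe when the contents are known to be text.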
Describe alternatives you've considered
No response
Additional context
No response