select count(*) from '*.parquet'
scans all Parquet files recursively, including those in subdirectories
#8524
Comments
I think the reason this happens is that the
As I understand it, DataFusion is trying to model the behavior of "Hive PartitionedTables" -- so to answer this question I think we need to research what Hive does in this case.
Makes sense to me. Thank you for the research, @zhangxffff. If we wish to change this behavior, perhaps we can add a configuration parameter to select the old or new behavior (defaulting to the new behavior).
Describe the bug
I find that
select count(*) from '*.parquet'
not only scans the Parquet files in the current directory, but also recursively scans all the Parquet files in subdirectories. I wonder whether this behavior is by design or a bug.
To Reproduce
I tried with three Parquet files, two of which are in a subdirectory.
users.parquet has 2 records, file1.parquet has 1 record, and file2.parquet has 1 record.
select count(*) from '*.parquet'
returns 4.
Expected behavior
I tried the same query in DuckDB, and DuckDB only scans the Parquet files in the current directory.
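To make the two interpretations of the glob concrete, here is a minimal sketch in Python. It does not call DataFusion or DuckDB; it only contrasts a non-recursive match (`Path.glob`) with a recursive one (`Path.rglob`) from the standard library. The directory layout is an assumption (`users.parquet` at the top level, `file1.parquet` and `file2.parquet` under a directory named `subdir`), the files are empty placeholders rather than real Parquet files, and the record counts are taken from the report above.

```python
import tempfile
from pathlib import Path

# Record counts as stated in the report; the files below are empty
# placeholders, so the counts are looked up by name rather than read.
RECORDS = {"users.parquet": 2, "file1.parquet": 1, "file2.parquet": 1}


def build_layout(root: Path) -> None:
    # Assumed layout: one file at the top level, two under subdir/.
    (root / "subdir").mkdir()
    (root / "users.parquet").touch()
    (root / "subdir" / "file1.parquet").touch()
    (root / "subdir" / "file2.parquet").touch()


def count_records(paths) -> int:
    # Sum the reported record counts of every matched file.
    return sum(RECORDS[p.name] for p in paths)


def compare(root: Path) -> tuple[int, int]:
    # glob("*.parquet") matches only direct children of root;
    # rglob("*.parquet") also descends into subdirectories.
    flat = count_records(root.glob("*.parquet"))
    recursive = count_records(root.rglob("*.parquet"))
    return flat, recursive


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_layout(root)
    flat_count, recursive_count = compare(root)
    print("non-recursive match:", flat_count)
    print("recursive match:", recursive_count)
```

Under the assumed layout, the recursive total is the 4 reported for DataFusion, while the non-recursive total covers only the top-level file.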
Additional context
No response