[SPARK-29899][SQL] Recursively load data in Hive table via TBLPROPERTIES #26525
Conversation
Test build #113796 has finished for PR 26525 at commit
|CREATE TABLE test1 (id bigint)
|STORED AS PARQUET LOCATION '$baseDir'
|TBLPROPERTIES (
|  'recursiveFileLookup'='true')
Sorry to ask tangential questions, but I'm curious: Will the Metastore track this property somehow? i.e. If I create a table with 'recursiveFileLookup'='true' using Spark, can I query it from Presto and see the same data, provided that both are pointed at the same Metastore? Will the Metastore just track the table property, or will it also track the list of data paths that were detected when the table was created or refreshed?
Thanks for pointing this out. Maybe 'spark.recursiveFileLookup' would be more meaningful to users.
Can you describe the expected behavior? To me, the Hive metastore already tells us the directory structure: if the table is partitioned, the data files are under each partition directory; otherwise, they are under the table directory. Why do we need to look up files recursively?
@cloud-fan The reason is very simple but I am not sure it's correct for Hive:
Loading files recursively may make sense for some data sources, but not for tables. We have a clear policy about the file layout of tables. Please close this.
@cloud-fan Thanks for pointing this out. Closing.
What changes were proposed in this pull request?
SPARK-27990 (#24830) provides a way to recursively load data from a data source. In SQL, when querying a Hive table, this property is passed via relation.tableMeta.properties, but it is currently filtered out, so files cannot be looked up recursively for a Hive table.
In this PR, I don't add a new property or feature. The property recursiveFileLookup in TBLPROPERTIES should work in the current implementation, but it is filtered out by mistake.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a unit test.
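For readers unfamiliar with the distinction debated in this thread, the following is a minimal plain-Python sketch of a flat listing (data files directly under the table directory) versus a recursive lookup (data files at any depth). It is not Spark or Hive code and does not reproduce their implementation; the function names, directory names, and file names are hypothetical and chosen only for illustration.

```python
import os
import tempfile


def list_flat(table_dir):
    """Return files located directly under table_dir, without descending."""
    return sorted(
        entry.path for entry in os.scandir(table_dir) if entry.is_file()
    )


def list_recursive(table_dir):
    """Return files under table_dir at any depth."""
    found = []
    for root, _dirs, files in os.walk(table_dir):
        found.extend(os.path.join(root, name) for name in files)
    return sorted(found)


def build_demo_layout(base_dir):
    """Create one top-level file and one nested file (hypothetical names)."""
    nested = os.path.join(base_dir, "nested", "deeper")
    os.makedirs(nested)
    for path in (
        os.path.join(base_dir, "part-0.parquet"),
        os.path.join(nested, "part-1.parquet"),
    ):
        with open(path, "w") as handle:
            handle.write("")  # empty placeholder, not real Parquet data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        build_demo_layout(base)
        print("flat:", len(list_flat(base)))
        print("recursive:", len(list_recursive(base)))
```

With this layout, the flat listing only sees the top-level file, while the recursive lookup also picks up the nested one. That difference is why the reviewers ask whether a recursive lookup fits the fixed file layout expected of tables.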