
Conversation

@LantaoJin
Contributor

@LantaoJin LantaoJin commented Nov 14, 2019

What changes were proposed in this pull request?

SPARK-27990 (#24830) provides a way to recursively load data from a datasource. In SQL, when querying a Hive table, this property is passed via relation.tableMeta.properties, but it is currently filtered out, so we cannot look up files recursively for a Hive table.

This PR does not add a new property or feature. The property recursiveFileLookup in TBLPROPERTIES should already work with the current implementation, but it is filtered out due to a bug.

CREATE TABLE test1 (id bigint)
STORED AS PARQUET LOCATION '$baseDir'
TBLPROPERTIES (
'recursiveFileLookup'='true')
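
For comparison, the datasource-level usage that SPARK-27990 already supports looks roughly like this (a minimal sketch; the path is a placeholder):

spark.read
  .option("recursiveFileLookup", "true") // list data files in nested subdirectories
  .parquet("/path/to/baseDir")           // placeholder base directory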

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a unit test.
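
A minimal sketch of the kind of test (the suite helpers such as withTempDir and the exact assertions are assumptions, not the actual patch):

test("recursiveFileLookup in TBLPROPERTIES loads nested data files") {
  withTempDir { dir =>
    val baseDir = dir.getCanonicalPath
    // write data into a nested subdirectory under the table location
    spark.range(10).write.parquet(s"$baseDir/nested")
    sql(
      s"""
         |CREATE TABLE test1 (id bigint)
         |STORED AS PARQUET LOCATION '$baseDir'
         |TBLPROPERTIES (
         | 'recursiveFileLookup'='true')
       """.stripMargin)
    // nested files should be picked up when the table property is honored
    assert(sql("SELECT * FROM test1").count() === 10)
  }
}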

@SparkQA

SparkQA commented Nov 14, 2019

Test build #113796 has finished for PR 26525 at commit 217815e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

|CREATE TABLE test1 (id bigint)
|STORED AS PARQUET LOCATION '$baseDir'
|TBLPROPERTIES (
| 'recursiveFileLookup'='true')
Contributor

Sorry to ask tangential questions, but I'm curious: Will the Metastore track this property somehow? i.e. If I create a table with 'recursiveFileLookup'='true' using Spark, can I query it from Presto and see the same data, provided that both are pointed at the same Metastore? Will the Metastore just track the table property, or will it also track the list of data paths that were detected when the table was created or refreshed?

Contributor Author

Thanks for pointing this out. Maybe 'spark.recursiveFileLookup' would be more meaningful for users.

@cloud-fan
Contributor

Can you describe the expected behavior? To me, the Hive metastore already tells us the directory structure: if the table is partitioned, data files are under each partition directory; otherwise, data files are under the table directory. Why do we need to look up files recursively?

@LantaoJin
Contributor Author

@cloud-fan The reason is very simple, but I am not sure it is correct for Hive:
We found that the data source paths of some Hive tables are nested, and I added a way to handle this for Spark datasources in #24830. Since the datasource API has reasons to load data recursively, I thought tables could take the same approach. If this patch looks unreasonable I can close it, since the issue can also be fixed by removing the nested paths.
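
For illustration, a hypothetical nested layout of the kind described above, where data files sit below extra subdirectories under the table location:

/warehouse/test1/                          <- table LOCATION
/warehouse/test1/batch_1/part-00000.parquet
/warehouse/test1/batch_2/part-00000.parquet

Without recursive lookup, only files directly under /warehouse/test1/ are read, so the nested files are missed.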

@cloud-fan
Contributor

cloud-fan commented Nov 19, 2019

Loading files recursively may make sense for some data sources, but not for tables. We have a clear policy about the file layout for tables. Please close this.

@LantaoJin
Contributor Author

@cloud-fan Thanks for pointing this out. Closing.

@LantaoJin LantaoJin closed this Nov 19, 2019