
Conversation

@yhuai (Contributor) commented May 28, 2016

What changes were proposed in this pull request?

When `spark.sql.hive.convertCTAS` is true, for a CTAS statement we will create a data source table using the default source (i.e. parquet) if the CTAS does not specify any Hive storage format. However, there are two issues with this conversion logic.

  1. We determine whether a CTAS statement defines a storage format by checking the serde. However, TEXTFILE/SEQUENCEFILE have no default serde, and at the time of the check the default serde has not been set yet. So a query like `CREATE TABLE abc STORED AS TEXTFILE AS SELECT ...` actually creates a data source parquet table.
  2. The conversion logic ignores the user-specified location.

This PR fixes the above two issues.

Also, this PR makes the parser throw an exception when a CTAS statement has a PARTITIONED BY clause. This change is made because Hive's syntax does not allow it, and our current implementation does not actually work for this case (the insert operation always throws an exception because the insertion does not pick up the partitioning info).
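The corrected decision rule can be modeled roughly as follows. This is a minimal Python sketch of the logic described above, not Spark's actual implementation; the function and parameter names are illustrative:

```python
# Sketch of the fixed CTAS conversion rule: convert to a data source
# (parquet) table only when spark.sql.hive.convertCTAS is enabled AND the
# statement specifies no Hive storage format at all. Before the fix, the
# check looked only at the serde, and TEXTFILE/SEQUENCEFILE have no default
# serde, so "STORED AS TEXTFILE" was wrongly converted.

def should_convert_to_data_source(convert_ctas: bool,
                                  has_file_format: bool,
                                  has_row_format: bool) -> bool:
    """Return True when the CTAS should become a data source table."""
    has_storage_properties = has_file_format or has_row_format
    return convert_ctas and not has_storage_properties

# No storage clause at all: conversion applies.
assert should_convert_to_data_source(True, False, False) is True
# STORED AS TEXTFILE (a file format, even without a serde): keep as Hive table.
assert should_convert_to_data_source(True, True, False) is False
# Conversion disabled: always a Hive table.
assert should_convert_to_data_source(False, False, False) is False
```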

How was this patch tested?

I am adding new tests in SQLQuerySuite and HiveDDLCommandSuite.

@yhuai (Contributor, Author) commented May 28, 2016

@ericl @andrewor14 @liancheng Can you review this PR?

@SparkQA commented May 29, 2016

Test build #59570 has finished for PR 13386 at commit 2615f67.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented May 29, 2016

Test build #59579 has finished for PR 13386 at commit c5cb32c.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai (Contributor, Author) commented May 29, 2016

test this please

@SparkQA commented May 29, 2016

Test build #59581 has finished for PR 13386 at commit c5cb32c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai changed the title from "[SPARK-14507] [SPARK-15646] [SQL] When spark.sql.hive.convertCTAS is true, we should not convert the table stored as TEXTFILE/SEQUENCEFILE and we need respect the user-defined location" to "[SPARK-15646] [SQL] When spark.sql.hive.convertCTAS is true, we should not convert the table stored as TEXTFILE/SEQUENCEFILE and we need respect the user-defined location" on May 29, 2016
@yhuai changed the title from "[SPARK-15646] [SQL] When spark.sql.hive.convertCTAS is true, we should not convert the table stored as TEXTFILE/SEQUENCEFILE and we need respect the user-defined location" to "[SPARK-15646] [SQL] When spark.sql.hive.convertCTAS is true, the conversion rule needs to respect TEXTFILE/SEQUENCEFILE format and the user-defined location" on May 29, 2016
@yhuai (Contributor, Author) commented May 29, 2016

OK. External related changes will be handled by #13395.

@SparkQA commented May 29, 2016

Test build #59594 has finished for PR 13386 at commit fa89081.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai (Contributor, Author) commented May 31, 2016

test this please

@SparkQA commented May 31, 2016

Test build #59660 has finished for PR 13386 at commit fa89081.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```scala
    q
  )
} else {
  CreateTableAsSelectLogicalPlan(tableDesc, q, ifNotExists)
```
Contributor: Should this one also be renamed to `CreateHiveTableAsSelectLogicalPlan`?

Author (@yhuai): Looking at its implementation, it is not Hive-specific, so it seems fine to leave it as is.

Contributor: Seems like `HiveMetastoreCatalog` is the only user, though.

Author (@yhuai): OK, let me change it.

@ericl (Contributor) commented May 31, 2016

Looks good; I just have a couple of questions.

```scala
val hasStorageProperties = (ctx.createFileFormat != null) || (ctx.rowFormat != null)
if (conf.convertCTAS && !hasStorageProperties) {
  val mode = if (ifNotExists) SaveMode.Ignore else SaveMode.ErrorIfExists
  val options = rowStorage.serdeProperties ++ fileStorage.serdeProperties
```
Contributor: I think these will always be empty if we've reached here, no?

Author (@yhuai): Yeah.
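The reviewer's point holds because of the guard on this branch: conversion is attempted only when neither ROW FORMAT nor STORED AS was specified, so both serde-property maps are necessarily empty, and so is their union. A hypothetical Python sketch of the invariant (dicts stand in for Scala Maps; the names are illustrative, not Spark's):

```python
# On the convertCTAS branch, the CTAS had no ROW FORMAT and no STORED AS
# clause, so neither map has had serde properties added to it.
row_serde_properties: dict = {}   # would hold ROW FORMAT SERDEPROPERTIES
file_serde_properties: dict = {}  # would hold STORED AS serde properties

# Guard for the conversion branch: no storage properties were specified.
has_storage_properties = bool(row_serde_properties) or bool(file_serde_properties)
assert not has_storage_properties

# Hence the options handed to the data source are always the empty map.
options = {**row_serde_properties, **file_serde_properties}
assert options == {}
```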

@andrewor14 (Contributor)

LGTM, minor comments only.

@SparkQA commented Jun 1, 2016

Test build #59720 has finished for PR 13386 at commit b137cba.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • case class CreateHiveTableAsSelectLogicalPlan(

@SparkQA commented Jun 1, 2016

Test build #59748 has finished for PR 13386 at commit 88e7422.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14 (Contributor)

Merging into master and 2.0.

asfgit pushed a commit that referenced this pull request Jun 2, 2016

[SPARK-15646] [SQL] When spark.sql.hive.convertCTAS is true, the conversion rule needs to respect TEXTFILE/SEQUENCEFILE format and the user-defined location


Author: Yin Huai <yhuai@databricks.com>

Closes #13386 from yhuai/SPARK-14507.

(cherry picked from commit 6dddb70)
Signed-off-by: Andrew Or <andrew@databricks.com>
@asfgit closed this in 6dddb70 on Jun 2, 2016
@gatorsmile (Member)

Just realized this PR introduced the original changes. Could you also review my PR, #13907?

When users create a table as a query with STORED AS or ROW FORMAT while `spark.sql.hive.convertCTAS` is set to true, we do not convert it to a data source table. I am wondering whether we can still convert tables stored in the parquet and orc formats to data source tables. Thanks!
