
Conversation

@gatorsmile
Member

@gatorsmile gatorsmile commented Jun 25, 2016

What changes were proposed in this pull request?

Currently, the following CREATE TABLE AS SELECT statements produce Hive tables.

CREATE TABLE t STORED AS parquet AS SELECT 1 AS a, 1 AS b

CREATE TABLE t1
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
AS SELECT col1, col2 FROM t3

When users issue CREATE TABLE AS SELECT with STORED AS or ROW FORMAT, we currently do not convert the result to a data source table, even when spark.sql.hive.convertCTAS is set to true. For the Parquet and ORC formats, however, we can still perform the conversion even when the user specifies STORED AS or ROW FORMAT.
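As a hedged sketch of the behavior this PR targets (the table name is illustrative), enabling the flag and then issuing a Parquet CTAS with STORED AS would yield a data source table rather than a Hive table:

```sql
-- Assumes spark.sql.hive.convertCTAS=true; converted_t is an illustrative name.
SET spark.sql.hive.convertCTAS=true;
CREATE TABLE converted_t STORED AS parquet AS SELECT 1 AS a, 1 AS b;
-- With this patch, converted_t would be stored as a data source (Parquet) table.
```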

How was this patch tested?

Added test cases for both ORC and Parquet.

@SparkQA

SparkQA commented Jun 26, 2016

Test build #61243 has finished for PR 13907 at commit c4bde02.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile gatorsmile changed the title [SPARK-16209] [SQL] Convert Hive Tables to Data Source Tables for CREATE TABLE AS SELECT [SPARK-16209] [SQL] Convert Create Hive Tables As Select in Parquet/Orc to Data Source Tables for CREATE TABLE AS SELECT Jun 26, 2016
@gatorsmile gatorsmile changed the title [SPARK-16209] [SQL] Convert Create Hive Tables As Select in Parquet/Orc to Data Source Tables for CREATE TABLE AS SELECT [SPARK-16209] [SQL] Convert Hive Tables in PARQUET/ORC to Data Source Tables for CREATE TABLE AS SELECT Jun 26, 2016
@SparkQA

SparkQA commented Jun 26, 2016

Test build #61253 has finished for PR 13907 at commit a9ce0d8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor

yhuai commented Jun 27, 2016

With your PR, if users specify ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde', will we convert?

@gatorsmile
Member Author

gatorsmile commented Jun 27, 2016

Nope. If users do not specify the input and output formats, we will use the default INPUTFORMAT, org.apache.hadoop.mapred.TextInputFormat, and the default OUTPUTFORMAT, org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat. These differ from the standard input and output formats for ORC: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat and org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.
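The case described above can be illustrated with a hedged sketch (the table name is illustrative): when only the ORC SerDe is specified, Hive's defaults fill in the input/output formats, which do not match ORC's.

```sql
-- Only the SerDe is given, so INPUTFORMAT/OUTPUTFORMAT fall back to the defaults:
--   org.apache.hadoop.mapred.TextInputFormat
--   org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
-- These do not match ORC's OrcInputFormat/OrcOutputFormat, so no conversion happens.
CREATE TABLE t_serde_only
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
AS SELECT 1 AS a;
```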

I am not sure whether we should still convert it. Please let me know if you think we should still convert them. Thanks!

BTW, I also confirmed Spark SQL and Hive have the same default input and output formats.

@gatorsmile
Member Author

gatorsmile commented Aug 4, 2016

cc @cloud-fan This is not contained in #14482. Should I leave this open, or fix the conflicts after #14482 is merged?

@cloud-fan
Contributor

I don't think it's a very useful feature, and we may surprise users, since they explicitly used Hive syntax to specify the row format.

For advanced users, they can easily use USING xxx to explicitly create a data source table for better performance.
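The USING alternative mentioned above can be sketched as follows (a hedged example; the table name is illustrative):

```sql
-- Explicitly creates a data source table, bypassing Hive SerDe handling entirely.
CREATE TABLE t_ds USING parquet AS SELECT col1, col2 FROM t3;
```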

@gatorsmile
Member Author

I see. Let me close it.

@gatorsmile gatorsmile closed this Aug 9, 2016
