[SPARK-15646] [SQL] When spark.sql.hive.convertCTAS is true, the conversion rule needs to respect TEXTFILE/SEQUENCEFILE format and the user-defined location #13386
Conversation
@ericl @andrewor14 @liancheng Can you review this PR?

Test build #59570 has finished for PR 13386 at commit

Test build #59579 has finished for PR 13386 at commit

test this please

Test build #59581 has finished for PR 13386 at commit

OK. External related changes will be handled by #13395.

Test build #59594 has finished for PR 13386 at commit

test this please

Test build #59660 has finished for PR 13386 at commit
```scala
    q
  )
} else {
  CreateTableAsSelectLogicalPlan(tableDesc, q, ifNotExists)
```
Should this one also be renamed to CreateHiveTableAsSelectLogicalPlan?
Looking at its implementation, it is not Hive-specific, so it seems fine to leave it as is.
Seems like HiveMetastoreCatalog is the only user though.
OK. Let me change it.
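For reference, the rename under discussion would look roughly like this; the field list and the `UnaryNode` parent are assumptions based on similar plan nodes, not the exact definition:

```scala
import org.apache.spark.sql.catalyst.catalog.CatalogTable
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryNode}

// Renamed from CreateTableAsSelectLogicalPlan to make the Hive-only usage
// explicit, since HiveMetastoreCatalog is its only user.
case class CreateHiveTableAsSelectLogicalPlan(
    tableDesc: CatalogTable,
    child: LogicalPlan,
    allowExisting: Boolean) extends UnaryNode {
  // A CTAS node produces no output attributes of its own.
  override def output: Seq[Attribute] = Seq.empty
}
```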
Looks good, just have a couple of questions.
```scala
val hasStorageProperties = (ctx.createFileFormat != null) || (ctx.rowFormat != null)
if (conf.convertCTAS && !hasStorageProperties) {
  val mode = if (ifNotExists) SaveMode.Ignore else SaveMode.ErrorIfExists
  val options = rowStorage.serdeProperties ++ fileStorage.serdeProperties
```
I think these will always be empty if we've reached here, no?
yea
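To spell out why, a minimal self-contained sketch; `StorageFormat` here is a stand-in for the real catalog classes, so the shapes are assumptions:

```scala
// Stand-in for the parser's storage descriptors (assumed shape, not the real class).
case class StorageFormat(serdeProperties: Map[String, String] = Map.empty)

// In this branch, neither a ROW FORMAT nor a STORED AS clause was given...
val rowFormatClause: Option[StorageFormat] = None
val fileFormatClause: Option[StorageFormat] = None

// ...which is exactly what hasStorageProperties checks in the snippet above.
val hasStorageProperties = fileFormatClause.isDefined || rowFormatClause.isDefined

// With both clauses absent, the fallbacks carry empty serde properties,
// so the concatenation can only ever produce the empty map here.
val rowStorage = rowFormatClause.getOrElse(StorageFormat())
val fileStorage = fileFormatClause.getOrElse(StorageFormat())
val options = rowStorage.serdeProperties ++ fileStorage.serdeProperties

assert(!hasStorageProperties && options.isEmpty)
```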
LGTM, minor comments only.
Test build #59720 has finished for PR 13386 at commit

Test build #59748 has finished for PR 13386 at commit

Merging into master and 2.0.
[SPARK-15646] [SQL] When spark.sql.hive.convertCTAS is true, the conversion rule needs to respect TEXTFILE/SEQUENCEFILE format and the user-defined location

Author: Yin Huai <yhuai@databricks.com>

Closes #13386 from yhuai/SPARK-14507.

(cherry picked from commit 6dddb70)
Signed-off-by: Andrew Or <andrew@databricks.com>
Just realized this PR introduced the original changes. Could you also review my PR, #13907? When users create table as query with
What changes were proposed in this pull request?

When `spark.sql.hive.convertCTAS` is true, for a CTAS statement, we will create a data source table using the default source (i.e. parquet) if the CTAS does not specify any Hive storage format. However, there are two issues with this conversion logic:

1. We determine whether a CTAS statement defines a storage format by checking the serde. However, TEXTFILE/SEQUENCEFILE does not have a default serde, and at the time of the check we have not yet set the default serde. So a query like `CREATE TABLE abc STORED AS TEXTFILE AS SELECT ...` actually creates a data source parquet table (illustrated in the sketch below).
2. In the conversion logic, we are ignoring the user-specified location.

This PR fixes the above two issues.
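A sketch of the misbehavior in issue 1, assuming a Hive-enabled `spark` session as in the Spark shell; the table and column names are made up:

```scala
// With the conversion enabled, this CTAS is a candidate for the data source path.
spark.conf.set("spark.sql.hive.convertCTAS", "true")

// Before this fix: STORED AS TEXTFILE sets no serde, the serde-based check
// therefore saw "no storage format", and a parquet data source table was created.
// After this fix: the TEXTFILE format is respected and a Hive text table is created.
spark.sql("CREATE TABLE abc STORED AS TEXTFILE AS SELECT 1 AS col")
```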
Also, this PR makes the parser throw an exception when a CTAS statement has a PARTITIONED BY clause. This change is made because Hive's syntax does not allow it, and our current implementation does not actually work for this case (the insert operation always throws an exception because the insertion does not pick up the partitioning info).
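For illustration, the now-rejected shape, again with made-up names:

```scala
// Hive's syntax does not allow PARTITIONED BY in a CTAS, and the old code path
// failed only later at insert time; with this change the parser rejects it upfront.
spark.sql(
  """CREATE TABLE pt PARTITIONED BY (p STRING)
    |AS SELECT 1 AS key, 'a' AS p""".stripMargin)  // now fails with a parse exception
```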
How was this patch tested?
I am adding new tests in SQLQuerySuite and HiveDDLCommandSuite.