-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-14592][SQL] Native support for CREATE TABLE LIKE DDL command #12362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #55722 has finished for PR 12362 at commit
|
| rowFormat? createFileFormat? locationSpec? | ||
| (TBLPROPERTIES tablePropertyList)? | ||
| (AS? query)? #createTable | ||
| | createTableHeader LIKE source=tableIdentifier #createTableLike |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why you do not implement the full Hive spec? It seems usefull to be able to change the non-structural properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, what does it mean to provide table properties or row format in a CREATE TABLE LIKE statement? Trying to merge the provided properties with the existing ones seems complicated and full of corner cases. It might be simpler to just handle CREATE TABLE x LIKE y instead of accepting more things that we don't know how to handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. Hive only documents the simple case of CREATE TABLE x LIKE y without any other flags or properties
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableLike
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(For that reason I don't think we should use createTableHeader here since it doesn't make sense to say CREATE TEMPORARY TABLE x LIKE ..., and we just throw an exception downstream anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, just being curious here (not saying that we should support this). The grammar (of course) was much more elaborate in its definition: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g#L887-L892
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, I am referring the Hive manual that only lists one syntax CREATE TABLE x LIKE y.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we want to support CREATE TABLE IF NOT EXIST x LIKE y? It is not on the Hive manual, but HiveQuerySuite has tests using this syntax...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just found that in HiveCompatibilitySuite, one test create_like_tbl_props uses the syntax:
CREATE TABLE test_table LIKE src TBLPROPERTIES('key'='value')
So do we want to support it? Or just disable this test? @andrewor14 @hvanhovell
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say disable it. I don't think it actually merges the table properties (and it would be super confusing if it did)
|
@viirya we just merged a patch that implements |
Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveDDLCommandSuite.scala
|
@andrewor14 @hvanhovell Thanks. Your comments are addressed. One remaining question is do we want to support the syntax like: Because in |
|
I would say disable it. I don't think it actually merges the table properties (and it would be super confusing if it did) |
|
Test build #55771 has finished for PR 12362 at commit
|
|
Test build #55783 has finished for PR 12362 at commit
|
| s"Source table in CREATE TABLE LIKE cannot be temporary: '$sourceTable'") | ||
| } | ||
|
|
||
| val tableToCreate = catalog.getTableMetadata(sourceTable).copy( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of doing a copy and change some variables that we need to update, how about we create a new CatalogTable and put the stuff that we need to retain from the source table? Then it's more clear that what we retained for the CREATE TABLE LIKE command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. But I do this to address previous comment. The main concern is that we may add more fields into CatalogTable. If so, we will need to revisit here to add these fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I am neutral on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is that we may add more fields into CatalogTable which we don't want to retain, like lastAccessTime, I'm not sure which way is safer too, cc @andrewor14
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your concern is correct. But I think most fields in source table should be retained. Only few exceptions like lastAccessTime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dilipbiswal and I also hit the same issue when doing the Describe Table. At the end, we decided to avoid adding more fields into CatalogTable. Will follow the discussion in this thread. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think things like lastAccessTime are exceptions. If we don't do copy here then we might forget to copy a new field that we add to CatalogTable.
|
LGTM except one comment. |
| */ | ||
| override def visitCreateTableLike(ctx: CreateTableLikeContext): LogicalPlan = withOrigin(ctx) { | ||
| val targetTable = visitTableIdentifier(ctx.target) | ||
| val sourceTable = visitTableIdentifier(ctx.source) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
much better!
|
LGTM2 |
|
Merged into master, thanks for your work! |
What changes were proposed in this pull request?
JIRA: https://issues.apache.org/jira/browse/SPARK-14592
This patch adds native support for DDL command
CREATE TABLE LIKE.The SQL syntax is like:
How was this patch tested?
HiveDDLCommandSuite.HiveQuerySuitealready testsCREATE TABLE LIKE.