[SPARK-17353] [SPARK-16943] [SPARK-16942] [BACKPORT-2.0] [SQL] Fix multiple bugs in CREATE TABLE LIKE command #14946
Test build #64892 has finished for PR 14946 at commit
It looks like all the branch-2.0 builds failed on the same test case: https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.0-test-sbt-hadoop-2.3/ Let me try to fix it.
Test build #64945 has finished for PR 14946 at commit
Test build #64954 has finished for PR 14946 at commit
    name = c.name,
    dataType = c.dataType.catalogString,
    nullable = c.nullable,
    comment = Option(c.name)
As in the existing master build, we removed this unneeded `comment` attribute. The main reason is that the schema comparison also checks the comment. This was introduced in PR #14114.
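To illustrate why a stray column comment makes a schema comparison fail, here is a minimal sketch. The `Column` case class and the object name are hypothetical stand-ins, not Spark's actual catalog classes; the sketch only assumes that the comparison is a field-by-field equality check that includes the comment.

```scala
// Hypothetical, simplified column metadata; NOT Spark's real catalog column class.
object SchemaComparisonSketch {
  final case class Column(
      name: String,
      dataType: String,
      nullable: Boolean,
      comment: Option[String] = None)

  // Schema we expect to read back: no column comments.
  val expected: Seq[Column] = Seq(Column("a", "int", nullable = true))

  // Same columns, but with the comment populated from the column name,
  // mirroring `comment = Option(c.name)` in the snippet above.
  val withComment: Seq[Column] = expected.map(c => c.copy(comment = Option(c.name)))

  // Same columns with the comment left unset.
  val withoutComment: Seq[Column] = expected.map(c => c.copy(comment = None))

  def main(args: Array[String]): Unit = {
    // Case-class equality compares every field, including `comment`.
    println(withComment == expected)    // false
    println(withoutComment == expected) // true
  }
}
```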
cc @cloud-fan @yhuai This PR is ready for review. Thanks!
…le bugs in CREATE TABLE LIKE command

### What changes were proposed in this pull request?

This PR is to backport #14531.

The existing `CREATE TABLE LIKE` command has multiple issues:

- The generated table is non-empty when the source table is a data source table. The major reason is that a data source table uses the table property `path` to store the location of the table contents. Currently, we keep it unchanged. Thus, we still create the same table with the same location.
- The table type of the generated table is `EXTERNAL` when the source table is an external Hive Serde table. Currently, we explicitly set it to `MANAGED`, but Hive checks the table property `EXTERNAL` to decide whether the table is `EXTERNAL` or not. (See https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1407-L1408) Thus, the created table is still `EXTERNAL`.
- When the source table is a `VIEW`, the metadata of the generated table contains the original view text and view original text. So far, this does not break anything, but it could cause problems in Hive. (For example, https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1405-L1406)
- The issue regarding the table `comment`. To follow what Hive does, the table comment should be cleared, but the column comments should still be kept.
- The `INDEX` table is not supported. Thus, we should throw an exception in this case.
- `owner` should not be retained. `ToHiveTable` sets it [here](https://github.com/apache/spark/blob/e679bc3c1cd418ef0025d2ecbc547c9660cac433/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L793) no matter which value we set in `CatalogTable`. We set it to an empty string to avoid confusing output in Explain.
- Add support for temp tables.
- Like Hive, we should not copy the table properties from the source table to the created table, especially the statistics-related properties, which could be wrong in the created table.
- `unsupportedFeatures` should not be copied from the source table. The created table does not have these unsupported features.
- When the source table is a view, the target table uses the default format of data source tables: `spark.sql.sources.default`.

This PR is to fix the above issues.

### How was this patch tested?

Improve the test coverage by adding more test cases.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #14946 from gatorsmile/createTableLike20.
LGTM, merging to 2.0!

Thanks!
### What changes were proposed in this pull request?

This PR is to backport #14531.

The existing `CREATE TABLE LIKE` command has multiple issues:

- The generated table is non-empty when the source table is a data source table. The major reason is that a data source table uses the table property `path` to store the location of the table contents. Currently, we keep it unchanged. Thus, we still create the same table with the same location.
- The table type of the generated table is `EXTERNAL` when the source table is an external Hive Serde table. Currently, we explicitly set it to `MANAGED`, but Hive checks the table property `EXTERNAL` to decide whether the table is `EXTERNAL` or not. (See https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1407-L1408) Thus, the created table is still `EXTERNAL`.
- When the source table is a `VIEW`, the metadata of the generated table contains the original view text and view original text. So far, this does not break anything, but it could cause problems in Hive. (For example, https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1405-L1406)
- The issue regarding the table `comment`. To follow what Hive does, the table comment should be cleared, but the column comments should still be kept.
- The `INDEX` table is not supported. Thus, we should throw an exception in this case.
- `owner` should not be retained. `ToHiveTable` sets it [here](https://github.com/apache/spark/blob/e679bc3c1cd418ef0025d2ecbc547c9660cac433/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L793) no matter which value we set in `CatalogTable`. We set it to an empty string to avoid confusing output in Explain.
- Add support for temp tables.
- Like Hive, we should not copy the table properties from the source table to the created table, especially the statistics-related properties, which could be wrong in the created table.
- `unsupportedFeatures` should not be copied from the source table. The created table does not have these unsupported features.
- When the source table is a view, the target table uses the default format of data source tables: `spark.sql.sources.default`.

This PR is to fix the above issues.

### How was this patch tested?

Improve the test coverage by adding more test cases.
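The metadata rules described under "What changes were proposed" can be summarized in a small, self-contained sketch. Everything here is an assumption made for illustration: `TableMeta`, `Column`, the `TableType` objects, and `likeTarget` are hypothetical simplifications, not Spark's `CatalogTable` API or the code in this patch. In particular, the standalone `provider` field is a simplification of how the table format is recorded.

```scala
// Hypothetical, simplified model of table metadata; NOT Spark's CatalogTable.
object CreateTableLikeSketch {
  sealed trait TableType
  case object Managed extends TableType
  case object External extends TableType
  case object View extends TableType
  case object Index extends TableType

  final case class Column(name: String, dataType: String, comment: Option[String] = None)

  final case class TableMeta(
      name: String,
      tableType: TableType,
      schema: Seq[Column],
      provider: Option[String] = None,
      properties: Map[String, String] = Map.empty,
      owner: String = "",
      comment: Option[String] = None,
      viewText: Option[String] = None,
      viewOriginalText: Option[String] = None,
      unsupportedFeatures: Seq[String] = Seq.empty)

  /** Derive the metadata of the new table from the source table's metadata. */
  def likeTarget(source: TableMeta, targetName: String, defaultProvider: String): TableMeta = {
    // INDEX tables are rejected outright.
    if (source.tableType == Index) {
      throw new UnsupportedOperationException(
        s"CREATE TABLE LIKE is not supported for index table ${source.name}")
    }
    // A view has no storage format of its own, so fall back to the default
    // (the value of spark.sql.sources.default in the description above).
    val provider = if (source.tableType == View) Some(defaultProvider) else source.provider
    // Only the schema (with column comments) and the format are carried over.
    // All other fields take their defaults: no properties (so no `path`,
    // `EXTERNAL`, or statistics), empty owner, no table comment, no view text,
    // and no unsupported features.
    TableMeta(
      name = targetName,
      tableType = Managed,
      schema = source.schema,
      provider = provider)
  }

  def main(args: Array[String]): Unit = {
    val src = TableMeta(
      name = "src",
      tableType = External,
      schema = Seq(Column("id", "int", Some("primary key"))),
      provider = Some("parquet"),
      properties = Map("path" -> "/data/src", "EXTERNAL" -> "TRUE"),
      owner = "alice",
      comment = Some("source table"))
    println(likeTarget(src, "tgt", defaultProvider = "parquet"))
  }
}
```

Building the target from defaults rather than copying the source and clearing fields one by one means a newly added metadata field is not inherited by accident; whether the actual patch is structured this way is not stated in the description.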