[SPARK-16185] [SQL] Better Error Messages When Creating Table As Select Without Enabling Hive Support #13886
Conversation
Test build #61159 has finished for PR 13886 at commit
Test build #61158 has finished for PR 13886 at commit
Test build #61180 has finished for PR 13886 at commit
cc @cloud-fan @hvanhovell could you please review this simple change? Thanks!
Hmm, why can't we support CREATE TABLE AS SELECT without Hive support?
CREATE TABLE AS SELECT can be converted to a Create Data Source Table command when the condition conf.convertCTAS && !hasStorageProperties is true (see the SparkSqlParser snippet quoted later in this thread). However, the internal conf that gates this conversion defaults to false.
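For illustration only, a minimal sketch of what flipping that internal flag could look like from user code; the conf name spark.sql.hive.convertCTAS and the whole setup are assumptions based on the parser snippet quoted later in this thread, not something added by this PR:

// Hypothetical sketch: enable the internal CTAS-conversion flag so that a plain
// CTAS (no file/row format clause) can be planned as a data source table.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("ctas-conversion-sketch")
  .config("spark.sql.hive.convertCTAS", "true")  // assumed conf name; defaults to false
  .getOrCreate()

spark.range(5).selectExpr("id AS a", "id * 2 AS b").createOrReplaceTempView("t1")

// With the flag on and no STORED AS / ROW FORMAT clause, this CTAS can be
// converted to a data source table even without Hive support.
spark.sql("CREATE TABLE t2_converted AS SELECT a, b FROM t1")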
@cloud-fan Submitted a PR (#13907) for converting Parquet CTAS to data source tables without Hive support. Could you also review that PR? Thanks!
retest this please
Test build #61450 has finished for PR 13886 at commit
Could you please review this PR again? @cloud-fan Thanks!
retest this please
Test build #61771 has finished for PR 13886 at commit
Test build #62827 has finished for PR 13886 at commit
cc @cloud-fan This is not contained in #14482. Should I leave it open, or will you merge it into that PR?
| s"$numStaticPartitions partition column(s) having constant value(s).") | ||
| } | ||
|
|
||
| case c if c.getClass.getName == |
After #14482, we can handle it in an un-hacky way. Can you update?
Sure, will fix it soon. Thanks!
Test build #63365 has finished for PR 13886 at commit
def apply(plan: LogicalPlan): Unit = {
  plan.foreach {
    case CreateTable(tableDesc, _, Some(_))
        if tableDesc.provider.get == "hive" && sparkConf.get(CATALOG_IMPLEMENTATION) != "hive" =>
Sorry I just realized this can't be backported to 2.0.
For master, I'd like to implement it directly. Then we can think about how to deal with 2.0.
I see. That sounds reasonable to me. Do you think we can create a data source table in this specific case? If so, I can submit another PR for it.
I think we already do? The problem is APPEND mode; we need to read the schema of the existing Hive table.
Yeah, you are right. We did it for a few cases, but we are not able to handle all of them. See the following code:
spark/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
Lines 1017 to 1030 in 5959df2
val hasStorageProperties = (ctx.createFileFormat != null) || (ctx.rowFormat != null)
if (conf.convertCTAS && !hasStorageProperties) {
  // At here, both rowStorage.serdeProperties and fileStorage.serdeProperties
  // are empty Maps.
  val optionsWithPath = if (location.isDefined) {
    Map("path" -> location.get)
  } else {
    Map.empty[String, String]
  }
  val newTableDesc = tableDesc.copy(
    storage = CatalogStorageFormat.empty.copy(properties = optionsWithPath),
    provider = Some(conf.defaultDataSourceName)
  )
We can extend it to support more cases, as in another PR: #13907. However, it sounds like we are unable to support all the cases when Hive support is not enabled.
Regarding the issue you mentioned above (reading the schema from the existing Hive table and then appending to another table), there are two cases:
- When Hive support is enabled, another PR, [SPARK-16803] [SQL] SaveAsTable does not work when source DataFrame is built on a Hive Table #14410, is trying to resolve it. Will submit it soon.
- When Hive support is not enabled, reading the Hive metastore is impossible. Thus, should we still issue an exception?
Sorry, this also looks very confusing to me. Hopefully my description makes it a little clearer.
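To make the APPEND case above concrete, a hedged illustration (the table name and setup are made up, not from the PR): appending to an existing Hive table requires reading its schema from the Hive metastore, which is unavailable when Hive support is disabled.

// Hypothetical illustration of the APPEND case discussed above.
// Appending to an existing Hive table needs that table's schema from the Hive
// metastore, so it cannot succeed when Hive support is not enabled.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("append-sketch").getOrCreate()
spark.range(3).selectExpr("id AS a", "id + 1 AS b")
  .write.mode("append").saveAsTable("existing_hive_table")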
Ah, sorry, I misread this PR. When Hive support is not enabled, we definitely need to throw an exception for Hive tables in CTAS.
We can create a new rule for this kind of check and put it only in the sql module, i.e. the HiveOnlyCheck rule shown below.
Test build #63495 has finished for PR 13886 at commit
 */
object HiveOnlyCheck extends (LogicalPlan => Unit) {
  def apply(plan: LogicalPlan): Unit = {
|
Put it in the class, not in the method.
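For context, here is a hedged reconstruction of the whole check, pieced together from the two diff fragments quoted in this thread; the imports, the guard on tableDesc.provider, and the exact error message are assumptions rather than a verbatim copy of the PR:

// Sketch of the analysis-time check discussed above (not verbatim from the PR).
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.datasources.CreateTable

object HiveOnlyCheck extends (LogicalPlan => Unit) {
  def apply(plan: LogicalPlan): Unit = {
    plan.foreach {
      // CTAS on a Hive-format table cannot be executed without Hive support.
      case CreateTable(tableDesc, _, Some(_)) if tableDesc.provider.contains("hive") =>
        throw new AnalysisException(
          "Hive support is required to use CREATE Hive TABLE AS SELECT")
      case _ => // other plans are fine without Hive support
    }
  }
}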
| "WITH SERDEPROPERTIES ('spark.sql.sources.me'='anything')") | ||
| } | ||
|
|
||
| test("Create Cataloged Table As Select") { |
hive table?
Can you update the PR description? I think the exception was thrown in the planner before your PR.
Sure, let me fix them now. Thanks!
LGTM, pending Jenkins. Thanks for working on it!
Test build #63510 has finished for PR 13886 at commit
}
...
test("Create Hive Table As Select") {
  import testImplicits._
nit: import testImplicits._ is used in many test cases in DDLSuite. We can import it only once.
thanks, merging to master!
What changes were proposed in this pull request?
When Hive support is not enabled, the following query generates a confusing error message from the planner:
sql("CREATE TABLE t2 SELECT a, b from t1")
This PR is to issue a better error message instead.
How was this patch tested?
Added test cases in
DDLSuite.scala
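As a rough idea of what such a test might look like inside DDLSuite (which already provides sql, withTable, AnalysisException, and ScalaTest's intercept), with the test name, setup, and expected message substring being assumptions rather than copies of the PR:

// Hedged sketch of a DDLSuite-style test for the new error path (names assumed).
test("CTAS on a Hive-format table without Hive support gives a clear error") {
  withTable("t1", "t2") {
    sql("CREATE TABLE t1(a INT, b INT) USING parquet")
    val e = intercept[AnalysisException] {
      // No USING clause, so the target defaults to a Hive table, which the
      // in-memory catalog cannot create.
      sql("CREATE TABLE t2 AS SELECT a, b FROM t1")
    }
    assert(e.getMessage.contains("Hive support is required"))
  }
}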