[SPARK-20680][SQL] Spark-sql do not support for void column datatype … #17953
ok to test |
Does this change really resolve your issue?
Apparently Hive can have null typed columns. So this should be the location where you'd want to change this.
Hive 2.x disables it. Could you add some test cases by reading and writing the tables with void types? Thanks!
+1 for the test case.
|
Test build #76834 has finished for PR 17953 at commit
|
|
@LantaoJin Can you add a description and a test case for this? You can take a look at the OrcSourceSuite to get an idea of how to work with Hive. |
|
Test build #76866 has finished for PR 17953 at commit
|
|
Is your test scenario like the following?
withTable("t", "tabNullType") {
  val client = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
  // Create both tables through the Hive client, so Hive itself assigns the type of `col`.
  client.runSqlHive("CREATE TABLE t (t1 int)")
  client.runSqlHive("INSERT INTO t VALUES (3)")
  client.runSqlHive("CREATE TABLE tabNullType AS SELECT NULL AS col FROM t")
  // Read the table back through Spark.
  spark.table("tabNullType").show()
  spark.table("tabNullType").printSchema()
}
Is this what you want? |
|
@gatorsmile Yes, that's the right test scenario. Which class should I add it to? |
Could you add a comment to explain this specific scenario?
Sure
|
Maybe |
|
Added a test in HiveDDLSuite. Please review, thanks. |
|
Test build #76968 has finished for PR 17953 at commit
|
|
After this PR, we can describe it, but the query results are still empty. |
|
Thanks, I added a row to the table. |
|
Test build #77001 has finished for PR 17953 at commit
|
|
Test build #77000 has finished for PR 17953 at commit
|
|
Test build #77002 has finished for PR 17953 at commit
|
In Jenkins, I saw the following error message:
org.scalatest.exceptions.TestFailedException: StructType(StructField(col,NullType,true)) did not contain StructField(col,NullType,true)
But the same check passes in my spark-shell:
scala> val schema = spark.table("tabNullType").schema
schema: org.apache.spark.sql.types.StructType = StructType(StructField(col,NullType,true))
scala> schema.contains(StructField("col", NullType))
res7: Boolean = true
|
I don't think we should support |
|
Test build #77115 has started for PR 17953 at commit |
|
Thanks @cloud-fan . Hi @hvanhovell and @gatorsmile , any ideas? |
|
retest this please |
|
Test build #77150 has finished for PR 17953 at commit
|
-> checkAnswer(spark.table("tabNullType"), Row(null))?
In class NullType, we can add the following line:
override def simpleString: String = "void"
After the above change, you can improve the test to
val desc = sql("DESC tabNullType").collect().toSeq
assert(desc.contains(Row("col", "void", null)))
|
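The two suggestions above fit together: the proposed test looks for the string "void" in the DESC output, which only appears once the type's `simpleString` is overridden. The following is a minimal sketch of that relationship in plain Scala, not Spark's code. `ToyType`, `ToyIntType`, `ToyNullType`, `ToyField`, and `describe` are hypothetical stand-ins for illustration only, and the sketch assumes the describe output is rendered from each column type's `simpleString`.

```scala
object SimpleStringSketch {
  // Hypothetical stand-in for a SQL data type; only the display name matters here.
  sealed trait ToyType { def simpleString: String }

  case object ToyIntType extends ToyType { val simpleString: String = "int" }

  // Mirrors the suggested override: the null type presents itself as "void",
  // which is the name Hive uses for an untyped NULL column.
  case object ToyNullType extends ToyType { val simpleString: String = "void" }

  // Hypothetical stand-in for a schema field.
  final case class ToyField(name: String, dataType: ToyType)

  // Renders (column name, type string) rows, assuming a DESC-like command
  // builds its output from simpleString.
  def describe(fields: Seq[ToyField]): Seq[(String, String)] =
    fields.map(f => (f.name, f.dataType.simpleString))
}
```

With this model, `SimpleStringSketch.describe` applied to a single `ToyNullType` column named `col` yields a row pairing `col` with `void`, which is the shape the improved assertion checks for.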
LGTM except two comments. |
|
ping @LantaoJin |
|
Thanks @gatorsmile , I took a vacation last week. Will update it ASAP. |
|
Retest this please |
|
retest this please |
|
ok to test |
|
shall we add a test case for |
|
I think a safer fix is to just handle "void" specially in |
|
oh actually users can still create a table with null type column via |
|
Test build #77723 has finished for PR 17953 at commit
|
|
All test failures are due to the *.sql.out golden files not matching the new "void" simple string of NullType.
|
|
Ahh, found it. Re-generated the golden files. |
|
@cloud-fan Do you think it should be done in this pull request? And where should I add the filter, |
|
Test build #77766 has finished for PR 17953 at commit
|
val client = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
client.runSqlHive("CREATE TABLE t (t1 int)")
client.runSqlHive("INSERT INTO t VALUES (3)")
client.runSqlHive("CREATE TABLE tabNullType AS SELECT NULL AS col FROM t")
IIRC, Hive 2 doesn't support this. Let's test with CREATE VIEW AS ... to be safer.
|
@LantaoJin do you have some time to address the review comment above? |
|
@HyukjinKwon Sure. Thank you for reminding me. I almost forgot it. |
|
@LantaoJin Maybe close it now? You can reopen it when the comment is resolved? |
|
Sure, please close it as you wish. I will reopen it when it is up to date.
On 10/28/2017 07:39, Xiao Li wrote: @LantaoJin Maybe close it now? You can reopen it when the comment is resolved?
|
|
@LantaoJin @gatorsmile |
|
@amit-hitachi I don't think we have any plan for this work. But you could revive this discussion on the corresponding JIRA ticket. |
|
Yeah .. I personally support this change FWIW. |
|
Hmm, how do I reopen it? |
|
I opened a new one: #28833
…oid column datatype
### What changes were proposed in this pull request?
This is the new PR to address the closed one, #17953.
1. Support the "void" primitive data type in the `AstBuilder` and point it to `NullType`.
2. Forbid creating tables with a VOID/NULL column type.
### Why are the changes needed?
1. Spark is incompatible with the Hive void type. When a Hive table schema contains a void type, DESC table throws an exception in Spark.
>hive> create table bad as select 1 x, null z from dual;
>hive> describe bad;
OK
x int
z void
In Spark 2.0.x, reading this view works normally:
>spark-sql> describe bad;
x int NULL
z void NULL
Time taken: 4.431 seconds, Fetched 2 row(s)
But in the latest Spark version, it fails with SparkException: Cannot recognize hive type string: void
>spark-sql> describe bad;
17/05/09 03:12:08 ERROR thriftserver.SparkSQLDriver: Failed in [describe bad]
org.apache.spark.SparkException: Cannot recognize hive type string: void
Caused by: org.apache.spark.sql.catalyst.parser.ParseException:
DataType void() is not supported.(line 1, pos 0)
== SQL ==
void
^^^
... 61 more
org.apache.spark.SparkException: Cannot recognize hive type string: void
2. Hive CTAS statements throw an error when the select clause has a NULL/VOID type column since HIVE-11217. In Spark, creating a table with a VOID/NULL column should throw a readable exception message. This includes:
- create data source table (using parquet, json, ...)
- create hive table (with or without stored as)
- CTAS
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Add unit tests
Closes #28833 from LantaoJin/SPARK-20680_COPY.
Authored-by: LantaoJin <jinlantao@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
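The two proposed changes in that commit message (recognize "void" when parsing a type string, but refuse to create a table containing such a column) can be sketched in plain Scala. This is only a toy model under stated assumptions, not the actual `AstBuilder` change: `ToyType`, `parseTypeName`, and `checkCreatable` are hypothetical names, the error texts are illustrative, and only three type names are handled.

```scala
import java.util.Locale

object VoidTypeSketch {
  // Hypothetical stand-ins for SQL data types.
  sealed trait ToyType
  case object ToyInt extends ToyType
  case object ToyString extends ToyType
  case object ToyNull extends ToyType

  // Step 1: map a primitive type name to a type. "void" is accepted and
  // mapped to the null type instead of being rejected as unsupported.
  def parseTypeName(name: String): Either[String, ToyType] =
    name.trim.toLowerCase(Locale.ROOT) match {
      case "int"    => Right(ToyInt)
      case "string" => Right(ToyString)
      case "void"   => Right(ToyNull)
      case other    => Left(s"DataType $other is not supported.")
    }

  // Step 2: a table definition is rejected with a readable message
  // if any of its columns has the null (void) type.
  def checkCreatable(columns: Seq[(String, ToyType)]): Either[String, Unit] =
    columns.collectFirst { case (colName, ToyNull) => colName } match {
      case Some(colName) => Left(s"Cannot create a table with void column '$colName'.")
      case None          => Right(())
    }
}
```

Separating the two steps keeps reading existing Hive metadata (which may already contain void columns) working, while the creation-time check is the only place that rejects them.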
Closes apache#11494 Closes apache#14158 Closes apache#16803 Closes apache#16864 Closes apache#17455 Closes apache#17936 Closes apache#19377 Added: Closes apache#19380 Closes apache#18642 Closes apache#18377 Closes apache#19632 Added: Closes apache#14471 Closes apache#17402 Closes apache#17953 Closes apache#18607 Also cc srowen vanzin HyukjinKwon gatorsmile cloud-fan to see if you have other PRs to close. Author: Xingbo Jiang <xingbo.jiang@databricks.com> Closes apache#19669 from jiangxb1987/stale-prs.
What changes were proposed in this pull request?
Spark SQL does not support the void column data type of a view.
Create a Hive view:
Because there's no type, Hive gives it the VOID type:
In Spark 2.0.x, reading this view works normally:
But in Spark 2.1.x, it fails with SparkException: Cannot recognize hive type string: void
How was this patch tested?
Added tests.
It can also be tested manually.