
Conversation

@LantaoJin (Contributor) commented May 11, 2017

What changes were proposed in this pull request?

Spark SQL does not support the void column data type of a Hive table or view.

Create a Hive table:

hive> create table bad as select 1 x, null z from dual;

Because there's no type, Hive gives it the VOID type:

hive> describe bad;
OK
x int
z void

In Spark 2.0.x, reading this table behaves normally:

spark-sql> describe bad;
x int NULL
z void NULL
Time taken: 4.431 seconds, Fetched 2 row(s)

But in Spark 2.1.x, it fails with SparkException: Cannot recognize hive type string: void

spark-sql> describe bad;
17/05/09 03:12:08 INFO execution.SparkSqlParser: Parsing command: describe bad
17/05/09 03:12:08 INFO parser.CatalystSqlParser: Parsing command: int
17/05/09 03:12:08 INFO parser.CatalystSqlParser: Parsing command: void
17/05/09 03:12:08 ERROR thriftserver.SparkSQLDriver: Failed in [describe bad]
org.apache.spark.SparkException: Cannot recognize hive type string: void
at org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$fromHiveColumn(HiveClientImpl.scala:789)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:365)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:361)
Caused by: org.apache.spark.sql.catalyst.parser.ParseException:
DataType void() is not supported.(line 1, pos 0)
== SQL ==
void
^^^
... 61 more
org.apache.spark.SparkException: Cannot recognize hive type string: void

How was this patch tested?

Added tests.

Manual testing also works:

spark-sql> describe bad;
x int NULL
z null NULL
Time taken: 0.486 seconds, Fetched 2 row(s)

@hvanhovell (Contributor)

ok to test

(Member)

Does this change really resolve your issue?

(Contributor)

Apparently Hive can have null-typed columns. So this should be the location where you'd want to change this.

@gatorsmile (Member) commented May 12, 2017

Hive 2.x disables it. Could you add some test cases that read and write tables with void types? Thanks!

(Member)
+1 for the test case.

@SparkQA

SparkQA commented May 12, 2017

Test build #76834 has finished for PR 17953 at commit d14fc41.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell (Contributor)

@LantaoJin Can you add a description and a test case for this? You can take a look at OrcSourceSuite to get an idea of how to work with Hive.

@SparkQA

SparkQA commented May 12, 2017

Test build #76866 has finished for PR 17953 at commit 2fee1e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member)

Is your test scenario like this?

    withTable("t", "tabNullType") {
      val client = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
      client.runSqlHive("CREATE TABLE t (t1 int)")
      client.runSqlHive("INSERT INTO t VALUES (3)")
      client.runSqlHive("CREATE TABLE tabNullType AS SELECT NULL AS col FROM t")
      spark.table("tabNullType").show()
      spark.table("tabNullType").printSchema()
    }

Is this what you want?

@LantaoJin (Contributor, Author)

@gatorsmile Yes, that's the right test scenario. Which test class should I add it to?

(Member)

Could you add a comment to explain this specific scenario?

(Contributor, Author)

Sure

@gatorsmile (Member)

Maybe HiveDDLSuite?

@LantaoJin (Contributor, Author)

Added a test in HiveDDLSuite. Please review, thanks.

@SparkQA

SparkQA commented May 16, 2017

Test build #76968 has finished for PR 17953 at commit 4dd5f54.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member)

After this PR, we can describe it, but the query results are still empty.

@LantaoJin (Contributor, Author)

Thanks, I added a row to table t, and now we can do a nonEmpty check on table tabNullType.

@SparkQA

SparkQA commented May 17, 2017

Test build #77001 has finished for PR 17953 at commit 524b8b5.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 17, 2017

Test build #77000 has finished for PR 17953 at commit efb07df.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 17, 2017

Test build #77002 has finished for PR 17953 at commit a56787d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

(Contributor, Author)

From Jenkins, I see the error message below:

org.scalatest.exceptions.TestFailedException: StructType(StructField(col,NullType,true)) did not contain StructField(col,NullType,true)

But it passes in my spark-shell:

scala> val schema = spark.table("tabNullType").schema
schema: org.apache.spark.sql.types.StructType = StructType(StructField(col,NullType,true))
scala> schema.contains(StructField("col", NullType))
res7: Boolean = true

@cloud-fan (Contributor)

I don't think we should support the void type in the parser; CREATE TABLE t(a void) should still be illegal.

@SparkQA

SparkQA commented May 20, 2017

Test build #77115 has started for PR 17953 at commit 07713a9.

@LantaoJin (Contributor, Author)

Thanks @cloud-fan. Hi @hvanhovell and @gatorsmile, any ideas?

@cloud-fan (Contributor)

retest this please

@SparkQA

SparkQA commented May 21, 2017

Test build #77150 has finished for PR 17953 at commit 07713a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

(Member)

-> checkAnswer(spark.table("tabNullType"), Row(null))?

(Member)

In class NullType, we can add the following line:

override def simpleString: String = "void"

(Member)

After the above change, you can improve the test to

      val desc = sql("DESC tabNullType").collect().toSeq
      assert(desc.contains(Row("col", "void", null)))
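The effect of that one-line override can be sketched in plain Scala. Note this is a simplified stand-in, not Spark's actual class hierarchy: `SqlType`, `StructField`, and `describeRows` here are illustrative names for how DESC-style output picks up a type's simple string.

```scala
// Illustrative sketch: giving the null type a "void" simple string makes
// DESC-style rows print "void" for such columns.
sealed abstract class SqlType { def simpleString: String }
case object IntegerType extends SqlType { override def simpleString: String = "int" }
case object NullType extends SqlType { override def simpleString: String = "void" }

final case class StructField(name: String, dataType: SqlType)

// DESC-style rows: (column name, type's simple string, comment).
def describeRows(schema: Seq[StructField]): Seq[(String, String, String)] =
  schema.map(f => (f.name, f.dataType.simpleString, null))
```

With that override in place, the suggested `assert(desc.contains(Row("col", "void", null)))` check matches what DESC would emit.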

@gatorsmile (Member)

LGTM except two comments.

@gatorsmile (Member)

ping @LantaoJin

@LantaoJin (Contributor, Author)

Thanks @gatorsmile, I was on vacation last week. Will update it ASAP.

@dongjoon-hyun (Member)

Retest this please

@gatorsmile (Member)

retest this please

@gatorsmile (Member)

ok to test

@cloud-fan (Contributor)

Shall we add a test case for CREATE TABLE t(a void) to make sure it still fails?

@cloud-fan (Contributor)

I think a safer fix is to just handle "void" specially in HiveClientImpl.fromHiveColumn.
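A minimal sketch of that idea, with assumed, simplified signatures rather than Spark's real private HiveClientImpl API: intercept the raw Hive type string before handing it to the Catalyst parser, so "void" never reaches the parser that rejects it.

```scala
// Sketch of special-casing Hive's "void" before type-string parsing.
// parseCatalystType is a stand-in for CatalystSqlParser.parseDataType.
sealed trait DataType
case object NullType extends DataType
case object IntegerType extends DataType

def parseCatalystType(s: String): DataType = s.trim.toLowerCase match {
  case "int"  => IntegerType
  case other  => throw new IllegalArgumentException(s"DataType $other() is not supported.")
}

def fromHiveTypeString(hiveType: String): DataType =
  if (hiveType.trim.equalsIgnoreCase("void")) NullType  // special case: Hive's VOID
  else parseCatalystType(hiveType)
```

This keeps the parser unchanged (CREATE TABLE t(a void) still fails) while letting existing Hive metadata with void columns be read.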

@cloud-fan (Contributor)

Oh, actually users can still create a table with a null-typed column via Catalog.createTable, so it seems OK to support "void" in our parser, but we should add a new rule to throw an exception if users want to create a table with a null-typed column (excluding CTAS).
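That rule could look roughly like the following. The names (`ColumnDef`, `validateCreateTable`) are hypothetical; the real check would walk Spark's StructType and be wired into the analyzer, skipping CTAS plans where the type is inferred from the query.

```scala
// Hedged sketch: reject explicit NULL/VOID columns at CREATE TABLE time,
// while CTAS is exempted because its schema comes from the SELECT.
final case class ColumnDef(name: String, typeName: String)

def validateCreateTable(columns: Seq[ColumnDef], isCTAS: Boolean): Unit = {
  if (!isCTAS) {
    columns.filter(c => Set("void", "null").contains(c.typeName.toLowerCase)).foreach { c =>
      throw new IllegalArgumentException(
        s"Column `${c.name}`: creating a table with a VOID/NULL column type is not allowed")
    }
  }
}
```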

@SparkQA

SparkQA commented Jun 4, 2017

Test build #77723 has finished for PR 17953 at commit fda353f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@LantaoJin (Contributor, Author)

LantaoJin commented Jun 5, 2017

All the test failures are due to a mismatch between the *.sql.out files and the new "void" simple string of NullType.
How do I regenerate them? I see "Automatically generated by SQLQueryTestSuite" in the first line of the out files. If I modify them manually, the tests pass.

spark-sql/src/test/resources/sql-tests/results/*.sql.out

@LantaoJin (Contributor, Author)

Ahh, found it. Regenerated the golden files.

@LantaoJin (Contributor, Author)

@cloud-fan Do you think it should be done in this PR? And where should I add the filter, CatalogImpl.createTable() or ExternalCatalog.createTable()?

@SparkQA

SparkQA commented Jun 6, 2017

Test build #77766 has finished for PR 17953 at commit 1e86674.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val client = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
client.runSqlHive("CREATE TABLE t (t1 int)")
client.runSqlHive("INSERT INTO t VALUES (3)")
client.runSqlHive("CREATE TABLE tabNullType AS SELECT NULL AS col FROM t")
(Contributor)
IIRC, Hive 2 doesn't support this. Let's test with CREATE VIEW AS ... to be safer.

@HyukjinKwon (Member)

@LantaoJin do you have some time to address the review comment above?

@LantaoJin (Contributor, Author)

@HyukjinKwon Sure. Thank you for reminding me. I almost forgot it.

@gatorsmile (Member)

@LantaoJin Maybe close it now? You can reopen it once the comment is resolved.

@LantaoJin (Contributor, Author)

LantaoJin commented Oct 28, 2017 via email

@amit-hitachi

@LantaoJin @gatorsmile
Any plans to merge this fix?

@maropu (Member)

maropu commented Jun 9, 2020

@amit-hitachi I don't think we have any plans for this work. But you could revive the discussion on the corresponding JIRA.

@HyukjinKwon (Member)

Yeah .. I personally support this change FWIW.

@LantaoJin (Contributor, Author)

Hmm. How do I reopen it?

@LantaoJin (Contributor, Author)

I opened a new one: #28833

dongjoon-hyun pushed a commit that referenced this pull request Jul 8, 2020
…oid column datatype

### What changes were proposed in this pull request?

This is a new PR to address the closed one, #17953.

1. Support the "void" primitive data type in the `AstBuilder`, pointing it to `NullType`
2. Forbid creating tables with VOID/NULL column types

### Why are the changes needed?

1. Spark is incompatible with Hive's void type. When a Hive table schema contains a void type, DESC table throws an exception in Spark.

>hive> create table bad as select 1 x, null z from dual;
>hive> describe bad;
OK
x	int
z	void

In Spark 2.0.x, reading this table behaves normally:
>spark-sql> describe bad;
x       int     NULL
z       void    NULL
Time taken: 4.431 seconds, Fetched 2 row(s)

But in the latest Spark version, it fails with SparkException: Cannot recognize hive type string: void

>spark-sql> describe bad;
17/05/09 03:12:08 ERROR thriftserver.SparkSQLDriver: Failed in [describe bad]
org.apache.spark.SparkException: Cannot recognize hive type string: void
Caused by: org.apache.spark.sql.catalyst.parser.ParseException:
DataType void() is not supported.(line 1, pos 0)
== SQL ==
void
^^^
        ... 61 more
org.apache.spark.SparkException: Cannot recognize hive type string: void

2. Hive CTAS statements throw an error when the select clause has a NULL/VOID type column, since HIVE-11217.
In Spark, creating a table with a VOID/NULL column should throw a readable exception message, including:

- create data source table (using parquet, json, ...)
- create hive table (with or without stored as)
- CTAS

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Add unit tests

Closes #28833 from LantaoJin/SPARK-20680_COPY.

Authored-by: LantaoJin <jinlantao@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#11494
Closes apache#14158
Closes apache#16803
Closes apache#16864
Closes apache#17455
Closes apache#17936
Closes apache#19377

Added:
Closes apache#19380
Closes apache#18642
Closes apache#18377
Closes apache#19632

Added:
Closes apache#14471
Closes apache#17402
Closes apache#17953
Closes apache#18607

Also cc srowen vanzin HyukjinKwon gatorsmile cloud-fan to see if you have other PRs to close.

Author: Xingbo Jiang <xingbo.jiang@databricks.com>

Closes apache#19669 from jiangxb1987/stale-prs.