
Conversation

@LantaoJin (Contributor) commented May 11, 2017

What changes were proposed in this pull request?

Spark SQL does not support the void column data type of a Hive table or view.

Create a Hive table:

hive> create table bad as select 1 x, null z from dual;

Because there's no type, Hive gives it the VOID type:

hive> describe bad;
OK
x int
z void

In Spark 2.0.x, reading this table behaves normally:

spark-sql> describe bad;
x int NULL
z void NULL
Time taken: 4.431 seconds, Fetched 2 row(s)

But in Spark 2.1.x, it fails with SparkException: Cannot recognize hive type string: void

spark-sql> describe bad;
17/05/09 03:12:08 INFO execution.SparkSqlParser: Parsing command: describe bad
17/05/09 03:12:08 INFO parser.CatalystSqlParser: Parsing command: int
17/05/09 03:12:08 INFO parser.CatalystSqlParser: Parsing command: void
17/05/09 03:12:08 ERROR thriftserver.SparkSQLDriver: Failed in [describe bad]
org.apache.spark.SparkException: Cannot recognize hive type string: void
at org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$fromHiveColumn(HiveClientImpl.scala:789)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:365)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:361)
Caused by: org.apache.spark.sql.catalyst.parser.ParseException:
DataType void() is not supported.(line 1, pos 0)
== SQL ==
void
^^^
... 61 more
org.apache.spark.SparkException: Cannot recognize hive type string: void

How was this patch tested?

Added tests.

Manual testing also works:

spark-sql> describe bad;
x int NULL
z null NULL
Time taken: 0.486 seconds, Fetched 2 row(s)

@hvanhovell (Contributor)

ok to test

(Member)

Does this change really resolve your issue?

(Contributor)

Apparently Hive can have null-typed columns. So this should be the location where you'd want to change this.

@gatorsmile (Member) commented May 12, 2017

Hive 2.x disables it. Could you add some test cases that read and write tables with void types? Thanks!

(Member)
+1 for the test case.

@SparkQA

SparkQA commented May 12, 2017

Test build #76834 has finished for PR 17953 at commit d14fc41.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell (Contributor)

@LantaoJin Can you add a description and a test case for this? You can take a look at OrcSourceSuite to get an idea of how to work with Hive.

@SparkQA

SparkQA commented May 12, 2017

Test build #76866 has finished for PR 17953 at commit 2fee1e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member)

Is your test scenario like this?

    withTable("t", "tabNullType") {
      val client = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
      client.runSqlHive("CREATE TABLE t (t1 int)")
      client.runSqlHive("INSERT INTO t VALUES (3)")
      client.runSqlHive("CREATE TABLE tabNullType AS SELECT NULL AS col FROM t")
      spark.table("tabNullType").show()
      spark.table("tabNullType").printSchema()
    }

Is this what you want?

@LantaoJin (Contributor, Author)

@gatorsmile Yes, that's the right test scenario. Which test class should I add it to?

(Member)

Could you add a comment to explain this specific scenario?

(Contributor, Author)

Sure

@gatorsmile (Member)

Maybe HiveDDLSuite?

@LantaoJin (Contributor, Author)

Added a test in HiveDDLSuite. Please review, thanks.

@SparkQA

SparkQA commented May 16, 2017

Test build #76968 has finished for PR 17953 at commit 4dd5f54.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member)

After this PR, we can describe it, but the query results are still empty.

@LantaoJin (Contributor, Author)

Thanks, I added a row to table t, and now we can do a nonEmpty check on table tabNullType.

@SparkQA

SparkQA commented May 17, 2017

Test build #77001 has finished for PR 17953 at commit 524b8b5.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 17, 2017

Test build #77000 has finished for PR 17953 at commit efb07df.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 17, 2017

Test build #77002 has finished for PR 17953 at commit a56787d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

(Contributor, Author)

From Jenkins, I see the error message below:

org.scalatest.exceptions.TestFailedException: StructType(StructField(col,NullType,true)) did not contain StructField(col,NullType,true)

But it passes in my spark-shell:

scala> val schema = spark.table("tabNullType").schema
schema: org.apache.spark.sql.types.StructType = StructType(StructField(col,NullType,true))
scala> schema.contains(StructField("col", NullType))
res7: Boolean = true

@cloud-fan (Contributor)

I don't think we should support the void type in the parser; CREATE TABLE t(a void) should still be illegal.

@SparkQA

SparkQA commented May 20, 2017

Test build #77115 has started for PR 17953 at commit 07713a9.

@LantaoJin (Contributor, Author)

Thanks @cloud-fan. Hi @hvanhovell and @gatorsmile, any ideas?

@cloud-fan (Contributor)

retest this please

@SparkQA

SparkQA commented May 21, 2017

Test build #77150 has finished for PR 17953 at commit 07713a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

(Member)

-> checkAnswer(spark.table("tabNullType"), Row(null))?

(Member)

In class NullType, we can add the following line:

override def simpleString: String = "void"

(Member)

After the above change, you can improve the test to

      val desc = sql("DESC tabNullType").collect().toSeq
      assert(desc.contains(Row("col", "void", null)))
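The effect of that one-line override can be sketched in plain Scala. Note this is a simplified stand-in, not Spark's actual class hierarchy: `SqlType`, `StructField`, and `describeRows` here are illustrative names for how DESC-style output picks up a type's simple string.

```scala
// Illustrative sketch: giving the null type a "void" simple string makes
// DESC-style rows print "void" for such columns.
sealed abstract class SqlType { def simpleString: String }
case object IntegerType extends SqlType { override def simpleString: String = "int" }
case object NullType extends SqlType { override def simpleString: String = "void" }

final case class StructField(name: String, dataType: SqlType)

// DESC-style rows: (column name, type's simple string, comment).
def describeRows(schema: Seq[StructField]): Seq[(String, String, String)] =
  schema.map(f => (f.name, f.dataType.simpleString, null))
```

With that override in place, the suggested `assert(desc.contains(Row("col", "void", null)))` check matches what DESC would emit.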

@gatorsmile (Member)

LGTM except two comments.

@gatorsmile (Member)

ping @LantaoJin

@LantaoJin (Contributor, Author)

Thanks @gatorsmile, I was on vacation last week. Will update it ASAP.

@dongjoon-hyun (Member)

Retest this please

@gatorsmile (Member)

retest this please

@gatorsmile (Member)

ok to test

@cloud-fan (Contributor)

Shall we add a test case for CREATE TABLE t(a void) to make sure it still fails?

@cloud-fan (Contributor)

I think a safer fix is to just handle "void" specially in HiveClientImpl.fromHiveColumn.
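A minimal sketch of that idea, with assumed, simplified signatures rather than Spark's real private HiveClientImpl API: intercept the raw Hive type string before handing it to the Catalyst parser, so "void" never reaches the parser that rejects it.

```scala
// Sketch of special-casing Hive's "void" before type-string parsing.
// parseCatalystType is a stand-in for CatalystSqlParser.parseDataType.
sealed trait DataType
case object NullType extends DataType
case object IntegerType extends DataType

def parseCatalystType(s: String): DataType = s.trim.toLowerCase match {
  case "int"  => IntegerType
  case other  => throw new IllegalArgumentException(s"DataType $other() is not supported.")
}

def fromHiveTypeString(hiveType: String): DataType =
  if (hiveType.trim.equalsIgnoreCase("void")) NullType  // special case: Hive's VOID
  else parseCatalystType(hiveType)
```

This keeps the parser unchanged (CREATE TABLE t(a void) still fails) while letting existing Hive metadata with void columns be read.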

@cloud-fan (Contributor)

Oh, actually users can still create a table with a null-typed column via Catalog.createTable, so it seems OK to support "void" in our parser, but we should add a new rule to throw an exception if users want to create a table with a null-typed column (excluding CTAS).
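That rule could look roughly like the following. The names (`ColumnDef`, `validateCreateTable`) are hypothetical; the real check would walk Spark's StructType and be wired into the analyzer, skipping CTAS plans where the type is inferred from the query.

```scala
// Hedged sketch: reject explicit NULL/VOID columns at CREATE TABLE time,
// while CTAS is exempted because its schema comes from the SELECT.
final case class ColumnDef(name: String, typeName: String)

def validateCreateTable(columns: Seq[ColumnDef], isCTAS: Boolean): Unit = {
  if (!isCTAS) {
    columns.filter(c => Set("void", "null").contains(c.typeName.toLowerCase)).foreach { c =>
      throw new IllegalArgumentException(
        s"Column `${c.name}`: creating a table with a VOID/NULL column type is not allowed")
    }
  }
}
```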

@SparkQA

SparkQA commented Jun 4, 2017

Test build #77723 has finished for PR 17953 at commit fda353f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@LantaoJin (Contributor, Author)

LantaoJin commented Jun 5, 2017

All the test failures are due to a mismatch between the *.sql.out files and the new "void" simple string of NullType.
How do I regenerate them? I see "Automatically generated by SQLQueryTestSuite" in the first line of the out files. If I modify them manually, the tests pass.

spark-sql/src/test/resources/sql-tests/results/*.sql.out

@LantaoJin (Contributor, Author)

Ahh, found it. Regenerated the golden files.

@LantaoJin (Contributor, Author)

@cloud-fan Do you think it should be done in this PR? And where should I add the filter, CatalogImpl.createTable() or ExternalCatalog.createTable()?

@SparkQA

SparkQA commented Jun 6, 2017

Test build #77766 has finished for PR 17953 at commit 1e86674.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val client = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
client.runSqlHive("CREATE TABLE t (t1 int)")
client.runSqlHive("INSERT INTO t VALUES (3)")
client.runSqlHive("CREATE TABLE tabNullType AS SELECT NULL AS col FROM t")
(Contributor)
IIRC, Hive 2 doesn't support this. Let's test with CREATE VIEW AS ... to be safer.

@HyukjinKwon (Member)

@LantaoJin do you have some time to address the review comment above?

@LantaoJin (Contributor, Author)

@HyukjinKwon Sure. Thank you for reminding me. I almost forgot it.

@gatorsmile (Member)

@LantaoJin Maybe close it now? You can reopen it once the comment is resolved.

@LantaoJin (Contributor, Author)

LantaoJin commented Oct 28, 2017 via email

@amit-hitachi

@LantaoJin @gatorsmile
Any plans to merge this fix?

@maropu (Member)

maropu commented Jun 9, 2020

@amit-hitachi I don't think we have any plans for this work. But you could revive the discussion on the corresponding JIRA.

@HyukjinKwon (Member)

Yeah .. I personally support this change FWIW.

@LantaoJin (Contributor, Author)

Hmm. How do I reopen it?

@LantaoJin (Contributor, Author)

I opened a new one: #28833

dongjoon-hyun pushed a commit that referenced this pull request Jul 8, 2020
…oid column datatype

### What changes were proposed in this pull request?

This is a new PR to address the closed one, #17953.

1. Support the "void" primitive data type in the `AstBuilder`, pointing it to `NullType`
2. Forbid creating tables with VOID/NULL column types

### Why are the changes needed?

1. Spark is incompatible with Hive's void type. When a Hive table schema contains a void type, DESC table throws an exception in Spark.

>hive> create table bad as select 1 x, null z from dual;
>hive> describe bad;
OK
x	int
z	void

In Spark 2.0.x, reading this table behaves normally:
>spark-sql> describe bad;
x       int     NULL
z       void    NULL
Time taken: 4.431 seconds, Fetched 2 row(s)

But in the latest Spark version, it fails with SparkException: Cannot recognize hive type string: void

>spark-sql> describe bad;
17/05/09 03:12:08 ERROR thriftserver.SparkSQLDriver: Failed in [describe bad]
org.apache.spark.SparkException: Cannot recognize hive type string: void
Caused by: org.apache.spark.sql.catalyst.parser.ParseException:
DataType void() is not supported.(line 1, pos 0)
== SQL ==
void
^^^
        ... 61 more
org.apache.spark.SparkException: Cannot recognize hive type string: void

2. Hive CTAS statements throw an error when the select clause has a NULL/VOID type column, since HIVE-11217.
In Spark, creating a table with a VOID/NULL column should throw a readable exception message, including:

- create data source table (using parquet, json, ...)
- create hive table (with or without stored as)
- CTAS

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Add unit tests

Closes #28833 from LantaoJin/SPARK-20680_COPY.

Authored-by: LantaoJin <jinlantao@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#11494
Closes apache#14158
Closes apache#16803
Closes apache#16864
Closes apache#17455
Closes apache#17936
Closes apache#19377

Added:
Closes apache#19380
Closes apache#18642
Closes apache#18377
Closes apache#19632

Added:
Closes apache#14471
Closes apache#17402
Closes apache#17953
Closes apache#18607

Also cc srowen vanzin HyukjinKwon gatorsmile cloud-fan to see if you have other PRs to close.

Author: Xingbo Jiang <xingbo.jiang@databricks.com>

Closes apache#19669 from jiangxb1987/stale-prs.