Skip to content

Conversation

@ulysses-you
Copy link
Contributor

@ulysses-you ulysses-you commented Jul 26, 2020

What changes were proposed in this pull request?

For better compatible with Hive in create view with null.

Change NullType.simpleString to void, then Hive can parse void to it's own type.

Why are the changes needed?

Currently, we support create view v as select null with in-memory but failed in hive. We try to forbid this in #29152, however df.createTempView is already a use case of NullType with in-memory. So it seems better to keep this behavior consistent with hive.

Some error msg with create view v as select null.

org.apache.spark.SparkException: Cannot recognize hive type string: null, column: NULL
	at org.apache.spark.sql.hive.client.HiveClientImpl$.getSparkSQLDataType(HiveClientImpl.scala:999)
	at org.apache.spark.sql.hive.client.HiveClientImpl$.$anonfun$verifyColumnDataType$1(HiveClientImpl.scala:1021)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at org.apache.spark.sql.types.StructType.foreach(StructType.scala:102)

Does this PR introduce any user-facing change?

Yes, user can create view with null type in Hive.

How was this patch tested?

New UT with just Hive. in-memory test is already exists.

@SparkQA
Copy link

SparkQA commented Jul 26, 2020

Test build #126572 has finished for PR 29244 at commit 2fea01c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

c.metadata.getString(HIVE_TYPE_STRING)
} else {
c.dataType.catalogString
c.dataType.sql
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should use .sql to hive type.

@SparkQA
Copy link

SparkQA commented Jul 27, 2020

Test build #126594 has finished for PR 29244 at commit 9b12177.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


private[spark] override def asNullable: NullType = this

override def sql: String = "VOID"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not override simpleString? sql calls simpleString

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just consider that simpleString/catalogString is a name of spark, and sql is used in other database ? We can still show something like unknown with NullType, If we can divide this.

I'm not sure the sql original meaning, but it seems used for Hive DDL e.g., StructType.toDDL use sql to build DDL string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's confusing if simpleString and sql is not consistent.

@LantaoJin
Copy link
Contributor

LantaoJin commented Jul 27, 2020

Could we change this title to a FOLLOWUP of SPARK-20680, like #29041. It can help backport more clear

@ulysses-you
Copy link
Contributor Author

@LantaoJin mostly we can, but I'm not sure whether this pr is only for revert simpleString.

@ulysses-you ulysses-you changed the title [SPARK-32445][SQL] Make NullType.sql as VOID to support hive [SPARK-32445][SQL] Make NullType.simpleString as VOID to support hive Jul 27, 2020
@ulysses-you ulysses-you changed the title [SPARK-32445][SQL] Make NullType.simpleString as VOID to support hive [SPARK-32445][SQL] Make NullType.simpleString as void to support hive Jul 27, 2020
@ulysses-you ulysses-you changed the title [SPARK-32445][SQL] Make NullType.simpleString as void to support hive [SPARK-20680][SQL][FOLLOW-UP] Make NullType.simpleString as void to support hive Jul 27, 2020
case ("decimal" | "dec" | "numeric", precision :: scale :: Nil) =>
DecimalType(precision.getText.toInt, scale.getText.toInt)
case ("void", Nil) => NullType
case ("null", Nil) => NullType
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for compatible, likefrom_csv.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which case do we need it? If we have this PR, the valid string for null type should be void. Before this PR, it wasn't supported before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot it, we not support parse null before.

@SparkQA
Copy link

SparkQA commented Jul 27, 2020

Test build #126643 has finished for PR 29244 at commit 9e56ff3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 27, 2020

Test build #126645 has finished for PR 29244 at commit 8a5b2e9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 27, 2020

Test build #126642 has finished for PR 29244 at commit 9deabe4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 28, 2020

Test build #126676 has finished for PR 29244 at commit 1c84fc8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 28, 2020

Test build #126697 has finished for PR 29244 at commit 65137da.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@LantaoJin
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Jul 28, 2020

Test build #126702 has finished for PR 29244 at commit 65137da.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

cloud-fan commented Jul 28, 2020

Another safer fix is: In HiveClientImpl.createTable, replace NullType with a new HiveVoidType whose simpleString returns void, before generating the type string and create Hive FieldSchema.

The question is: do we think void is a better type name than null when doing EXPLAIN?

cc @dongjoon-hyun @maropu @viirya

@dongjoon-hyun
Copy link
Member

For me, null is better inside Apache Spark EXPLAIN because I have been requesting to avoid to have a string literal void inside Apache Spark code base. However, this PR has a valid use case. After this PR, I'm okay for both cases because we cannot eliminate void completely.

@cloud-fan
Copy link
Contributor

How about we be conservative and follow #29244 (comment) first? @ulysses-you

@ulysses-you
Copy link
Contributor Author

@cloud-fan you mean something like this pr #28935 ?

As I know, we can just update HiveClientImpl.toHiveColumn and HiveClientImpl.fromHiveColumn to replace NullType to HiveVoidType if we choose this action. Since we already forbid NullType with table and void won't exist with view related functionality during parsing.

@cloud-fan
Copy link
Contributor

cloud-fan commented Aug 13, 2020

Yea similar but more narrowed to HiveClientImpl. We can even define the class HiveVoidType in HiveClientImpl.

@ulysses-you
Copy link
Contributor Author

Created #29423.

cloud-fan pushed a commit that referenced this pull request Aug 14, 2020
### What changes were proposed in this pull request?

Discussion with [comment](#29244 (comment)).

Add `HiveVoidType` class in `HiveClientImpl` then we can replace `NullType` to `HiveVoidType` before we call hive client.

### Why are the changes needed?

Better compatible with hive.

More details in [#29244](#29244).

### Does this PR introduce _any_ user-facing change?

Yes, user can create view with null type in Hive.

### How was this patch tested?

New test.

Closes #29423 from ulysses-you/add-HiveVoidType.

Authored-by: ulysses <youxiduo@weidian.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@ulysses-you
Copy link
Contributor Author

close this since #29423 merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants