Skip to content

Conversation

@ulysses-you
Copy link
Contributor

@ulysses-you ulysses-you commented Jul 9, 2019

What changes were proposed in this pull request?

SPARK-20680, PR, but actually this Jira was not solved.

Spark is incompatible with hive void type. When table schema contains void type, spark throw exception in ddl option, like desc, show create table..

Also, spark catalog.createTable can create NullType Table that is not allowed.

Goal:

  1. Support hive void column type for desc,show create table option
  2. Add rule to throw exception when catalog.create NullType StructField, see 25198

How was this patch tested?

UT

@ulysses-you
Copy link
Contributor Author

cc @cloud-fan @gatorsmile

val expectedMsg = "DataType void is not supported"
withTable("t") {
val e = intercept[AnalysisException] {
sql("create table t (a void)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this related to hive compatibility? It seems to me that you just change this statement from parser exception to analysis exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is just to confirm spark cannot do this ddl that compatible with hive.

}

withTable("t") {
sql("CREATE TABLE t AS SELECT NULL AS col ")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the behavior before your patch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throw a SparkException 'Cannot recognize hive type string: NULL'

@HyukjinKwon
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Jul 18, 2019

Test build #107837 has finished for PR 25085 at commit 64c02c7.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ulysses-you
Copy link
Contributor Author

ulysses-you commented Jul 18, 2019

cc @HyukjinKwon ok to test, please add an another test.

@SparkQA
Copy link

SparkQA commented Jul 18, 2019

Test build #107842 has finished for PR 25085 at commit 997abb2.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ulysses-you
Copy link
Contributor Author

ok to test

@SparkQA
Copy link

SparkQA commented Jul 18, 2019

Test build #107843 has finished for PR 25085 at commit d4679ad.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ulysses-you
Copy link
Contributor Author

ok to test

@SparkQA
Copy link

SparkQA commented Jul 18, 2019

Test build #107852 has finished for PR 25085 at commit f3eecfb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

/**
* SPARK-28313: Spark sql null type incompatible with hive void type
*/
object CreateTableCheck extends Rule[LogicalPlan] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make a new PR for this change? This is a behavior change, which needs more discussion and needs to add something to migration guide.

Copy link
Contributor Author

@ulysses-you ulysses-you Jul 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. As i say, this pr has two goals.

  1. compatibility with hive
  2. reject create table use NullType

So it's ok to make new pr for goal 2.

@SparkQA
Copy link

SparkQA commented Jul 19, 2019

Test build #107872 has finished for PR 25085 at commit c9fc32c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 19, 2019

Test build #107885 has finished for PR 25085 at commit e71c4d2.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ulysses-you
Copy link
Contributor Author

retest this please

1 similar comment
@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Jul 26, 2019

Test build #108194 has finished for PR 25085 at commit e71c4d2.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented Aug 5, 2019

Test build #108672 has finished for PR 25085 at commit e71c4d2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Sep 17, 2019

Test build #110676 has finished for PR 25085 at commit e71c4d2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


private[spark] override def asNullable: NullType = this

override def simpleString: String = "void"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also an unrelated change. I like the new naming, but it's unnecessary to the hive compatibility issue.

CatalystSqlParser.parseDataType(hc.getType)
hc.getType match {
// SPARK-28313 compatible hive void type
case "void" => NullType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is the fix we need. But I'm curious about when can this happen if hive forbids defining void type columns.

Copy link
Contributor Author

@ulysses-you ulysses-you Sep 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix this point, hive void type will be converted to null type.
Such as, t1 have a void type column c1, when execute show create table $tbl.
Spark:
create table t1 (c1 null)
Hive:
create table t1 (c1 void)

And without fix this point, spark will throw an exception.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I'm curious about when can this happen if hive forbids defining void type columns.

Can hive do create table t1 (c1 void)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, hive can not but hive can do create table t1 as select null as c1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make this fix surgical and only keep this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about add code to check and change null to void at ShowCreateTableCommand.showCreateHiveTable() ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it have to be in this PR? If it has to, I'm ok with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah~ If you have a time, you could review the latest commit.

@SparkQA
Copy link

SparkQA commented Sep 26, 2019

Test build #111382 has finished for PR 25085 at commit b4cc595.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 26, 2019

Test build #111391 has finished for PR 25085 at commit 9997ac1.

  • This patch fails due to an unknown error code, -9.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 30, 2019

Test build #111587 has finished for PR 25085 at commit ec3839e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 30, 2019

Test build #111588 has finished for PR 25085 at commit aa08658.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ulysses-you
Copy link
Contributor Author

@cloud-fan Do you have any time to take a look ?

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jan 18, 2020
@github-actions github-actions bot closed this Jan 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants