[SPARK-28313][SQL] Spark sql null type incompatible with hive void type #25085
merge master
val expectedMsg = "DataType void is not supported"
withTable("t") {
  val e = intercept[AnalysisException] {
    sql("create table t (a void)")
How is this related to Hive compatibility? It seems to me that you just changed this statement's failure from a parser exception to an analysis exception.
This test just confirms that Spark rejects this DDL, which is consistent with Hive.
}

withTable("t") {
  sql("CREATE TABLE t AS SELECT NULL AS col ")
What was the behavior before your patch?
It threw a SparkException: 'Cannot recognize hive type string: NULL'
ok to test
Test build #107837 has finished for PR 25085 at commit
cc @HyukjinKwon ok to test, please add another test.
Test build #107842 has finished for PR 25085 at commit
ok to test
Test build #107843 has finished for PR 25085 at commit
ok to test
Test build #107852 has finished for PR 25085 at commit
/**
 * SPARK-28313: Spark sql null type incompatible with hive void type
 */
object CreateTableCheck extends Rule[LogicalPlan] {
Can we make a new PR for this change? This is a behavior change, which needs more discussion and an entry in the migration guide.
Sure. As I said, this PR has two goals:
- compatibility with Hive
- rejecting table creation that uses NullType
So it's fine to open a new PR for goal 2.
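The second goal (rejecting tables that use NullType) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `DataType`, `NullType`, `ArrayType`, `StructField` and `StructType` definitions are simplified local stand-ins rather than Spark's real classes, and `NullTypeCheck`, `containsNullType`, `checkSchema` and the error text are hypothetical names, not the `CreateTableCheck` rule from this PR. Whether the real rule also looks inside nested types is not stated in the thread; the sketch assumes it would.

```scala
// Simplified, hypothetical stand-ins for Spark's type classes (not the real API).
sealed trait DataType
case object NullType extends DataType
case object IntegerType extends DataType
final case class ArrayType(elementType: DataType) extends DataType
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField]) extends DataType

object NullTypeCheck {
  // True if the type is NullType or contains NullType at any nesting depth.
  def containsNullType(dt: DataType): Boolean = dt match {
    case NullType           => true
    case ArrayType(elem)    => containsNullType(elem)
    case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
    case _                  => false
  }

  // Left(message) names the first offending column; Right(()) means the schema is accepted.
  def checkSchema(schema: StructType): Either[String, Unit] =
    schema.fields.find(f => containsNullType(f.dataType)) match {
      case Some(f) => Left(s"Column '${f.name}': DataType void is not supported")
      case None    => Right(())
    }
}
```

Returning an `Either` keeps the sketch free of Spark's exception types; an actual analyzer rule would presumably raise an `AnalysisException` instead.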
Test build #107872 has finished for PR 25085 at commit
Test build #107885 has finished for PR 25085 at commit
retest this please
retest this please
Test build #108194 has finished for PR 25085 at commit
Retest this please.
Test build #108672 has finished for PR 25085 at commit
ok to test
Test build #110676 has finished for PR 25085 at commit
private[spark] override def asNullable: NullType = this

override def simpleString: String = "void"
This is also an unrelated change. I like the new naming, but it's unnecessary to the hive compatibility issue.
CatalystSqlParser.parseDataType(hc.getType)
hc.getType match {
  // SPARK-28313 compatible hive void type
  case "void" => NullType
I believe this is the fix we need. But I'm curious when this can happen if Hive forbids defining void type columns.
With this fix, the Hive void type is converted to Spark's null type.
For example, if t1 has a void type column c1, running show create table $tbl gives:
Spark:
create table t1 (c1 null)
Hive:
create table t1 (c1 void)
Without this fix, Spark throws an exception.
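A minimal sketch of the mapping discussed in this thread, assuming a general-purpose parser that does not know the string "void". The types, `HiveTypeMapping`, `fromHiveType` and the local `parseDataType` are hypothetical stand-ins for illustration; the real change special-cases `hc.getType` before calling `CatalystSqlParser.parseDataType`.

```scala
// Simplified, hypothetical stand-ins for Spark's type classes (not the real API).
sealed trait DataType
case object NullType extends DataType
case object IntegerType extends DataType
case object StringType extends DataType

object HiveTypeMapping {
  // Stand-in for the general parser: it recognizes a few names and rejects "void".
  private def parseDataType(typeString: String): DataType = typeString match {
    case "int"    => IntegerType
    case "string" => StringType
    case other =>
      throw new IllegalArgumentException(s"Cannot recognize hive type string: $other")
  }

  // Handle Hive's "void" first, then fall back to the general parser.
  def fromHiveType(hiveType: String): DataType = hiveType match {
    case "void" => NullType
    case other  => parseDataType(other)
  }
}
```

Without the `case "void"` branch, the string would reach the parser and fail, which mirrors the exception described above.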
But I'm curious when this can happen if Hive forbids defining void type columns.
Can Hive run create table t1 (c1 void)?
No, Hive cannot, but Hive can run create table t1 as select null as c1.
Can we make this fix surgical and only keep this change?
How about adding code in ShowCreateTableCommand.showCreateHiveTable() that checks for null and changes it to void?
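The suggestion could look something like the sketch below. It is not the actual `ShowCreateTableCommand` code: `ShowCreateSketch`, `Column`, `hiveTypeName` and `createTableDdl` are hypothetical names, and column types are modeled as plain strings for brevity.

```scala
object ShowCreateSketch {
  // Hypothetical column model: a name plus Spark's type name as a string.
  final case class Column(name: String, typeName: String)

  // Hive spells Spark's "null" type as "void"; all other names pass through unchanged.
  private def hiveTypeName(sparkTypeName: String): String =
    if (sparkTypeName == "null") "void" else sparkTypeName

  // Render a Hive-style CREATE TABLE statement for the given columns.
  def createTableDdl(table: String, cols: Seq[Column]): String = {
    val colDefs = cols.map(c => s"${c.name} ${hiveTypeName(c.typeName)}")
    s"CREATE TABLE $table (${colDefs.mkString(", ")})"
  }
}

// ShowCreateSketch.createTableDdl("t1", Seq(ShowCreateSketch.Column("c1", "null")))
// returns "CREATE TABLE t1 (c1 void)"
```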
Does it have to be in this PR? If it has to, I'm ok with it.
Yeah. If you have time, could you review the latest commit?
e71c4d2 to b4cc595
Test build #111382 has finished for PR 25085 at commit
Test build #111391 has finished for PR 25085 at commit
Test build #111587 has finished for PR 25085 at commit
Test build #111588 has finished for PR 25085 at commit
@cloud-fan Do you have time to take a look?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
See SPARK-20680 and its PR; that Jira issue was not actually resolved.
Spark is incompatible with the Hive void type. When a table schema contains a void type column, Spark throws an exception on DDL operations such as desc and show create table.
Also, Spark's catalog.createTable can create a table with NullType columns, which should not be allowed.
Goal:
desc, show create table option
How was this patch tested?
Unit tests.