[SPARK-20680][SQL] Adding HiveVoidType in Spark to be compatible with Hive #28935
Conversation
HyukjinKwon
left a comment
We should also fix `spark/python/pyspark/sql/tests/test_types.py` line 486 (at 68d7edf):

```python
if t != NullType:
```

and the null type entry in `spark/python/pyspark/sql/types.py` line 752 (at d21aab4):

```python
_all_atomic_types = dict((t.typeName(), t) for t in _atomic_types)
```

I am okay if you're not used to the Python side. I can do it in a followup.
Thanks @HyukjinKwon, if this could be merged, could you help on the Python side?
```scala
 * and should NOT be used anywhere else. Any instance of these data types should be
 * replaced by a [[NullType]] before analysis.
 */
class HiveNullType private() extends DataType {
```
I know the context, but can we name this HiveVoidType literally?
Currently, the description is interpreted like "hive null type should be replaced by a NullType before analysis".
null is a value and Hive exposes void as a type.
> null is a value and Hive exposes void as a type.
You are right.
Thank you for working on this, @LantaoJin !

Test build #124583 has finished for PR 28935 at commit

Test build #124584 has finished for PR 28935 at commit

Test build #124590 has finished for PR 28935 at commit

retest this please

Test build #124597 has finished for PR 28935 at commit

Thank you for updating, @LantaoJin .

@LantaoJin . Is there a reason why you use
No, just a typo. Fixed.
```scala
case ("decimal" | "dec" | "numeric", precision :: scale :: Nil) =>
  DecimalType(precision.getText.toInt, scale.getText.toInt)
case ("interval", Nil) => CalendarIntervalType
case ("void", Nil) => HiveVoidType
```
We can just reuse NullType. We should forbid creating tables with the null type completely, including via spark.catalog.createTable.
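That suggestion amounts to a single recursive guard on the schema before every create-table path. A self-contained sketch of such a guard follows; the class names and helpers are illustrative, not Spark's actual API:

```python
# Toy model of nested Spark SQL data types (illustrative, not the real API).
class DataType: pass
class NullType(DataType): pass
class StringType(DataType): pass

class ArrayType(DataType):
    def __init__(self, element_type):
        self.element_type = element_type

class MapType(DataType):
    def __init__(self, key_type, value_type):
        self.key_type = key_type
        self.value_type = value_type

def contains_null_type(dt):
    # Recursively look for NullType anywhere inside a (possibly nested) type.
    if isinstance(dt, NullType):
        return True
    if isinstance(dt, ArrayType):
        return contains_null_type(dt.element_type)
    if isinstance(dt, MapType):
        return contains_null_type(dt.key_type) or contains_null_type(dt.value_type)
    return False

def fail_null_type(dt):
    # The single guard every create-table path would call.
    if contains_null_type(dt):
        raise ValueError("Cannot create tables with VOID type")

fail_null_type(StringType())              # passes silently
# fail_null_type(ArrayType(NullType()))   # would raise ValueError
```

Putting the check on `NullType` itself, rather than on a separate placeholder type, is what makes the SQL path and the `spark.catalog.createTable` path behave identically.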
Test build #124614 has finished for PR 28935 at commit

Test build #124639 has started for PR 28935 at commit
I think we need two changes:
These two changes can be done with 2 PRs.

Yes. This PR is for the second change.

We need to do 1 first; otherwise users would be able to create tables with the void type via the CREATE TABLE command, which was not possible before because the parser didn't support it.

Ah, now I understand the context. The 5th commit should be reverted in this PR, otherwise the UT will fail. And we need to do 1 first. I will work on that tomorrow. My laptop is out of power now.

@cloud-fan I refactored some code; now I think this PR has no dependency.
```scala
}
// Add Hive type string to metadata.
val cleanedDataType = HiveStringType.replaceCharType(dataType)
// Add Hive type 'string' and 'void' to metadata.
```
We can be more aggressive here: forbid the void type in all cases, including Hive tables.
```scala
 */
private def visitSparkDataType(ctx: DataTypeContext): DataType = {
  HiveStringType.replaceCharType(typedVisit(ctx))
  HiveVoidType.replaceVoidType(HiveStringType.replaceCharType(typedVisit(ctx)))
```
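The intent of a `replaceVoidType` step can be sketched with a self-contained toy model: a void placeholder may sit inside nested types, so the replacement has to recurse. The class names here are illustrative, not Spark's actual API:

```python
# Toy model of the parse-time placeholder replacement (illustrative only).
class DataType: pass
class NullType(DataType): pass
class HiveVoidType(DataType): pass   # parser-only placeholder for "void"

class StructField:
    def __init__(self, name, data_type):
        self.name = name
        self.data_type = data_type

class StructType(DataType):
    def __init__(self, fields):
        self.fields = fields

class ArrayType(DataType):
    def __init__(self, element_type):
        self.element_type = element_type

def replace_void_type(dt):
    # Swap the parser-only placeholder for the canonical NullType,
    # recursing into nested array and struct types.
    if isinstance(dt, HiveVoidType):
        return NullType()
    if isinstance(dt, ArrayType):
        return ArrayType(replace_void_type(dt.element_type))
    if isinstance(dt, StructType):
        return StructType([StructField(f.name, replace_void_type(f.data_type))
                           for f in dt.fields])
    return dt

parsed = StructType([StructField("a", HiveVoidType()),
                     StructField("b", ArrayType(HiveVoidType()))])
resolved = replace_void_type(parsed)
print(type(resolved.fields[0].data_type).__name__)               # NullType
print(type(resolved.fields[1].data_type.element_type).__name__)  # NullType
```

This mirrors how `HiveStringType.replaceCharType` already rewrites char/varchar placeholders before analysis.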
I don't get it why we need HiveVoidType. What happens if we just parse void to NullType?
Because that indicates VOID is a Hive type, the handling is more unified. Or, we can just use PR #28833.
For example, the function below will point out that the failure is due to the legacy Hive void type. If we mix VOID and NULL, I am not sure that would be better than keeping them separate.

```scala
def failVoidType(dt: DataType): Unit = {
  if (HiveVoidType.containsVoidType(dt)) {
    throw new AnalysisException(
      "Cannot create tables with Hive VOID type.")
  }
}
```
VOID and NULL are indeed the same type. We can just check for the null type and fail with the error message: `Cannot create tables with VOID type`
The point is consistency: the VOID type in a SQL statement should be the same as the NullType specified via the Scala API in spark.catalog.createTable.
@cloud-fan , ok. I will follow your suggestion and fix it in #28833, since this PR is a refactor introducing the new type HiveVoidType, and now we don't need it.
Test build #124657 has finished for PR 28935 at commit

Closing this since #28833 was merged. Thank you!
What changes were proposed in this pull request?
This addresses the closed PR #17953, as a refactor for #28833.
Adding HiveVoidType, like HiveStringType, to prevent an exception when describing tables/views whose schema contains the Hive VOID/NULL type.

Why are the changes needed?
Spark is incompatible with the Hive void type. When a Hive table schema contains the void type, DESC TABLE throws an exception in Spark.
In Spark 2.0.x, reading such a view behaves normally.
But in the latest Spark version, it fails with `SparkException: Cannot recognize hive type string: void`.
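The failure mode can be illustrated with a toy version of the type-string lookup. This is a simplified sketch of the general pattern, not Spark's actual parser, and the table contents are illustrative:

```python
# Simplified sketch of a Hive type-string lookup table (illustrative only).
_known_hive_types = {"string": "StringType", "int": "IntegerType", "null": "NullType"}

def parse_hive_type(s):
    if s not in _known_hive_types:
        # Mirrors the user-visible error quoted above.
        raise ValueError("Cannot recognize hive type string: %s" % s)
    return _known_hive_types[s]

print(parse_hive_type("string"))   # StringType
# parse_hive_type("void")  ->  ValueError: Cannot recognize hive type string: void
```

With no entry for "void", any Hive schema containing that type string fails at lookup time, which is what DESC TABLE surfaces to the user.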
Does this PR introduce any user-facing change?
No

How was this patch tested?
Added unit tests.
It can also be tested manually.