[SPARK-24012][SQL] Union of map and other compatible column #21100
@@ -111,6 +111,18 @@ object TypeCoercion {
          val dataType = findTightestCommonType(f1.dataType, f2.dataType).get
          StructField(f1.name, dataType, nullable = f1.nullable || f2.nullable)
        }))
      case (a1 @ ArrayType(et1, containsNull1), a2 @ ArrayType(et2, containsNull2))
          if a1.sameType(a2) =>
Review comment: after shortening the name, can we merge the
Review comment: also we need blank line between these cases

        findTightestCommonType(et1, et2).map(ArrayType(_, containsNull1 || containsNull2))
      case (m1 @ MapType(keyType1, valueType1, n1), m2 @ MapType(keyType2, valueType2, n2))
Review comment: ditto:

          if m1.sameType(m2) =>
        val keyType = findTightestCommonType(keyType1, keyType2)
        val valueType = findTightestCommonType(valueType1, valueType2)
        if(keyType.isEmpty || valueType.isEmpty) {
Review comment: We don't need this, it's guaranteed by

          None
        } else {
          Some(MapType(keyType.get, valueType.get, n1 || n2))
        }

      case _ => None
    }

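The array and map cases in the hunk above can be tried outside Spark with a small toy model. This is a sketch only: `DataType`, `ArrayType`, `MapType`, `sameType`, and `tightestCommonType` below are simplified stand-ins written for illustration, not the real `org.apache.spark.sql.types` classes or the actual `TypeCoercion` code. The for-comprehension over `Option` is one possible way to combine the key and value results and is an assumption here, not necessarily what the PR ended up with.

```scala
// Toy stand-ins for Catalyst types (hypothetical, for illustration only).
sealed trait DataType
case object IntType extends DataType
case object LongType extends DataType
case object StringType extends DataType
final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
final case class MapType(keyType: DataType, valueType: DataType, valueContainsNull: Boolean)
    extends DataType

object ToyCoercion {
  // Erase nullability flags so two types can be compared structurally.
  private def ignoreNullability(t: DataType): DataType = t match {
    case ArrayType(e, _) => ArrayType(ignoreNullability(e), containsNull = false)
    case MapType(k, v, _) =>
      MapType(ignoreNullability(k), ignoreNullability(v), valueContainsNull = false)
    case other => other
  }

  // Toy version of the sameType guard: equal up to nullability.
  def sameType(a: DataType, b: DataType): Boolean =
    ignoreNullability(a) == ignoreNullability(b)

  def tightestCommonType(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {
    case (a, b) if a == b => Some(a)
    case (IntType, LongType) | (LongType, IntType) => Some(LongType)

    // Arrays that differ only in nullability merge, OR-ing the flags.
    case (a1 @ ArrayType(e1, hasNull1), a2 @ ArrayType(e2, hasNull2)) if sameType(a1, a2) =>
      tightestCommonType(e1, e2).map(ArrayType(_, hasNull1 || hasNull2))

    // Maps: key and value types must both have a common type.
    case (m1 @ MapType(k1, v1, n1), m2 @ MapType(k2, v2, n2)) if sameType(m1, m2) =>
      for {
        k <- tightestCommonType(k1, k2)
        v <- tightestCommonType(v1, v2)
      } yield MapType(k, v, n1 || n2)

    case _ => None
  }
}
```

In this model, a map with non-nullable values and a map with nullable values resolve to the nullable variant, while `IntType` and `StringType` have no common type.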
@@ -35,6 +35,11 @@ FROM (SELECT col AS col
      SELECT col
      FROM p3) T1) T2;

-- SPARK-24012 Union of map and other compatible columns.
SELECT map(1, 2), 'str'
Review comment: shall we also add a test for array?

UNION ALL
SELECT map(1, 2, 3, NULL), 1;

-- Clean-up
DROP VIEW IF EXISTS t1;
DROP VIEW IF EXISTS t2;

@@ -896,6 +896,19 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
    }
  }

  test("SPARK-24012 Union of map and other compatible columns") {
Review comment: cc @gatorsmile, what's the policy for end-to-end tests? Shall we add it in both the sql golden file and
Review comment: yes. please add them to SQLQueryTestSuite
Review comment: discussed with @gatorsmile, we should put end-to-end tests in a single place, and currently we encourage people to put SQL-related end-to-end tests in the SQL golden files. That is to say, we should remove this test from
In the meantime, a bug fix should also have a unit test. For this case, we should add a test case in
Review comment: @cloud-fan, yes, I am not familiar with TypeCoercionSuite. In order to save time, in my opinion, this PR can be merged first. Thanks a lot.
Review comment: OK, please remove this test and it's ready to go.

    checkAnswer(
      sql(
        """
          |SELECT map(1, 2), 'str'
          |UNION ALL
          |SELECT map(1, 2, 3, NULL), 1""".stripMargin),
Review comment: can you give some insight into why it doesn't work? I'd expect Spark to first do type coercion for
Review comment: map<int, nullable int> and map<int, not nullable int> are accepted by Union, but string and int are not. If the types of one column are not accepted by Union, TCWSOT (TypeCoercion.WidenSetOperationTypes) tries to coerce them to a completely identical type. TCWSOT works when all of the columns can be coerced, and does not work when any column cannot be coerced.
Review comment: Shall we make map<int, nullable int> and map<int, not nullable int> coerce-able?
Review comment: Of course we can.
Hive doesn't support

      Row.fromSeq(Seq(Map(1 -> 2), "str"))::
      Row.fromSeq(Seq(Map(1 -> 2, 3 -> null), "1"))::
      Nil
    )
  }

  test("EXCEPT") {
    checkAnswer(
      sql("SELECT * FROM lowerCaseData EXCEPT SELECT * FROM upperCaseData"),

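The all-or-nothing behaviour described in the review thread above (widening succeeds only if every column pair has a common type) can be illustrated with a toy model. This is a hedged sketch, not Spark code: `ColType`, `MapCol`, `beforeFix`, `afterFix`, and `widenColumns` are hypothetical names, and the int-to-string promotion rule is a simplification assumed for the example.

```scala
// Toy column types (hypothetical, not Spark's DataType hierarchy).
sealed trait ColType
case object IntCol extends ColType
case object StrCol extends ColType
final case class MapCol(key: ColType, value: ColType, valueNullable: Boolean) extends ColType

object ToyWiden {
  // Assumed pre-fix rule: identical types, or int/string promoted to string.
  // Maps that differ only in value nullability have no common type here.
  def beforeFix(a: ColType, b: ColType): Option[ColType] = (a, b) match {
    case (x, y) if x == y => Some(x)
    case (IntCol, StrCol) | (StrCol, IntCol) => Some(StrCol)
    case _ => None
  }

  // Assumed post-fix rule: maps with equal key/value types also merge,
  // OR-ing the nullability flags.
  def afterFix(a: ColType, b: ColType): Option[ColType] = (a, b) match {
    case (MapCol(k1, v1, n1), MapCol(k2, v2, n2)) if k1 == k2 && v1 == v2 =>
      Some(MapCol(k1, v1, n1 || n2))
    case _ => beforeFix(a, b)
  }

  // Widen both sides of a union column by column. A single column without
  // a common type makes the whole widening fail.
  def widenColumns(left: Seq[ColType], right: Seq[ColType])(
      rule: (ColType, ColType) => Option[ColType]): Option[Seq[ColType]] = {
    if (left.length != right.length) None
    else {
      val widened = left.zip(right).map { case (l, r) => rule(l, r) }
      if (widened.forall(_.isDefined)) Some(widened.flatten) else None
    }
  }
}
```

With the query from the test (`map(1, 2), 'str'` unioned with `map(1, 2, 3, NULL), 1`), the toy `beforeFix` rule fails on the map column and therefore leaves the string/int column uncoerced as well, while `afterFix` widens both columns.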
we can shorten the name here:
hasNull1
hasNull2