
[SPARK-24012][SQL] Union of map and other compatible column #21100

Closed
wants to merge 7 commits
Changes from 3 commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -111,6 +111,18 @@ object TypeCoercion {
val dataType = findTightestCommonType(f1.dataType, f2.dataType).get
StructField(f1.name, dataType, nullable = f1.nullable || f2.nullable)
}))
case (a1 @ ArrayType(et1, containsNull1), a2 @ ArrayType(et2, containsNull2))
Contributor: we can shorten the names here: hasNull1, hasNull2
if a1.sameType(a2) =>
Contributor: after shortening the names, can we merge the if into the case ... line?
Contributor: also, we need a blank line between these cases
findTightestCommonType(et1, et2).map(ArrayType(_, containsNull1 || containsNull2))
case (m1 @ MapType(keyType1, valueType1, n1), m2 @ MapType(keyType2, valueType2, n2))
Contributor: ditto: kt1, vt1, hasNull1
if m1.sameType(m2) =>
val keyType = findTightestCommonType(keyType1, keyType2)
val valueType = findTightestCommonType(valueType1, valueType2)
if(keyType.isEmpty || valueType.isEmpty) {
Contributor: We don't need this; it's guaranteed by m1.sameType(m2).
None
} else {
Some(MapType(keyType.get, valueType.get, n1 || n2))
}

case _ => None
}
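Applied together, the review suggestions above would reshape these two cases roughly as follows (a sketch against this diff, not the final merged code):

```scala
// Sketch only: shortened names, guards merged onto the case lines, a blank
// line between the cases, and the redundant emptiness check dropped (the
// sameType guard already guarantees both recursive lookups succeed).
case (a1 @ ArrayType(et1, hasNull1), a2 @ ArrayType(et2, hasNull2)) if a1.sameType(a2) =>
  findTightestCommonType(et1, et2).map(ArrayType(_, hasNull1 || hasNull2))

case (m1 @ MapType(kt1, vt1, hasNull1), m2 @ MapType(kt2, vt2, hasNull2)) if m1.sameType(m2) =>
  for {
    kt <- findTightestCommonType(kt1, kt2)
    vt <- findTightestCommonType(vt1, vt2)
  } yield MapType(kt, vt, hasNull1 || hasNull2)
```

The for-comprehension keeps the Option plumbing implicit: if either lookup returns None the whole expression is None, which is exactly what the explicit isEmpty check was doing by hand.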
5 changes: 5 additions & 0 deletions sql/core/src/test/resources/sql-tests/inputs/union.sql
@@ -35,6 +35,11 @@ FROM (SELECT col AS col
SELECT col
FROM p3) T1) T2;

-- SPARK-24012 Union of map and other compatible columns.
SELECT map(1, 2), 'str'
Contributor: shall we also add a test for array?
UNION ALL
SELECT map(1, 2, 3, NULL), 1;

-- Clean-up
DROP VIEW IF EXISTS t1;
DROP VIEW IF EXISTS t2;
27 changes: 19 additions & 8 deletions sql/core/src/test/resources/sql-tests/results/union.sql.out
@@ -1,5 +1,5 @@
 -- Automatically generated by SQLQueryTestSuite
--- Number of queries: 14
+-- Number of queries: 15


-- !query 0
@@ -105,40 +105,51 @@ struct<x:int,col:int>


 -- !query 9
-DROP VIEW IF EXISTS t1
+SELECT map(1, 2), 'str'
+UNION ALL
+SELECT map(1, 2, 3, NULL), 1
 -- !query 9 schema
-struct<>
+struct<map(1, 2):map<int,int>,str:string>
 -- !query 9 output
+{1:2,3:null} 1
+{1:2} str


 -- !query 10
-DROP VIEW IF EXISTS t2
+DROP VIEW IF EXISTS t1
 -- !query 10 schema
 struct<>
 -- !query 10 output



 -- !query 11
-DROP VIEW IF EXISTS p1
+DROP VIEW IF EXISTS t2
 -- !query 11 schema
 struct<>
 -- !query 11 output



 -- !query 12
-DROP VIEW IF EXISTS p2
+DROP VIEW IF EXISTS p1
 -- !query 12 schema
 struct<>
 -- !query 12 output



 -- !query 13
-DROP VIEW IF EXISTS p3
+DROP VIEW IF EXISTS p2
 -- !query 13 schema
 struct<>
 -- !query 13 output



+-- !query 14
+DROP VIEW IF EXISTS p3
+-- !query 14 schema
+struct<>
+-- !query 14 output
+
13 changes: 13 additions & 0 deletions sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -896,6 +896,19 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
}
}

test("SPARK-24012 Union of map and other compatible columns") {
Contributor: cc @gatorsmile , what's the policy for end-to-end tests? Shall we add it in both the SQL golden file and SQLQuerySuite?

Member: yes, please add them to SQLQueryTestSuite.

Contributor: discussed with @gatorsmile , we should put end-to-end tests in a single place, and currently we encourage people to put SQL-related end-to-end tests in the SQL golden files. That is to say, we should remove this test from SQLQuerySuite.

In the meanwhile, a bug fix should also have a unit test. For this case, we should add a test case in TypeCoercionSuite. @liutang123 if you are not familiar with that test suite, please let us know; we can merge your PR first and add a UT in TypeCoercionSuite in a follow-up.

Contributor Author: @cloud-fan , yes, I am not familiar with TypeCoercionSuite. To save time, in my opinion, this PR can be merged first. Thanks a lot.

Contributor: OK, please remove this test and it's ready to go.

checkAnswer(
sql(
"""
|SELECT map(1, 2), 'str'
|UNION ALL
|SELECT map(1, 2, 3, NULL), 1""".stripMargin),
Contributor: can you give some insight into why it doesn't work? I'd expect Spark to first do type coercion for map(1, 2, 3, NULL), producing map<int, nullable int>; then Union should accept the nullability difference and pass analysis.

Contributor Author: map<int, nullable int> and map<int, not nullable int> are accepted by Union, but string and int are not.

If the types of a column cannot be accepted by Union, TCWSOT (TypeCoercion.WidenSetOperationTypes) tries to coerce them to a completely identical type. TCWSOT only takes effect when every column can be coerced; if any column cannot be, it does nothing. map<int, nullable int> and map<int, not nullable int> could not be coerced, so TCWSOT did nothing and string and int were never coerced either.

Contributor: Shall we make map<int, nullable int> and map<int, not nullable int> coerce-able?

Contributor Author (@liutang123, Apr 20, 2018): Of course we can. Two solutions:

  1. Cast two map types to one even when the key types or the value types differ. select map(1, 2) union all select map(1, 'str') would then work.
  2. Cast two map types to one only when the key types are the same and the value types are the same. This only resolves the problem that map<t1, nullable t2> and map<t1, not nullable t2> can't be unioned.

Hive doesn't support select map(1, 2) union all select map(1, 'str'); should Spark be compatible with Hive?
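Solution 2 above can be sketched with a tiny self-contained model. The DT/MapT/IntT/StringT types here are hypothetical stand-ins for Spark's DataType, MapType, IntegerType, and StringType, not Spark's own API:

```scala
// A minimal model of "solution 2": two map types widen only when key and
// value types already match exactly; only value-nullability is merged (OR'ed).
object WidenMapsSketch {
  sealed trait DT
  case object IntT extends DT
  case object StringT extends DT
  case class MapT(key: DT, value: DT, valueHasNull: Boolean) extends DT

  def widenMaps(m1: MapT, m2: MapT): Option[MapT] =
    if (m1.key == m2.key && m1.value == m2.value)
      Some(MapT(m1.key, m1.value, m1.valueHasNull || m2.valueHasNull))
    else
      None // like Hive, give up when key or value types differ

  def main(args: Array[String]): Unit = {
    // map<int, int> and map<int, nullable int> widen to map<int, nullable int>
    assert(widenMaps(MapT(IntT, IntT, false), MapT(IntT, IntT, true))
      == Some(MapT(IntT, IntT, true)))
    // map<int, int> and map<int, string> do not widen under solution 2
    assert(widenMaps(MapT(IntT, IntT, false), MapT(IntT, StringT, false)).isEmpty)
  }
}
```

Under this rule, once the two map columns widen successfully, WidenSetOperationTypes can proceed to coerce the remaining columns (string and int) as usual.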

Row.fromSeq(Seq(Map(1 -> 2), "str"))::
Row.fromSeq(Seq(Map(1 -> 2, 3 -> null), "1"))::
Nil
)
}

test("EXCEPT") {
checkAnswer(
sql("SELECT * FROM lowerCaseData EXCEPT SELECT * FROM upperCaseData"),