Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-24012][SQL] Union of map and other compatible column #21100

Closed
wants to merge 7 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,14 @@ object TypeCoercion {
StructField(f1.name, dataType, nullable = f1.nullable || f2.nullable)
}))

case (a1 @ ArrayType(et1, hasNull1), a2 @ ArrayType(et2, hasNull2)) if a1.sameType(a2) =>
findTightestCommonType(et1, et2).map(ArrayType(_, hasNull1 || hasNull2))

case (m1 @ MapType(kt1, vt1, hasNull1), m2 @ MapType(kt2, vt2, hasNull2)) if m1.sameType(m2) =>
val keyType = findTightestCommonType(kt1, kt2)
val valueType = findTightestCommonType(vt1, vt2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I think we should do the same thing in findWiderTypeForTwo to cover some corner cases such as decimal or string promotion within keys and values .. ? and seems #21100 (comment) suggested the same thing ..?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something we should figure out: why findWiderTypeForTwo only take care of array type? seems all complex types should be handled there, especially if it follows Hive's behavior.

Anyway it's orthogonal to findTightestCommonType, they are used for different operators.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I was just wondering while reading it. However, doesn't that mean we don't do type widening for nested types in the same way? I was thinking we should do the same type widening for nested types too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, I was thinking we should do that in both places in findTightestCommonType and findWiderTypeForTwo. Otherwise, the nested types in struct, map or array won't do, for example, decimal or string promotion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it's orthogonal. Yup, I was just wondering. I am okay to leave this out of this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooops, I misread your comment. Sorry. I was talking about the same thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to look into findTightestCommonType. Currently we are just very conservative and only allow nullability change for complex types in findTightestCommonType. We need to take a look at other systems and see what they do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with that given past discussions - I didn't mean we should change something now .. but was just wondering.

Some(MapType(keyType.get, valueType.get, hasNull1 || hasNull2))

case _ => None
}

Expand Down
11 changes: 11 additions & 0 deletions sql/core/src/test/resources/sql-tests/inputs/union.sql
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,17 @@ FROM (SELECT col AS col
SELECT col
FROM p3) T1) T2;

-- SPARK-24012 Union of map and other compatible columns.
SELECT map(1, 2), 'str'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we also add a test for array?

UNION ALL
SELECT map(1, 2, 3, NULL), 1;

-- SPARK-24012 Union of array and other compatible columns.
SELECT array(1, 2), 'str'
UNION ALL
SELECT array(1, 2, 3, NULL), 1;


-- Clean-up
DROP VIEW IF EXISTS t1;
DROP VIEW IF EXISTS t2;
Expand Down
42 changes: 32 additions & 10 deletions sql/core/src/test/resources/sql-tests/results/union.sql.out
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 14
-- Number of queries: 16


-- !query 0
Expand Down Expand Up @@ -105,40 +105,62 @@ struct<x:int,col:int>


-- !query 9
DROP VIEW IF EXISTS t1
SELECT map(1, 2), 'str'
UNION ALL
SELECT map(1, 2, 3, NULL), 1
-- !query 9 schema
struct<>
struct<map(1, 2):map<int,int>,str:string>
-- !query 9 output

{1:2,3:null} 1
{1:2} str


-- !query 10
DROP VIEW IF EXISTS t2
SELECT array(1, 2), 'str'
UNION ALL
SELECT array(1, 2, 3, NULL), 1
-- !query 10 schema
struct<>
struct<array(1, 2):array<int>,str:string>
-- !query 10 output

[1,2,3,null] 1
[1,2] str


-- !query 11
DROP VIEW IF EXISTS p1
DROP VIEW IF EXISTS t1
-- !query 11 schema
struct<>
-- !query 11 output



-- !query 12
DROP VIEW IF EXISTS p2
DROP VIEW IF EXISTS t2
-- !query 12 schema
struct<>
-- !query 12 output



-- !query 13
DROP VIEW IF EXISTS p3
DROP VIEW IF EXISTS p1
-- !query 13 schema
struct<>
-- !query 13 output



-- !query 14
DROP VIEW IF EXISTS p2
-- !query 14 schema
struct<>
-- !query 14 output



-- !query 15
DROP VIEW IF EXISTS p3
-- !query 15 schema
struct<>
-- !query 15 output