
Conversation

@biglobster
Contributor

What changes were proposed in this pull request?

In Spark 2.0, we parse float literals as decimals. However, this introduces a side effect, described below.

Before
spark-sql> select map(0.1,0.01, 0.2,0.033);
Error in query: cannot resolve 'map(CAST(0.1 AS DECIMAL(1,1)), CAST(0.01 AS DECIMAL(2,2)), CAST(0.2 AS DECIMAL(1,1)), CAST(0.033 AS DECIMAL(3,3)))' due to data type mismatch: The given values of function map should all be the same type, but they are [decimal(2,2), decimal(3,3)]; line 1 pos 7

After

spark-sql> select map(0.1,0.01, 0.2,0.033);
{0.1:0.010,0.2:0.033}
Time taken: 2.448 seconds, Fetched 1 row(s)

How was this patch tested?

Passes run-tests with a new test case.
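For reference, a minimal end-to-end sketch of the behavior change (an illustration only, not the patch's actual test case, which per the commit message below lives in test("SPARK-16735: CreateMap with Decimals")):

import org.apache.spark.sql.SparkSession

object Spark16735Example {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("SPARK-16735-example").getOrCreate()
    // The keys infer decimal(1,1); the values infer decimal(2,2) and decimal(3,3).
    // Before the fix, analysis failed with "should all be the same type";
    // after the fix, the values are widened to a common decimal type.
    spark.sql("SELECT map(0.1, 0.01, 0.2, 0.033)").show(truncate = false)
    spark.stop()
  }
}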

…ing different inferred precisions and scales

JIRA_ID: SPARK-16735
Description:
In Spark 2.0, we parse float literals as decimals. However, this introduces a side effect, described below.
spark-sql> select map(0.1,0.01, 0.2,0.033);
Error in query: cannot resolve 'map(CAST(0.1 AS DECIMAL(1,1)), CAST(0.01 AS DECIMAL(2,2)), CAST(0.2 AS DECIMAL(1,1)), CAST(0.033 AS DECIMAL(3,3)))' due to data type mismatch: The given values of function map should all be the same type, but they are [decimal(2,2), decimal(3,3)]; line 1 pos 7
Test:
spark-sql> select map(0.1,0.01, 0.2,0.033);
{0.1:0.010,0.2:0.033}
Time taken: 2.448 seconds, Fetched 1 row(s)
@AmplabJenkins

Can one of the admins verify this patch?

@dongjoon-hyun
Member

Hi, @biglobster .
You had better make a Jira for this and update the title of this PR accordingly.

@biglobster biglobster changed the title Keliang [SPARK-16715][SQL] map should create a decimal key or value from decimals with different precisions and scales Jul 26, 2016
@biglobster biglobster changed the title [SPARK-16715][SQL] map should create a decimal key or value from decimals with different precisions and scales [SPARK-16735][SQL] map should create a decimal key or value from decimals with different precisions and scalesSPARK-16735 Jul 26, 2016
@biglobster biglobster changed the title [SPARK-16735][SQL] map should create a decimal key or value from decimals with different precisions and scalesSPARK-16735 [SPARK-16735][SQL] map should create a decimal key or value from decimals with different precisions and scales Jul 26, 2016
JIRA_ID: no
Description: fix the JIRA ID in test("SPARK-16735: CreateMap with Decimals")
Test: no
@biglobster
Contributor Author

@dongjoon-hyun Thank you, I have just updated the title of this pull request with the JIRA ID:

SPARK-16735

@rxin
Contributor

rxin commented Jul 27, 2016

Do we have problems with other functions, e.g. array, struct, coalesce?

@HyukjinKwon
Member

HyukjinKwon commented Jul 27, 2016

@rxin, please let me leave a comment because I noticed this problem before.

For array, yes.
For coalesce, it is being handled in TypeCoercion.
For struct, it seems fine with different types.

IMHO, if we are to treat decimals with different precisions and scales as the same type, we might have to consider other compatible numeric types as well (widening precision and scale), provided we do not lose the value or precision, e.g. when going from decimal to double.

EDITED: FYI, for least and greatest, I opened #14294; however, we are discussing the right behaviour in SPARK-16646.

Other than those, there seems to be no similar case.
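For illustration only, a simplified sketch of the decimal-widening idea mentioned above (roughly what Catalyst's DecimalPrecision rule does, but ignoring the maximum-precision cap; this is not the actual TypeCoercion code):

import org.apache.spark.sql.types.DecimalType

// Choose a decimal type wide enough to hold values of both input types without
// dropping integer digits (precision - scale) or fractional digits (scale).
def widerDecimal(d1: DecimalType, d2: DecimalType): DecimalType = {
  val scale = math.max(d1.scale, d2.scale)
  val intDigits = math.max(d1.precision - d1.scale, d2.precision - d2.scale)
  DecimalType(intDigits + scale, scale)
}

// widerDecimal(DecimalType(2, 2), DecimalType(3, 3)) == DecimalType(3, 3)
// widerDecimal(DecimalType(10, 5), DecimalType(7, 1)) == DecimalType(11, 5)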

@biglobster
Contributor Author

@rxin I found a related JIRA [SPARK-16714] that covers the problem for the array function, so I made my report a sub-task of SPARK-16714.
struct has not been tested yet.

@dongjoon-hyun
Member

For the array issue of SPARK-16714, the PR is ready for review at #14353.

// Reviewed hunk (partial snippet; the enclosing match and closing braces are cut off in the review view):
case _ if elementType.isInstanceOf[DecimalType] =>
  var tighter: DataType = elementType
  colType.foreach { child =>
    if (elementType.asInstanceOf[DecimalType].isTighterThan(child.dataType)) {
Contributor

isTighterThan is not associative - I think this would be a problem?

Contributor Author

@biglobster Jul 27, 2016

@rxin I have checked this function, and it will not lose any precision or range, so it is safe.
In checkDecimalType we only check the data type and do not change it
(when keys or values contain an integer type, it will pass, but stays an integer type),
so checkInputDataTypes will return the same result as it did before.
For the case when keys or values contain an integer type, I will use a new function instead of isTighterThan that does not check integer types.
Can you give me some advice? Thank you :)

Contributor

What I was referring to was that isTighterThan is not associative, and I don't think you can just take the tightest one this way.

As an example:

a: precision 10, scale 5
b: precision 7, scale 1

In this case a is not tighter than b, but b would be chosen as the target data type, leading to a loss of precision.
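To spell this example out in code, a small sketch (tighterThan below is a local helper that mirrors the decimal-vs-decimal logic of isTighterThan, which is package-private in Spark; it is an assumption of this sketch, not part of the patch):

import org.apache.spark.sql.types.DecimalType

val a = DecimalType(10, 5) // 5 integer digits, 5 fractional digits
val b = DecimalType(7, 1)  // 6 integer digits, 1 fractional digit

// decimal(p1, s1) is "tighter than" decimal(p2, s2) when every value of the first
// type also fits in the second: p1 - s1 <= p2 - s2 and s1 <= s2.
def tighterThan(x: DecimalType, y: DecimalType): Boolean =
  x.precision - x.scale <= y.precision - y.scale && x.scale <= y.scale

tighterThan(a, b) // false: a's scale 5 exceeds b's scale 1
tighterThan(b, a) // false: b's 6 integer digits exceed a's 5

// Neither type dominates the other, so "pick the tighter one" has no safe answer here;
// a widened decimal(11, 5) would be needed to hold both without losing digits.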

Contributor Author

Thanks :) Got it.

JIRA_ID: SPARK-16735
Description:
I have checked this function, and it will not lose any precision or range, so it is safe.
In checkDecimalType we only check the data type and do not change it
(when keys or values contain an integer type, it will pass, but stays an integer type),
so checkInputDataTypes will return the same result as it did before.
For the case when keys or values contain an integer type, I will use a new function instead of isTighterThan that does not check integer types.
Test: done
@petermaxlee
Contributor

@biglobster @dongjoon-hyun I created a patch here: #14389

…ide.md

JIRA_ID: SPARK-16870
Description: The default value for spark.sql.broadcastTimeout is 300s, and this property does not appear in any Spark docs, so add "spark.sql.broadcastTimeout" to docs/sql-programming-guide.md to help people fix this timeout error when it happens.
Test: done
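As a usage note for the property that commit documents, a minimal sketch of raising the timeout when a large broadcast join exceeds the 300s default (the value 600 and the app name are illustrative assumptions):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("broadcast-timeout-example")
  // Timeout in seconds for the broadcast wait in broadcast joins; the default is 300.
  .config("spark.sql.broadcastTimeout", "600")
  .getOrCreate()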
@biglobster biglobster closed this Aug 3, 2016
@biglobster biglobster deleted the keliang branch August 3, 2016 08:38