
Conversation

@biglobster
Contributor

What changes were proposed in this pull request?

In Spark 2.0, we parse float literals as decimals. However, this introduces a side effect, described below.

Before
spark-sql> select map(0.1,0.01, 0.2,0.033);
Error in query: cannot resolve 'map(CAST(0.1 AS DECIMAL(1,1)), CAST(0.01 AS DECIMAL(2,2)), CAST(0.2 AS DECIMAL(1,1)), CAST(0.033 AS DECIMAL(3,3)))' due to data type mismatch: The given values of function map should all be the same type, but they are [decimal(2,2), decimal(3,3)]; line 1 pos 7

After

spark-sql> select map(0.1,0.01, 0.2,0.033);
{0.1:0.010,0.2:0.033}
Time taken: 2.448 seconds, Fetched 1 row(s)

How was this patch tested?

Passes run-tests with a new test case.
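For reference, a minimal end-to-end sketch of the behavior change (an illustration only, not the patch's actual test case, which per the commit message below lives in test("SPARK-16735: CreateMap with Decimals")):

import org.apache.spark.sql.SparkSession

object Spark16735Example {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("SPARK-16735-example").getOrCreate()
    // The keys infer decimal(1,1); the values infer decimal(2,2) and decimal(3,3).
    // Before the fix, analysis failed with "should all be the same type";
    // after the fix, the values are widened to a common decimal type.
    spark.sql("SELECT map(0.1, 0.01, 0.2, 0.033)").show(truncate = false)
    spark.stop()
  }
}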

…ing different inferred precisions and scales

JIRA_ID: SPARK-16735
Description:
In Spark 2.0, we parse float literals as decimals. However, this introduces a side effect, described below.
spark-sql> select map(0.1,0.01, 0.2,0.033);
Error in query: cannot resolve 'map(CAST(0.1 AS DECIMAL(1,1)), CAST(0.01 AS DECIMAL(2,2)), CAST(0.2 AS DECIMAL(1,1)), CAST(0.033 AS DECIMAL(3,3)))' due to data type mismatch: The given values of function map should all be the same type, but they are [decimal(2,2), decimal(3,3)]; line 1 pos 7
Test:
spark-sql> select map(0.1,0.01, 0.2,0.033);
{0.1:0.010,0.2:0.033}
Time taken: 2.448 seconds, Fetched 1 row(s)
@AmplabJenkins

Can one of the admins verify this patch?

@dongjoon-hyun
Member

Hi, @biglobster .
You had better make a Jira for this and update the title of this PR accordingly.

@biglobster biglobster changed the title Keliang [SPARK-16715][SQL] map should create a decimal key or value from decimals with different precisions and scales Jul 26, 2016
@biglobster biglobster changed the title [SPARK-16715][SQL] map should create a decimal key or value from decimals with different precisions and scales [SPARK-16735][SQL] map should create a decimal key or value from decimals with different precisions and scalesSPARK-16735 Jul 26, 2016
@biglobster biglobster changed the title [SPARK-16735][SQL] map should create a decimal key or value from decimals with different precisions and scalesSPARK-16735 [SPARK-16735][SQL] map should create a decimal key or value from decimals with different precisions and scales Jul 26, 2016
JIRA_ID: no
Description: fix the JIRA ID in test("SPARK-16735: CreateMap with Decimals")
Test: no
@biglobster
Contributor Author

@dongjoon-hyun Thank you, I have just updated the title of this pull request with the JIRA ID:

SPARK-16735

@rxin
Contributor

rxin commented Jul 27, 2016

Do we have problems with other functions, e.g. array, struct, coalesce?

@HyukjinKwon
Member

HyukjinKwon commented Jul 27, 2016

@rxin, please let me leave a comment because I noticed this problem before.

For array, yes.
For coalesce, it is being handled in TypeCoercion.
For struct, it seems fine with different types.

IMHO, if we are to treat decimals with different precisions and scales as the same type, we might have to consider other compatible numeric types as well (widening precision and scale), provided we do not lose the value or precision, e.g. when going from decimal to double.

EDITED: FYI, for least and greatest, I opened #14294; however, we are discussing the right behaviour in SPARK-16646.

Other than those, there seems to be no similar case.
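For illustration only, a simplified sketch of the decimal-widening idea mentioned above (roughly what Catalyst's DecimalPrecision rule does, but ignoring the maximum-precision cap; this is not the actual TypeCoercion code):

import org.apache.spark.sql.types.DecimalType

// Choose a decimal type wide enough to hold values of both input types without
// dropping integer digits (precision - scale) or fractional digits (scale).
def widerDecimal(d1: DecimalType, d2: DecimalType): DecimalType = {
  val scale = math.max(d1.scale, d2.scale)
  val intDigits = math.max(d1.precision - d1.scale, d2.precision - d2.scale)
  DecimalType(intDigits + scale, scale)
}

// widerDecimal(DecimalType(2, 2), DecimalType(3, 3)) == DecimalType(3, 3)
// widerDecimal(DecimalType(10, 5), DecimalType(7, 1)) == DecimalType(11, 5)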

@biglobster
Contributor Author

@rxin I found a related JIRA [SPARK-16714] that covers the problem for the array function, so I made my report a sub-task of SPARK-16714.
struct has not been tested yet.

@dongjoon-hyun
Member

For the array issue of SPARK-16714, the PR is ready for review at #14353.

// Reviewed hunk (partial snippet; the enclosing match and closing braces are cut off in the review view):
case _ if elementType.isInstanceOf[DecimalType] =>
  var tighter: DataType = elementType
  colType.foreach { child =>
    if (elementType.asInstanceOf[DecimalType].isTighterThan(child.dataType)) {
Contributor

isTighterThan is not associative - I think this would be a problem?

Contributor Author

@biglobster Jul 27, 2016

@rxin I have checked this function, and it will not lose any precision or range, so it is safe.
In checkDecimalType we only check the data type and do not change it
(when keys or values contain an integer type, it will pass, but stays an integer type),
so checkInputDataTypes will return the same result as it did before.
For the case when keys or values contain an integer type, I will use a new function instead of isTighterThan that does not check integer types.
Can you give me some advice? Thank you :)

Contributor

What I was referring to was that isTighterThan is not associative, and I don't think you can just take the tightest one this way.

As an example:

a: precision 10, scale 5
b: precision 7, scale 1

In this case a is not tighter than b, but b would be chosen as the target data type, leading to a loss of precision.
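To spell this example out in code, a small sketch (tighterThan below is a local helper that mirrors the decimal-vs-decimal logic of isTighterThan, which is package-private in Spark; it is an assumption of this sketch, not part of the patch):

import org.apache.spark.sql.types.DecimalType

val a = DecimalType(10, 5) // 5 integer digits, 5 fractional digits
val b = DecimalType(7, 1)  // 6 integer digits, 1 fractional digit

// decimal(p1, s1) is "tighter than" decimal(p2, s2) when every value of the first
// type also fits in the second: p1 - s1 <= p2 - s2 and s1 <= s2.
def tighterThan(x: DecimalType, y: DecimalType): Boolean =
  x.precision - x.scale <= y.precision - y.scale && x.scale <= y.scale

tighterThan(a, b) // false: a's scale 5 exceeds b's scale 1
tighterThan(b, a) // false: b's 6 integer digits exceed a's 5

// Neither type dominates the other, so "pick the tighter one" has no safe answer here;
// a widened decimal(11, 5) would be needed to hold both without losing digits.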

Contributor Author

Thanks :) Got it.

JIRA_ID: SPARK-16735
Description:
I have checked this function, and it will not lose any precision or range, so it is safe.
In checkDecimalType we only check the data type and do not change it
(when keys or values contain an integer type, it will pass, but stays an integer type),
so checkInputDataTypes will return the same result as it did before.
For the case when keys or values contain an integer type, I will use a new function instead of isTighterThan that does not check integer types.
Test: done
@petermaxlee
Contributor

@biglobster @dongjoon-hyun I created a patch here: #14389

…ide.md

JIRA_ID: SPARK-16870
Description: The default value for spark.sql.broadcastTimeout is 300s, and this property does not appear in any Spark docs, so add "spark.sql.broadcastTimeout" to docs/sql-programming-guide.md to help people fix this timeout error when it happens.
Test: done
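As a usage note for the property that commit documents, a minimal sketch of raising the timeout when a large broadcast join exceeds the 300s default (the value 600 and the app name are illustrative assumptions):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("broadcast-timeout-example")
  // Timeout in seconds for the broadcast wait in broadcast joins; the default is 300.
  .config("spark.sql.broadcastTimeout", "600")
  .getOrCreate()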
@biglobster biglobster closed this Aug 3, 2016
@biglobster biglobster deleted the keliang branch August 3, 2016 08:38