[SPARK-16735][SQL] map should create a decimal key or value from decimals with different precisions and scales
#14374
Conversation
…ing different inferred precisions and scales
JIRA_ID: SPARK-16735
Description:
In Spark 2.0, we parse float literals as decimals. However, this introduces a side effect, described below.
spark-sql> select map(0.1,0.01, 0.2,0.033);
Error in query: cannot resolve 'map(CAST(0.1 AS DECIMAL(1,1)), CAST(0.01 AS DECIMAL(2,2)), CAST(0.2 AS DECIMAL(1,1)), CAST(0.033 AS DECIMAL(3,3)))' due to data type mismatch: The given values of function map should all be the same type, but they are [decimal(2,2), decimal(3,3)]; line 1 pos 7
Test:
spark-sql> select map(0.1,0.01, 0.2,0.033);
{0.1:0.010,0.2:0.033}
Time taken: 2.448 seconds, Fetched 1 row(s)
Can one of the admins verify this patch?

Hi, @biglobster.
map should create a decimal key or value from decimals with different precisions and scales
JIRA_ID: no
Description: fix the JIRA ID in the test("SPARK-16735: CreateMap with Decimals")
Test: no
@dongjoon-hyun Thank you, I have just updated the title of this pull request with the JIRA ID.
Do we have problems with other functions, e.g. array, struct, coalesce?
@rxin, please let me leave a comment because I noticed this problem before. For array, yes. IMHO, we might have to consider other compatible numeric types (and widen precision and scale) if we treat decimals with different precisions and scales as the same type (but it should not lose the value or precision, e.g. going from decimal to double). EDITED: FYI, for least and greatest, I opened #14294; however, we are discussing the right behaviour in SPARK-16646. Other than those, there seems to be no case similar to this one.
@rxin I found a related JIRA, SPARK-16714, that fixes the problem for the array function, so I made my report a sub-task of SPARK-16714.
For the array issue of SPARK-16714, the PR was ready for review at #14353.
case _ if elementType.isInstanceOf[DecimalType] =>
  var tighter: DataType = elementType
  colType.foreach { child =>
    if (elementType.asInstanceOf[DecimalType].isTighterThan(child.dataType)) {
isTighterThan is not associative - I think this would be a problem?
@rxin I have checked this function, and it will not lose any precision or range; it's safe.
In checkDecimalType, we just check the data type and do not change it.
(When keys or values contain an integer type, it will pass, but still be an integer type.)
So checkInputDataTypes will return a result just as it did before.
In case keys or values contain an integer type, I will use a new function instead of isTighterThan that does not check integer types.
Can you give me some advice? Thank you :)
What I was referring to was that isTighterThan is not associative, and I don't think you can just take the tightest one this way.
As an example:
a: precision 10, scale 5
b: precision 7, scale 1
In this case a is not tighter than b, but b would be chosen as the target data type, leading to loss of precision.
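To make the precision-loss argument concrete, here is a minimal, hedged sketch (not the actual Spark code; Dec and widen are hypothetical names) showing why picking the "tighter" of two decimal types can drop digits, while a widened type that keeps the larger integral part and the larger scale fits both values:

// Hypothetical illustration only, not Spark's implementation.
case class Dec(precision: Int, scale: Int) {
  def integralDigits: Int = precision - scale // digits left of the decimal point
}

// Widen two decimal types: keep enough integral digits and enough scale for both.
def widen(a: Dec, b: Dec): Dec = {
  val scale = math.max(a.scale, b.scale)
  val integral = math.max(a.integralDigits, b.integralDigits)
  Dec(integral + scale, scale)
}

val a = Dec(10, 5) // can hold e.g. 12345.67890
val b = Dec(7, 1)  // can hold e.g. 123456.7

// b is "tighter" in precision, but casting 12345.67890 to decimal(7,1) drops
// four fractional digits; the widened type keeps both values intact.
println(widen(a, b)) // Dec(11,5): 6 integral digits + 5 fractional digits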
thx :) got it
JIRA_ID: SPARK-16735
Description: I have checked this function, and it will not lose any precision or range; it's safe. In checkDecimalType, we just check the data type and do not change it. (When keys or values contain an integer type, it will pass, but still be an integer type.) So checkInputDataTypes will return a result just as it did before. In case keys or values contain an integer type, I will use a new function instead of isTighterThan that does not check integer types.
Test: done
@biglobster @dongjoon-hyun I created a patch here: #14389
…ide.md JIRA_ID: SPARK-16870 Description: The default value for spark.sql.broadcastTimeout is 300s, and this property does not appear anywhere in the Spark docs, so add "spark.sql.broadcastTimeout" to docs/sql-programming-guide.md to help people fix this timeout error when it happens. Test: done
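For context, a minimal sketch of how that timeout can be raised when configuring a session; the property name and its 300s default come from the commit message above, while the session setup and the 600-second value are only illustrative assumptions:

import org.apache.spark.sql.SparkSession

// Hypothetical example: raise the broadcast-join timeout from its 300s default.
val spark = SparkSession.builder()
  .appName("broadcast-timeout-example")
  .master("local[*]")
  .config("spark.sql.broadcastTimeout", "600") // seconds
  .getOrCreate()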
What changes were proposed in this pull request?
In Spark 2.0, we parse float literals as decimals. However, this introduces a side effect, described below.
Before
spark-sql> select map(0.1,0.01, 0.2,0.033);
Error in query: cannot resolve 'map(CAST(0.1 AS DECIMAL(1,1)), CAST(0.01 AS DECIMAL(2,2)), CAST(0.2 AS DECIMAL(1,1)), CAST(0.033 AS DECIMAL(3,3)))' due to data type mismatch: The given values of function map should all be the same type, but they are [decimal(2,2), decimal(3,3)]; line 1 pos 7
After
spark-sql> select map(0.1,0.01, 0.2,0.033);
{0.1:0.010,0.2:0.033}
Time taken: 2.448 seconds, Fetched 1 row(s)
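As a rough sketch of why the example now resolves (the exact rule Spark applies lives in its type-coercion code; the helper below is only illustrative): decimal(2,2) and decimal(3,3) both have zero integral digits and the larger scale is 3, so a type that fits both values is decimal(3,3), which is also why 0.01 prints as 0.010 above.

// Illustrative only; not Spark's actual coercion code.
def widenedDecimal(p1: Int, s1: Int, p2: Int, s2: Int): (Int, Int) = {
  val scale = math.max(s1, s2)              // keep the larger scale
  val integral = math.max(p1 - s1, p2 - s2) // keep the larger integral part
  (integral + scale, scale)
}

println(widenedDecimal(2, 2, 3, 3)) // (3,3) -> decimal(3,3)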
How was this patch tested?
Pass the run-tests with a new test case.