
Conversation

@cloud-fan (Contributor) commented Sep 20, 2018

What changes were proposed in this pull request?

#20023 proposed to allow precision loss during decimal operations, to reduce the possibility of overflow. This is a behavior change and is protected by the DECIMAL_OPERATIONS_ALLOW_PREC_LOSS config. However, that PR introduced another behavior change: picking a minimum precision for integral literals, which is not protected by a config. This PR adds a new config for it: spark.sql.legacy.literal.pickMinimumPrecision.

This allows users to work around the issue in SPARK-25454, which is caused by a long-standing bug with negative scales.
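
A minimal sketch of the workaround (assuming a SparkSession named `spark`; the config key is spelled here as it appears in this PR's code, `spark.sql.literal.pickMinimumPrecision`, and the query is along the lines of the one discussed below):

```scala
// Hypothetical workaround sketch: turn off minimum-precision picking for integral
// literals so they fall back to their wide default decimal types, which avoids the
// negative-scale issue of SPARK-25454 for this query (per the discussion below).
spark.conf.set("spark.sql.literal.pickMinimumPrecision", "false")
spark.sql("SELECT 26393499451 / (1e6 * 1000)").show()
```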

How was this patch tested?

a new test

@cloud-fan (Contributor, Author) commented Sep 20, 2018

Maybe we should also consider turning off DECIMAL_OPERATIONS_ALLOW_PREC_LOSS by default.

@mgaido91 (Contributor) commented Sep 20, 2018

I don't really agree with this. As I mentioned here, I think we can avoid using negative scales regardless of the value of the flag, since before #20023 a negative scale was not possible here. So I think we can just add a check in DecimalType.fromLiteral to avoid a negative scale.

Doing so, there is no need to turn the flag off by default.

@cloud-fan (Contributor, Author)

> So I think we can just add a check in DecimalType.fromLiteral to avoid a negative scale.

Can you open a PR for it? I did some experiments locally and this doesn't seem to work.

@mgaido91 (Contributor)

Sure @cloud-fan, I'll do it ASAP.

@mgaido91 (Contributor)

Oh, I see the problem now. And it is much bigger than I thought, sorry. Here we were returning a negative scale for 1e6 (it happens in the AstBuilder, as we build a BigDecimal from it) even before the change in #20023. But before, we had a very high (unneeded) precision for 1000, which "fixed" the issue with the negative scale. This means that we were lucky in that specific query. If you try, for instance, select 26393499451 / 1000e6, then this fails both with your fix and before #20023. So this is not a solution either, I think.
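
To see the negative scale concretely, here is a tiny standalone snippet (plain Java BigDecimal, outside Spark; the parser builds a BigDecimal from the literal text in much the same way):

```scala
import java.math.BigDecimal

// "1e6" parses to unscaled value 1 with scale -6, i.e. 1 x 10^6 at precision 1.
val d = new BigDecimal("1e6")
println(d.precision) // 1
println(d.scale)     // -6
println(d)           // 1E+6
```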

@cloud-fan (Contributor, Author)

Yea, that's why I changed the test to use 1e6 * 1000 instead of 1000e6.

The point is, we know there is a bug and it's hard to fix. What I'm trying to do here is not to fix the bug, but to allow users to fully turn off #20023, so we can avoid regressions and not make it worse.

@cloud-fan (Contributor, Author)

2.3.2 and 2.4.0 are both in the RC stage; I don't want to spend a lot of time fixing this long-standing bug and block two releases. cc @jerryshao too.

@mgaido91 (Contributor)

Yes @cloud-fan, I see your point. Honestly, I am neutral on this change.

It is true that it avoids regressions from previous cases, but it is also true that it doesn't make the behavior better: in some (weird) cases like this one it behaves better, but there are other cases where the opposite is true, since we were overestimating the precision.

@cloud-fan (Contributor, Author)

This PR is a no-op if users don't turn off #20023. The benefit of this PR is: before we fully fix that bug, if a user hits it, he can turn off #20023 to temporarily work around it. So I don't see any harm in this PR.

Turning off #20023 by default is another story and I'd like to get more feedback before doing it.

@mgaido91 (Contributor)

> So I don't see any harm in this PR.

If the user doesn't turn off the flag, of course nothing changes. If the user does, then let's imagine this case: the user has select 1234567891 / (cast(2 as bigint) * (a decimal(1, 1) number) * cast(2 as bigint)). Before this PR, this operation works fine. After the PR, the right operand would have a precision of at least 38 (while it could be much less, i.e. 4 before this change). The result of the division would be a decimal(38, 38), so the operation would overflow.
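
To make the widening concrete, here is a hand-worked sketch of that example using the standard decimal result-type rules Spark follows (multiply: p1+p2+1 and s1+s2; divide: p1-s1+s2+max(6, s1+p2+1) and max(6, s1+p2+1), both capped at 38 when precision loss is not allowed). This is illustrative arithmetic, not Spark's actual DecimalPrecision code:

```scala
// Illustrative precision/scale propagation for the example above (not Spark code).
object PrecisionSketch {
  case class D(p: Int, s: Int)
  // Cap at the maximum precision of 38, as when allowPrecisionLoss is off.
  def bounded(p: Int, s: Int): D = D(math.min(p, 38), math.min(s, 38))
  def times(a: D, b: D): D = bounded(a.p + b.p + 1, a.s + b.s)
  def div(a: D, b: D): D = {
    val s = math.max(6, a.s + b.p + 1)
    bounded(a.p - a.s + b.s + s, s)
  }

  def main(args: Array[String]): Unit = {
    // Literal 2 with a minimum precision: DecimalType(1, 0).
    val narrow = div(D(10, 0), times(times(D(1, 0), D(1, 1)), D(1, 0)))
    // Literal cast(2 as bigint) without it: DecimalType(20, 0).
    val wide   = div(D(10, 0), times(times(D(20, 0), D(1, 1)), D(20, 0)))
    println(narrow) // D(17,6): plenty of room for the quotient
    println(wide)   // D(38,38): no integral digits left, so the division overflows
  }
}
```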

It is true that after this PR, Spark will support (with the same exact precision) all the operations that worked fine before #20023. But this PR can also break some operations that started working fine after #20023 (even with the flag turned off).

@cloud-fan (Contributor, Author) commented Sep 20, 2018

Why do we care about breaking operations when turning off a behavior-change config? The config is there for exactly this case: if a user hits a problem with the new behavior, he can use this config to restore the previous behavior. For this config, the previous behavior is the behavior before #20023.

If your argument is that picking a precise precision for literals is a standalone feature not related to #20023, I'm OK with creating a new config for it.

@SparkQA commented Sep 20, 2018

Test build #96360 has finished for PR 22494 at commit 0f6ec74.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal (Contributor) commented Sep 20, 2018

@cloud-fan The change looks fine to me. I looked at the failure: it correctly switches to the old behaviour when this config is turned off.

@SparkQA commented Sep 21, 2018

Test build #96392 has finished for PR 22494 at commit 1ee9f02.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal (Contributor)

retest this please

@mgaido91 (Contributor)

> If your argument is that picking a precise precision for literals is a standalone feature not related to #20023, I'm OK with creating a new config for it.

Yes, this is, I think, a better option. Indeed, what I meant was this: let's imagine I am a Spark 2.3.0 user and I have DECIMAL_OPERATIONS_ALLOW_PREC_LOSS set to false. Before this patch, I can successfully run select 1234567891 / (1.1 * 2 * 2 * 2 * 2). After this patch, this query would return null instead, as an overflow would happen. So this patch is "correcting" a regression from 2.2 but it is introducing another one from 2.3.0-2.3.1.

Using another config is therefore a better workaround IMO.

@cloud-fan (Contributor, Author)

I tried to add a new config, but decided not to. The problem is that when users hit SPARK-25454, they must turn off both DECIMAL_OPERATIONS_ALLOW_PREC_LOSS and the new config.

DECIMAL_OPERATIONS_ALLOW_PREC_LOSS is an internal config and is true by default. The only reason to turn it off is that we hit SPARK-25454. That said, I won't treat it as a regression, as a user will not turn it off to run select 1234567891 / (1.1 * 2 * 2 * 2 * 2).

@mgaido91 (Contributor)

> The problem is that when users hit SPARK-25454, they must turn off both DECIMAL_OPERATIONS_ALLOW_PREC_LOSS and the new config.

If a user hits SPARK-25454, the value of DECIMAL_OPERATIONS_ALLOW_PREC_LOSS is not relevant.

> The only reason to turn it off is that we hit SPARK-25454.

No: if we hit SPARK-25454, turning it off doesn't help. The only reason to turn it off is to get null instead of a precision loss when the result needs a precision higher than 38 and a scale higher than 6.
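
A concrete illustration of that trade-off (a sketch assuming a SparkSession named `spark`; the result types follow the documented decimal rules, so take the commented outputs as expected rather than verified here):

```scala
// allowPrecisionLoss = true (default): the result type is adjusted to decimal(38, 6),
// so some fractional digits may be dropped but the value fits.
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "true")
spark.sql("SELECT CAST(2 AS DECIMAL(38,18)) / CAST(1 AS DECIMAL(38,18))").show()
// expected: 2.000000

// allowPrecisionLoss = false: the unadjusted type is capped at decimal(38, 38),
// leaving no integral digits, so the same division overflows and returns null.
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "false")
spark.sql("SELECT CAST(2 AS DECIMAL(38,18)) / CAST(1 AS DECIMAL(38,18))").show()
// expected: null
```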

@cloud-fan (Contributor, Author) commented Sep 21, 2018

Sorry, my mistake. I'm talking about the specific query reported at https://issues.apache.org/jira/browse/SPARK-22036?focusedCommentId=16618104&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16618104, which needs DECIMAL_OPERATIONS_ALLOW_PREC_LOSS to be turned off.

SPARK-25454 is a long-standing bug and currently we can't help users to work around it.

My point is, to work around this regression, the user must turn off both DECIMAL_OPERATIONS_ALLOW_PREC_LOSS and the new config, which makes me think we should not create a new config.

> After this patch, this query would return null instead, as an overflow would happen. So this patch is "correcting" a regression from 2.2 but it is introducing another one from 2.3.0-2.3.1.

I don't agree with that. Users can turn on DECIMAL_OPERATIONS_ALLOW_PREC_LOSS to make the query work. We should not fix the values of some configs and then define a regression based on that; that's not a regression.

The reason why this is a regression is that users have no way to get the same (and expected) result of 2.3 in 2.4.

@mgaido91 (Contributor) commented Sep 21, 2018

> I'm talking about the specific query reported at https://issues.apache.org/jira/browse/SPARK-22036?focusedCommentId=16618104&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16618104, which needs DECIMAL_OPERATIONS_ALLOW_PREC_LOSS to be turned off.

Yes, I am talking about that too, and there is no need to turn off DECIMAL_OPERATIONS_ALLOW_PREC_LOSS for that query. If you define another config and switch only that one, you'll see that the result is correct, even with DECIMAL_OPERATIONS_ALLOW_PREC_LOSS turned on.

@SparkQA commented Sep 21, 2018

Test build #96406 has finished for PR 22494 at commit 1ee9f02.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan changed the title [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PREC_LOSS should cover all the behavior changes [SPARK-22036][SQL][followup] add a new config for picking precise precision for integral literals Sep 21, 2018
@cloud-fan (Contributor, Author)

Sorry, I screwed up my local branch and drew a wrong conclusion. Turning off the precise precision for integral literals can fix the regression, so it's better to add a config for it.

.doc("When integral literals are used with decimals in binary operators, Spark will " +
"pick a precise precision for the literals to calculate the precision and scale " +
"of the result decimal, when this config is true. By picking a precise precision, we " +
"can avoid wasting precision, to reduce the possibility of overflow.")
Review comment (Contributor):

> to reduce the possibility of overflow

Actually this is not true; it depends on the value of DECIMAL_OPERATIONS_ALLOW_PREC_LOSS. If DECIMAL_OPERATIONS_ALLOW_PREC_LOSS is true, the risk is a precision loss, but we don't overflow. If it is false, then this statement is right.

.booleanConf
.createWithDefault(true)

val LITERAL_PRECISE_PRECISION =
Review comment (Contributor):

Mmh... PRECISE_PRECISION sounds weird... can we look for a better one? I don't have a suggestion right now, but I'll think about it.

buildConf("spark.sql.literal.precisePrecision")
.internal()
.doc("When integral literals are used with decimals in binary operators, Spark will " +
"pick a precise precision for the literals to calculate the precision and scale " +
Review comment (Contributor):

a precise precision -> the minimal precision required to represent the given value?

}

test("SPARK-25454: decimal division with negative scale") {
// TODO: completely fix this issue even LITERAL_PRECISE_PRECISION is true.
Review comment (Contributor):

nit: even when

@cloud-fan cloud-fan changed the title [SPARK-22036][SQL][followup] add a new config for picking precise precision for integral literals [SPARK-22036][SQL][followup] add a new config for picking minimum precision for integral literals Sep 21, 2018
@mgaido91 (Contributor) left a comment

LGTM apart from a minor comment, thanks @cloud-fan.

Shall we also document this somehow in the migration guide or on the JIRA? Otherwise, how can users know about it?

nit: what about referencing SPARK-25454 in the title though? This seems more a temp fix for that specific issue than a follow-up of SPARK-22036 to me.

.createWithDefault(true)

val LITERAL_PICK_MINIMUM_PRECISION =
buildConf("spark.sql.literal.pickMinimumPrecision")
Review comment (Contributor):

nit: I am wondering whether we can consider this a legacy config. We could also remove it once the PR for SPARK-25454 is merged and/or we drop support for negative-scale decimals. What do you think?
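
For reference, the snippets quoted above assemble into a definition roughly like this (a reconstruction from the fragments in this review, folding in the wording suggestions; not necessarily the exact merged code):

```scala
// Reconstructed sketch of the config discussed in this review (not the merged source).
val LITERAL_PICK_MINIMUM_PRECISION =
  buildConf("spark.sql.literal.pickMinimumPrecision")
    .internal()
    .doc("When integral literals are used with decimals in binary operators, Spark will " +
      "pick the minimal precision required to represent the literals, when calculating " +
      "the precision and scale of the result decimal, if this config is true. By picking " +
      "a minimal precision, we avoid wasting precision and reduce the chance of " +
      "precision loss or overflow.")
    .booleanConf
    .createWithDefault(true)
```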

@cloud-fan cloud-fan changed the title [SPARK-22036][SQL][followup] add a new config for picking minimum precision for integral literals [SPARK-25454][SQL] add a new config for picking minimum precision for integral literals Sep 21, 2018
@SparkQA commented Sep 21, 2018

Test build #96423 has finished for PR 22494 at commit cc149be.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal (Contributor)

retest this please

@SparkQA commented Sep 21, 2018

Test build #96425 has finished for PR 22494 at commit ad79c56.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 21, 2018

Test build #96429 has finished for PR 22494 at commit b4fdd13.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 21, 2018

Test build #96446 has finished for PR 22494 at commit b4fdd13.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor, Author)

@mgaido91 Since we are going to merge this PR to both 2.4 and 2.3, and the default behavior is not changed, I don't think we need to add a migration guide entry. I'll post this workaround to the JIRA ticket when this PR is merged.

@mgaido91 (Contributor)

yes, sounds ok to me. Thanks. LGTM.

@SparkQA commented Sep 24, 2018

Test build #96509 has started for PR 22494 at commit faa0bd0.

@cloud-fan (Contributor, Author)

retest this please

@SparkQA commented Sep 25, 2018

Test build #96533 has finished for PR 22494 at commit faa0bd0.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal (Contributor)

retest this please

@SparkQA commented Sep 25, 2018

Test build #96539 has finished for PR 22494 at commit faa0bd0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member) left a comment

LGTM

Thanks! Merged to master/2.4/2.3

asfgit pushed a commit that referenced this pull request Sep 27, 2018
… integral literals

## What changes were proposed in this pull request?

#20023 proposed to allow precision loss during decimal operations, to reduce the possibility of overflow. This is a behavior change and is protected by the DECIMAL_OPERATIONS_ALLOW_PREC_LOSS config. However, that PR introduced another behavior change: picking a minimum precision for integral literals, which is not protected by a config. This PR adds a new config for it: `spark.sql.literal.pickMinimumPrecision`.

This allows users to work around the issue in SPARK-25454, which is caused by a long-standing bug with negative scales.

## How was this patch tested?

a new test

Closes #22494 from cloud-fan/decimal.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(cherry picked from commit d0990e3)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
asfgit pushed a commit that referenced this pull request Sep 27, 2018
@asfgit closed this in d0990e3 Sep 27, 2018
daspalrahul pushed a commit to daspalrahul/spark that referenced this pull request Sep 29, 2018