-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-37957][SQL] Correctly pass deterministic flag for V2 scalar functions #35243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| propagateNull: Boolean = true, | ||
| returnNullable: Boolean = true) extends InvokeLike { | ||
| returnNullable: Boolean = true, | ||
| isDeterministic: Boolean = true) extends InvokeLike { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this defaults to true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in majority cases a function is deterministic, so defaulting to true here. This is similar to how we treat propagateNull and returnNullable here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for default true.
| } | ||
| } | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: extra blank line
| } | ||
| } | ||
|
|
||
| test("SPARK-37957: pass deterministic flag") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make the test name more clear? pass deterministic flat to what
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
|
|
||
| override def nullable: Boolean = needNullCheck || returnNullable | ||
| override def children: Seq[Expression] = arguments | ||
| override lazy val deterministic: Boolean = isDeterministic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should also check if all the arguments are deterministic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops you are right. Let me fix it.
|
|
||
| override def nullable: Boolean = needNullCheck || returnNullable | ||
| override def children: Seq[Expression] = arguments | ||
| override lazy val deterministic: Boolean = isDeterministic && arguments.forall(_.deterministic) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: seems we can move this to InvokeLike
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this too, but then we'd have to move isDeterministic there too and then override it in StaticInvoke and Invoke, so doesn't seem we can save much. Plus I think deterministic property is not useful for NewInstance.
…nctions ### What changes were proposed in this pull request? Pass `isDeterministic` flag to `ApplyFunctionExpression`, `Invoke` and `StaticInvoke` when processing V2 scalar functions. ### Why are the changes needed? A V2 scalar function can be declared as non-deterministic. However, currently Spark doesn't pass the flag when converting the V2 function to a catalyst expression, which could lead to incorrect results if being applied with certain optimizations. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added a unit test. Closes #35243 from sunchao/SPARK-37957. Authored-by: Chao Sun <sunchao@apple.com> Signed-off-by: Chao Sun <sunchao@apple.com>
|
Thanks! merged to master and branch-3.2. |
|
Thank you all! |
| propagateNull: Boolean = true, | ||
| returnNullable: Boolean = true) extends InvokeLike { | ||
| returnNullable: Boolean = true, | ||
| isDeterministic: Boolean = true) extends InvokeLike { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be controversial to backport .. as
- V2 expressions are unstable yet.
- It could lead to different results in maintenance version upgrade if a user sets
isDeterministictofalse - Maybe performance regression if a user sets
isDeterministictofalse. - We haven't heard an actual complaint from user mailing list.
- This makes an API (
isDeterministic) working that has never been working in a way (is this a bug fix?)
While I agree with this being merged in 3.3.0, and I don't feel strongly on this in 3.2.X, maybe we can consider reverting this out of branch-3.2 because it has a good and bad thing. If we're worried about this change, we could issue a warning instead when isDeterministic from V2 Scalar Function returns false.
I will leave it to you @sunchao.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a bug fix. A V2 catalog can return a function that's non-deterministic, while without the fix Spark can treat it as deterministic and apply related optimization rules (e.g., constant folding), which could cause correctness issues.
Since this is already in Spark 3.2.1, I don't see much benefit of reverting it and re-introduce the correctness issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on keeping this in 3.2.x. This fixed the correctness issue and we actually intentionally included this fix in 3.2.1 release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine. I didn't have a strong preference so I am okay with keeping it either 👍 .
…nctions ### What changes were proposed in this pull request? Pass `isDeterministic` flag to `ApplyFunctionExpression`, `Invoke` and `StaticInvoke` when processing V2 scalar functions. ### Why are the changes needed? A V2 scalar function can be declared as non-deterministic. However, currently Spark doesn't pass the flag when converting the V2 function to a catalyst expression, which could lead to incorrect results if being applied with certain optimizations. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added a unit test. Closes apache#35243 from sunchao/SPARK-37957. Authored-by: Chao Sun <sunchao@apple.com> Signed-off-by: Chao Sun <sunchao@apple.com>
…nctions ### What changes were proposed in this pull request? Pass `isDeterministic` flag to `ApplyFunctionExpression`, `Invoke` and `StaticInvoke` when processing V2 scalar functions. ### Why are the changes needed? A V2 scalar function can be declared as non-deterministic. However, currently Spark doesn't pass the flag when converting the V2 function to a catalyst expression, which could lead to incorrect results if being applied with certain optimizations. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added a unit test. Closes apache#35243 from sunchao/SPARK-37957. Authored-by: Chao Sun <sunchao@apple.com> Signed-off-by: Chao Sun <sunchao@apple.com>
…nctions ### What changes were proposed in this pull request? Pass `isDeterministic` flag to `ApplyFunctionExpression`, `Invoke` and `StaticInvoke` when processing V2 scalar functions. ### Why are the changes needed? A V2 scalar function can be declared as non-deterministic. However, currently Spark doesn't pass the flag when converting the V2 function to a catalyst expression, which could lead to incorrect results if being applied with certain optimizations. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added a unit test. Closes apache#35243 from sunchao/SPARK-37957. Authored-by: Chao Sun <sunchao@apple.com> Signed-off-by: Chao Sun <sunchao@apple.com> (cherry picked from commit 3860ac5) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
Pass
isDeterministicflag toApplyFunctionExpression,InvokeandStaticInvokewhen processing V2 scalar functions.Why are the changes needed?
A V2 scalar function can be declared as non-deterministic. However, currently Spark doesn't pass the flag when converting the V2 function to a catalyst expression, which could lead to incorrect results if being applied with certain optimizations.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added a unit test.