@sunchao sunchao commented Jan 18, 2022

What changes were proposed in this pull request?

Pass isDeterministic flag to ApplyFunctionExpression, Invoke and StaticInvoke when processing V2 scalar functions.

Why are the changes needed?

A V2 scalar function can be declared as non-deterministic. However, Spark currently does not pass that flag when converting the V2 function to a Catalyst expression, which can lead to incorrect results when certain optimizations are applied.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added a unit test.

  propagateNull: Boolean = true,
- returnNullable: Boolean = true) extends InvokeLike {
+ returnNullable: Boolean = true,
+ isDeterministic: Boolean = true) extends InvokeLike {
Contributor

Why does this default to true?

Member Author

I think in the majority of cases a function is deterministic, so I'm defaulting to true here. This is similar to how we treat propagateNull and returnNullable.

Contributor

+1 for default true.

}
}


Contributor

super nit: extra blank line

}
}

test("SPARK-37957: pass deterministic flag") {
Contributor

Can we make the test name clearer? Pass the deterministic flag to what?

Member Author

Sure


override def nullable: Boolean = needNullCheck || returnNullable
override def children: Seq[Expression] = arguments
override lazy val deterministic: Boolean = isDeterministic
Contributor

We should also check whether all the arguments are deterministic.

Member Author

Oops you are right. Let me fix it.
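
To illustrate the point raised in this thread, here is a minimal, self-contained sketch of why the function's own flag is not enough. It is a toy model, not Spark's Catalyst code: `Expr`, `Lit`, `RandomInt`, `Call` and `DeterminismDemo` are hypothetical names made up for this example. Only the rule itself (the flag AND-ed with the determinism of every argument) comes from the review.

```scala
// Toy model only: these are hypothetical classes, not Spark's Catalyst types.
sealed trait Expr {
  def children: Seq[Expr]
  def deterministic: Boolean
}

// A constant has no children and is always deterministic.
final case class Lit(value: Int) extends Expr {
  val children: Seq[Expr] = Nil
  val deterministic: Boolean = true
}

// Stands in for an expression such as rand(): never deterministic.
final case class RandomInt() extends Expr {
  val children: Seq[Expr] = Nil
  val deterministic: Boolean = false
}

// A call is deterministic only if the function itself is declared
// deterministic AND every argument is deterministic too.
final case class Call(
    name: String,
    arguments: Seq[Expr],
    isDeterministic: Boolean = true) extends Expr {
  val children: Seq[Expr] = arguments
  lazy val deterministic: Boolean =
    isDeterministic && arguments.forall(_.deterministic)
}

object DeterminismDemo {
  // Deterministic function over a constant: deterministic.
  val pureOnConstant: Expr = Call("f", Seq(Lit(1)))
  // Deterministic function over a non-deterministic argument: not deterministic.
  val pureOnRandom: Expr = Call("f", Seq(RandomInt()))
  // Function declared non-deterministic: not deterministic, whatever the arguments.
  val impureOnConstant: Expr = Call("g", Seq(Lit(1)), isDeterministic = false)
}
```

With only `isDeterministic` consulted, `pureOnRandom` would wrongly report itself as deterministic even though its input changes between evaluations.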


override def nullable: Boolean = needNullCheck || returnNullable
override def children: Seq[Expression] = arguments
override lazy val deterministic: Boolean = isDeterministic && arguments.forall(_.deterministic)
Contributor

nit: seems we can move this to InvokeLike

Member Author

I thought about this too, but then we'd have to move isDeterministic there too and override it in StaticInvoke and Invoke, so it doesn't seem we'd save much. Plus, I think the deterministic property is not useful for NewInstance.
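
For readers weighing the alternative discussed in this thread, the following is a rough sketch of what hoisting the logic into the shared parent might look like. It is an assumption-laden illustration, not the design that was merged and not Spark's real classes: `Arg`, `ConstArg`, `VolatileArg` and every `...Sketch` type are hypothetical stand-ins.

```scala
// Hypothetical sketch of the alternative; not Spark's actual InvokeLike,
// Invoke, StaticInvoke or NewInstance classes.
trait Arg { def deterministic: Boolean }
final case class ConstArg(value: Any) extends Arg {
  val deterministic: Boolean = true
}
final case class VolatileArg() extends Arg {
  val deterministic: Boolean = false
}

trait InvokeLikeSketch {
  def arguments: Seq[Arg]
  // Every subclass must now supply the flag, even one with no use for it.
  def isDeterministic: Boolean
  // The shared rule lives in one place.
  lazy val deterministic: Boolean =
    isDeterministic && arguments.forall(_.deterministic)
}

// The constructor parameters implement the abstract members of the trait.
final case class InvokeSketch(
    arguments: Seq[Arg],
    isDeterministic: Boolean = true) extends InvokeLikeSketch

final case class StaticInvokeSketch(
    arguments: Seq[Arg],
    isDeterministic: Boolean = true) extends InvokeLikeSketch

final case class NewInstanceSketch(arguments: Seq[Arg]) extends InvokeLikeSketch {
  // There is no meaningful flag to pass here, so it is hard-coded.
  val isDeterministic: Boolean = true
}
```

The one-line rule is shared, but each subclass still has to declare `isDeterministic`, which matches the author's remark that the move would not save much.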

@sunchao sunchao closed this in 9ffca0c Jan 19, 2022
sunchao added a commit that referenced this pull request Jan 19, 2022
…nctions

Closes #35243 from sunchao/SPARK-37957.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Chao Sun <sunchao@apple.com>

sunchao commented Jan 19, 2022

Thanks! merged to master and branch-3.2.

@sunchao sunchao deleted the SPARK-37957 branch January 19, 2022 22:10
@dongjoon-hyun
Member

Thank you all!

  propagateNull: Boolean = true,
- returnNullable: Boolean = true) extends InvokeLike {
+ returnNullable: Boolean = true,
+ isDeterministic: Boolean = true) extends InvokeLike {
Member

@HyukjinKwon HyukjinKwon Feb 3, 2022

I think this might be controversial to backport, as:

  1. V2 expressions are still unstable.
  2. It could lead to different results in a maintenance version upgrade if a user sets isDeterministic to false.
  3. It may cause a performance regression if a user sets isDeterministic to false.
  4. We haven't heard an actual complaint on the user mailing list.
  5. This makes an API (isDeterministic) work that has never worked before (is this a bug fix?).

While I agree with this being merged in 3.3.0, and I don't feel strongly about 3.2.X, maybe we can consider reverting this from branch-3.2 because it has both upsides and downsides. If we're worried about this change, we could instead issue a warning when isDeterministic from a V2 scalar function returns false.

I will leave it to you @sunchao.

Member Author

I think this is a bug fix. A V2 catalog can return a function that's non-deterministic, while without the fix Spark can treat it as deterministic and apply related optimization rules (e.g., constant folding), which could cause correctness issues.

Since this is already in Spark 3.2.1, I don't see much benefit in reverting it and re-introducing the correctness issue.
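
As a rough illustration of the correctness issue described here, the sketch below models constant folding over a function whose non-deterministic flag was lost. It is a toy, not Spark's optimizer: `Node`, `Const`, `FnCall`, `ToyFolding` and `FoldingDemo` are hypothetical names, and a simple counter stands in for a non-deterministic V2 function.

```scala
// Toy model only: hypothetical names, not Spark's optimizer or expressions.
sealed trait Node {
  def deterministic: Boolean
  def eval(): Long
}

final case class Const(value: Long) extends Node {
  val deterministic: Boolean = true
  def eval(): Long = value
}

// A function call with no arguments; the flag defaults to true.
final case class FnCall(body: () => Long, isDeterministic: Boolean = true)
    extends Node {
  val deterministic: Boolean = isDeterministic
  def eval(): Long = body()
}

object ToyFolding {
  // Evaluate once and replace with a constant, but only when the node
  // reports itself as deterministic.
  def fold(n: Node): Node = n match {
    case c: Const => c
    case f: FnCall if f.deterministic => Const(f.eval())
    case other => other
  }
}

object FoldingDemo {
  // Stand-in for a non-deterministic function: a new value on every call.
  private var counter = 0L
  private def next(): Long = { counter += 1; counter }

  // Flag dropped: the call looks deterministic and is frozen to one value.
  val wronglyFolded: Node = ToyFolding.fold(FnCall(() => next()))

  // Flag preserved: the call is left alone and re-evaluated each time.
  val kept: Node = ToyFolding.fold(FnCall(() => next(), isDeterministic = false))
}
```

In this toy, `wronglyFolded` returns the same value on every evaluation, while `kept` keeps producing fresh values, which is the behavior the function's author asked for.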

Contributor

+1 on keeping this in 3.2.x. This fixed the correctness issue, and we intentionally included the fix in the 3.2.1 release.

Member

That's fine. I didn't have a strong preference, so I am okay with keeping it 👍.

catalinii pushed a commit to lyft/spark that referenced this pull request Feb 22, 2022
catalinii pushed a commit to lyft/spark that referenced this pull request Mar 4, 2022
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
(cherry picked from commit 3860ac5)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>