Skip to content

Conversation

@dbatomic
Copy link
Contributor

@dbatomic dbatomic commented Dec 11, 2023

What changes were proposed in this pull request?

Prior to this change e BETWEEN lower AND upper expression used to be transformed into lower <= e && e <= upper. This means that e would be evaluated twice which is problematic from both correctness and performance perspectives.

Suggested fix is to use WITH expression that was introduced with this change.

Why are the changes needed?

Current implementation is not correct for non deterministic expressions, since two calls might return different results.

Does this PR introduce any user-facing change?

With this change generated plan for BETWEEN statement will be different. An example of generated plan is provided in tests.

How was this patch tested?

Existing tests plus new test in PlanGenerationTestSuite.

Was this patch authored or co-authored using generative AI tooling?

Yes.

/**
* Transform UnresolvedBetweenExpression into a [BetweenExpr].
*/
object ResolveBetweenExpression extends Rule[LogicalPlan] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid trying to add new analyzer rules at this point. Could we accomplish this by combining UnresolvedBetweenExpression and ResolvedBetweeExpression into just one BetweenExpression and make that inherit from RuntimeReplaceable? Then its replacement can be the With expression of interest.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I tried initially. I tried to explain the problem in PR description, but I guess that I wasn't clear enough. Let me try to do this again:
Initially I tried with BetweenExpr extends RuntimeReplaceable. The problem that I hit was that CommonExpressionRef required dataType of CommonExpressionDef to be known for it to be created:

  def this(exprDef: CommonExpressionDef) = this(exprDef.id, exprDef.dataType, exprDef.nullable)

To get around this I would like to create replacement of BetweenExpr after I resolve it's Child expressions and hence after I have it's dataType. I hit a wall doing this so I did this thing with UnresolvedBetween->BetweenExpr (for which I will know the types)->With, which works, but I agree that it would be better if we could avoid this + get rid of extra rule.

I will give it another try. If you guys know a way to get around this I would appreciate some help :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I hope that i got it right in the latest iteration. Please take a look.

Btw, thanks for the comment, this is definitely much cleaner.

Copy link
Contributor

@dtenedor dtenedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@cloud-fan
Copy link
Contributor

let's not forget about this comment :) https://github.com/apache/spark/pull/44299/files#r1436957948

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 93a2526 Dec 28, 2023
cloud-fan pushed a commit that referenced this pull request Aug 5, 2024
### What changes were proposed in this pull request?
Fix for between with ScalarSubqueries.

### Why are the changes needed?
There is a regression introduced from a previous PR #44299. This needs to be addressed as between operator was completely broken with resolved ScalarSubqueries.

### Does this PR introduce _any_ user-facing change?
No, the bug is not release yet.

### How was this patch tested?
Tests added to golden file.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47581 from mihailom-db/fixbetween.

Authored-by: Mihailo Milosevic <mihailo.milosevic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
MaxGekk pushed a commit that referenced this pull request Sep 20, 2024
…ql-api-docs.py` to avoid duplication

### What changes were proposed in this pull request?
The pr aims to delete `ExpressionInfo[between]` from `gen-sql-api-docs.py` to avoid duplication.

### Why are the changes needed?
- In the following doc, `between` is repeatedly displayed `twice`
  https://spark.apache.org/docs/preview/api/sql/index.html#between
  <img width="1062" alt="image" src="https://github.com/user-attachments/assets/1aa2ad22-6346-40d7-be1d-cab73b79959a">

After the pr:
<img width="751" alt="image" src="https://github.com/user-attachments/assets/a66d607a-9dcb-4d96-a8f9-024f3844055b">

- After #44299, the expression 'between' has been added to `Spark 4.0`.

### Does this PR introduce _any_ user-facing change?
Yes, only for docs.

### How was this patch tested?
Manually check.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48183 from panbingkun/SPARK-49733.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants