[SPARK-46366][SQL] Use WITH expression in BETWEEN to avoid duplicate expressions #44299
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/betweenExpression.scala
```scala
/**
 * Transform UnresolvedBetweenExpression into a [BetweenExpr].
 */
object ResolveBetweenExpression extends Rule[LogicalPlan] {
  // ... (rule body not shown in this review excerpt)
}
```
This is what I tried initially. I tried to explain the problem in the PR description, but I guess I wasn't clear enough. Let me try again:
Initially I tried making `BetweenExpr` extend `RuntimeReplaceable`. The problem I hit was that `CommonExpressionRef` requires the `dataType` of its `CommonExpressionDef` to be known when the reference is created:
```scala
// Auxiliary constructor of CommonExpressionRef
def this(exprDef: CommonExpressionDef) = this(exprDef.id, exprDef.dataType, exprDef.nullable)
```
To get around this, I wanted to create the replacement of `BetweenExpr` only after its child expressions are resolved, and hence after its `dataType` is known. I hit a wall doing that, so I went with UnresolvedBetween -> BetweenExpr (for which the types are known) -> With. That works, but I agree it would be better to avoid this and get rid of the extra rule.
I will give it another try. If you know a way around this, I would appreciate some help :)
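The ordering problem described in this comment can be sketched with a toy model. All names below (`Def`, `Ref`, `With`, `Attr`, `BetweenRewrite.rewrite`, ...) are hypothetical stand-ins loosely modeled on Catalyst's `CommonExpressionDef`, `CommonExpressionRef` and `With`; they are not the real classes, and the sketch does not reflect the final merged implementation. It only shows why a reference that copies its definition's type at construction must be built after the input is resolved, which is what the intermediate `BetweenExpr` step was working around.

```scala
// Toy model for illustration only: these are NOT Spark's Catalyst classes.
sealed trait DataType
case object DoubleType extends DataType
case object BooleanType extends DataType

// dataType == None means "not resolved yet".
sealed trait Expr { def dataType: Option[DataType] }

final case class Attr(name: String, resolvedType: Option[DataType]) extends Expr {
  def dataType: Option[DataType] = resolvedType
}
final case class Lit(value: Double) extends Expr {
  def dataType: Option[DataType] = Some(DoubleType)
}
// Stand-in for CommonExpressionDef: its type is whatever its child resolves to.
final case class Def(id: Int, child: Expr) extends Expr {
  def dataType: Option[DataType] = child.dataType
}
// Stand-in for CommonExpressionRef: like the quoted constructor, it copies the
// Def's type when it is created, so creating it too early freezes `None`.
final case class Ref(id: Int, dataType: Option[DataType]) extends Expr
object Ref {
  def of(d: Def): Ref = Ref(d.id, d.dataType)
}
final case class LessEq(left: Expr, right: Expr) extends Expr {
  def dataType: Option[DataType] =
    if (left.dataType.isDefined && right.dataType.isDefined) Some(BooleanType) else None
}
final case class And(left: Expr, right: Expr) extends Expr {
  def dataType: Option[DataType] =
    if (left.dataType.isDefined && right.dataType.isDefined) Some(BooleanType) else None
}
final case class With(body: Expr, defs: Seq[Def]) extends Expr {
  def dataType: Option[DataType] = body.dataType
}

object BetweenRewrite {
  // Hypothetical rewrite step: build the WITH-based replacement only once all
  // children are resolved, so that every Ref captures a concrete data type.
  def rewrite(input: Expr, lower: Expr, upper: Expr): Option[Expr] =
    if (Seq(input, lower, upper).exists(_.dataType.isEmpty)) None // defer until resolved
    else {
      val d = Def(0, input)
      val r = Ref.of(d)
      Some(With(And(LessEq(lower, r), LessEq(r, upper)), Seq(d)))
    }
}
```

In this toy, `Ref.of(Def(0, Attr("x", None)))` yields a reference with no type, while `BetweenRewrite.rewrite` returns `None` until `x` is resolved and a `BooleanType`-typed tree afterwards. Deferring the rewrite in this way is what required an extra resolution step in the first iteration.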
OK, I hope I got it right in the latest iteration. Please take a look.
Btw, thanks for the comment; this is definitely much cleaner.
…/else combination. Even with this rule there are changes in DS suite. We should work to eliminate these.
dtenedor left a comment:
LGTM!
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
…m operator list since it is now in function list.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/StreamingJoinHelper.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Between.scala
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala
…o between_expression_v2 merge
let's not forget about this comment :) https://github.com/apache/spark/pull/44299/files#r1436957948
thanks, merging to master!
### What changes were proposed in this pull request?
Fix for `between` with ScalarSubqueries.
### Why are the changes needed?
There is a regression introduced by a previous PR, #44299. It needs to be addressed because the `between` operator was completely broken with resolved ScalarSubqueries.
### Does this PR introduce _any_ user-facing change?
No, the bug has not been released yet.
### How was this patch tested?
Tests added to the golden file.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #47581 from mihailom-db/fixbetween.
Authored-by: Mihailo Milosevic <mihailo.milosevic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…ql-api-docs.py` to avoid duplication
### What changes were proposed in this pull request?
This PR deletes `ExpressionInfo[between]` from `gen-sql-api-docs.py` to avoid duplication.
### Why are the changes needed?
- In the following doc, `between` is displayed twice: https://spark.apache.org/docs/preview/api/sql/index.html#between
  Before the PR (screenshot): https://github.com/user-attachments/assets/1aa2ad22-6346-40d7-be1d-cab73b79959a
  After the PR (screenshot): https://github.com/user-attachments/assets/a66d607a-9dcb-4d96-a8f9-024f3844055b
- After #44299, the expression `between` is registered as a function in Spark 4.0, so the manual entry in `gen-sql-api-docs.py` is redundant.
### Does this PR introduce _any_ user-facing change?
Yes, only for docs.
### How was this patch tested?
Manually checked.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #48183 from panbingkun/SPARK-49733.
Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>

What changes were proposed in this pull request?
Prior to this change, the expression `e BETWEEN lower AND upper` was transformed into `lower <= e && e <= upper`. This means that `e` would be evaluated twice, which is problematic from both a correctness and a performance perspective. The suggested fix is to use the `WITH` expression, which was introduced in a separate, earlier change, so that `e` is evaluated once and referenced by both comparisons.
Why are the changes needed?
The current implementation is incorrect for non-deterministic expressions (for example, when `e` is `rand()`), since the two evaluations of `e` might return different results.
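To make the double-evaluation problem concrete, here is a minimal, self-contained sketch. It uses a hypothetical toy expression tree (`Lit`, `Draw`, `LessEq`, `And`, `Ref`, `With` are illustrative names, not Catalyst classes) and a deterministic stand-in for a non-deterministic expression such as `rand()`: each evaluation of `Draw` returns the next value from a fixed sequence, so the effect is reproducible.

```scala
// Toy expression model for illustration only: these are NOT Spark's Catalyst classes.
sealed trait Expr
final case class Lit(value: Double) extends Expr
// Non-deterministic leaf (stand-in for something like rand()): each evaluation
// returns the next value from a fixed sequence and bumps an evaluation counter.
final class Draw(values: Iterator[Double]) extends Expr {
  var evals = 0
  def next(): Double = { evals += 1; values.next() }
}
final case class LessEq(left: Expr, right: Expr) extends Expr
final case class And(left: Expr, right: Expr) extends Expr
// Reference to a value bound once by an enclosing With.
final case class Ref(id: Int) extends Expr
// Evaluates each binding exactly once, then evaluates `body` with those values in scope.
final case class With(body: Expr, bindings: Map[Int, Expr]) extends Expr

object BetweenDemo {
  def eval(e: Expr, env: Map[Int, Any] = Map.empty): Any = e match {
    case Lit(v)         => v
    case d: Draw        => d.next()
    case LessEq(l, r)   => num(eval(l, env)) <= num(eval(r, env))
    case And(l, r)      => bool(eval(l, env)) && bool(eval(r, env))
    case Ref(id)        => env(id)
    case With(body, bs) => eval(body, env ++ bs.map { case (id, b) => id -> eval(b, env) })
  }
  private def num(v: Any): Double = v.asInstanceOf[Double]
  private def bool(v: Any): Boolean = v.asInstanceOf[Boolean]

  // Old-style rewrite: `e` appears twice in the tree, so it is evaluated twice.
  def naiveBetween(e: Expr, lo: Expr, hi: Expr): Expr = And(LessEq(lo, e), LessEq(e, hi))

  // WITH-style rewrite: `e` is evaluated once, bound to id 0, and referenced twice.
  def withBetween(e: Expr, lo: Expr, hi: Expr): Expr =
    With(And(LessEq(lo, Ref(0)), LessEq(Ref(0), hi)), Map(0 -> e))

  def main(args: Array[String]): Unit = {
    // e BETWEEN 0.0 AND 0.5, where successive evaluations of e yield 0.1, then 0.9.
    val d1 = new Draw(Iterator(0.1, 0.9))
    val naive = eval(naiveBetween(d1, Lit(0.0), Lit(0.5)))
    println(s"naive: $naive, evals = ${d1.evals}") // naive: false, evals = 2

    val d2 = new Draw(Iterator(0.1, 0.9))
    val shared = eval(withBetween(d2, Lit(0.0), Lit(0.5)))
    println(s"with: $shared, evals = ${d2.evals}") // with: true, evals = 1
  }
}
```

With the naive rewrite, the lower bound is checked against the first value (0.1) and the upper bound against the second (0.9), so the result is `false` even though a single value of `e` (0.1) does lie in `[0.0, 0.5]`. Binding `e` once makes both comparisons see the same value and halves the number of evaluations.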
Does this PR introduce any user-facing change?
With this change, the generated plan for a BETWEEN expression is different. An example of the generated plan is provided in the tests.
How was this patch tested?
Existing tests, plus a new test in PlanGenerationTestSuite.
Was this patch authored or co-authored using generative AI tooling?
Yes.