-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-19385][SQL] During canonicalization, NOT(...(l, r)) should not expect such cases that l.hashcode > r.hashcode
#16719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #72075 has finished for PR 16719 at commit
|
|
@cloud-fan @gatorsmile @dongjoon-hyun would you take a look, thanks! |
|
Test build #72076 has finished for PR 16719 at commit
|
|
Hi, @lw-lin . |
| setTest(1, Not(maxHash >= 1), maxHash < 1, Not(Literal(1) <= maxHash), Literal(1) > maxHash) | ||
| setTest(1, Not(minHash >= 1), minHash < 1, Not(Literal(1) <= minHash), Literal(1) > minHash) | ||
| setTest(1, Not(maxHash <= 1), maxHash > 1, Not(Literal(1) >= maxHash), Literal(1) < maxHash) | ||
| setTest(1, Not(minHash <= 1), minHash > 1, Not(Literal(1) >= minHash), Literal(1) < minHash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These test cases are covered previously correctly. Actually, this PR simplifies the logics only. Am I right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea sure they are covered correctly even prior to this patch's changes!
the previous aUpper'hashcode is either greater than or less than 1's hashcode but can not be both, while this change aims to test both cases -- but I'm quite open to revert the changes if they are considered unnecessary.
| case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l) | ||
| case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r) | ||
| case Not(GreaterThan(l, r)) => | ||
| assert(l.hashCode() <= r.hashCode()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove these asserts? It seems to be verified with your test cases now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! maybe an alternative way is to add comments saying it's guaranteed that l.hashcode <= r.hashcode, otherwise people might wonder why there is no case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() at their first glance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be fine to get rid of assert, as long as we add the code comments and the needed test cases.
|
The original logic was designed to be safe for changing the caller bottom-up code, here. lazy val canonicalized: Expression = {
val canonicalizedChildren = children.map(_.canonicalized)
Canonicalize.execute(withNewChildren(canonicalizedChildren))
}But, I agree that it's safe to simplify that with the new @lw-lin 's test cases. For the For me, LGTM except that. Oh, could you update the title, |
NOT(l, r) should not expect such cases that l.hashcode > r.hashcodeNOT(...(l, r)) should not expect such cases that l.hashcode > r.hashcode
| case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l) | ||
| case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l) | ||
|
|
||
| case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a dead code, because our canonicalization order is bottom up, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uh. Just saw the above comment from @dongjoon-hyun . Thanks!
| // maxHash's hashcode is calculated based on this exprId's hashcode, so we set this | ||
| // exprId's hashCode to this specific value to make sure maxHash's hashcode is almost | ||
| // `Int.MaxValue` | ||
| override def hashCode: Int = 826929706 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not override def hashCode: Int = Int.MaxValue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks.
the reason is in Canonicalize.scala#ignoreNamesTypes, we're making copies of e (maxHash in this case):
private def ignoreNamesTypes(e: Expression): Expression = e match {
case a: AttributeReference =>
AttributeReference("none", a.dataType.asNullable)(exprId = a.exprId)
case _ => e
}so, even if we override def hashCode: Int = Int.MaxValue on maxHash, it has nothing to do with the copy's hashcode.
then i took a step back -- by defining exprId's hashcode to a specific value (as provided in this patch), we further defined the copied attribute-reference's hashcode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uh, I did not read the comment carefully. Thanks for the explanation.
You can set it to -1030353449. Then, maxHash.hashCode() will be equal to Int.MaxValue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, -1030353449 works great! let me push a commit updating this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
| // minHash's hashcode is calculated based on this exprId's hashcode, so we set this | ||
| // exprId's hashCode to this specific value to make sure minHash's hashcode is almost | ||
| // `Int.MinValue` | ||
| override def hashCode: Int = 826929707 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not override def hashCode: Int = Int.MinValue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make minHash.hashCode() equal to Int.MinValue, you can set it to 1407330692
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated, thanks!
|
Test build #72123 has started for PR 16719 at commit |
| case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l) | ||
| case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r) | ||
| case Not(GreaterThan(l, r)) => | ||
| assert(l.hashCode() <= r.hashCode()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove assert, because the test cases already cover the scenario. You can add a comment to explain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure!
|
LGTM except one comment. Thanks! |
|
Test build #72126 has finished for PR 16719 at commit
|
|
Jenkins retest this please |
|
Test build #72128 has finished for PR 16719 at commit
|
|
|
||
| case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l) | ||
| // Note in the following `NOT` cases, `l.hashCode() <= r.hashCode()` holds. The reason is that | ||
| // canonicalization is conducted bottom-up -- see [[Expression.canonicalized]]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To the other reviewers, this PR added test cases in ExpressionSetSuite.scala to ensure it. Thus, it is safe to clean the codes.
|
Thanks! Merging to master. |
…ot expect such cases that l.hashcode > r.hashcode ## What changes were proposed in this pull request? During canonicalization, `NOT(...(l, r))` should not expect such cases that `l.hashcode > r.hashcode`. Take the rule `case NOT(GreaterThan(l, r)) if l.hashcode > r.hashcode` for example, it should never be matched since `GreaterThan(l, r)` itself would be re-written as `GreaterThan(r, l)` given `l.hashcode > r.hashcode` after canonicalization. This patch consolidates rules like `case NOT(GreaterThan(l, r)) if l.hashcode > r.hashcode` and `case NOT(GreaterThan(l, r))`. ## How was this patch tested? This patch expanded the `NOT` test case to cover both cases where: - `l.hashcode > r.hashcode` - `l.hashcode < r.hashcode` Author: Liwei Lin <lwlin7@gmail.com> Closes apache#16719 from lw-lin/canonicalize.
What changes were proposed in this pull request?
During canonicalization,
NOT(...(l, r))should not expect such cases thatl.hashcode > r.hashcode.Take the rule
case NOT(GreaterThan(l, r)) if l.hashcode > r.hashcodefor example, it should never be matched sinceGreaterThan(l, r)itself would be re-written asGreaterThan(r, l)givenl.hashcode > r.hashcodeafter canonicalization.This patch consolidates rules like
case NOT(GreaterThan(l, r)) if l.hashcode > r.hashcodeandcase NOT(GreaterThan(l, r)).How was this patch tested?
This patch expanded the
NOTtest case to cover both cases where:l.hashcode > r.hashcodel.hashcode < r.hashcode