-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Final aggregation should not bind to the input of partial aggregation #155
Conversation
cc @huaxingao |
375f11c
to
f5c4758
Compare
cc @sunchao |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for catching the problem.
@@ -38,6 +38,25 @@ import org.apache.comet.CometSparkSessionExtensions.isSpark34Plus | |||
* Test suite dedicated to Comet native aggregate operator | |||
*/ | |||
class CometAggregateSuite extends CometTestBase with AdaptiveSparkPlanHelper { | |||
import testImplicits._ | |||
|
|||
test("Final aggregation should not bind to the input of partial aggregation") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm does this test the issue in this PR? seems it passes in the main
branch without the fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After adding the bound index check, it fails. It passes now because we don't have the check now. As I mentioned in the description, the bound is incorrect but because we don't check it, it is not exposed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, we need to add the bound check as required for SortMergeJoin work. In that work, it checks joining keys bindings internally in DataFusion. Which checks both binding index and binding column name. So we need to bind column reference to input schema.
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Merged. Thanks. |
Which issue does this PR close?
Closes #156.
Rationale for this change
We now bind aggregate expressions to the input of partial aggregate for
HashAggregate
no matter what the mode is. However, the output of partial aggregate is not same as its input. This incorrect binding will cause unmatched column index. Currently it doesn't expose issues because we don't check the index of bound reference in native side, and the bound columns in aggregate expressions are not used.However, this is a potential bug and causes some issues when starting to check the index of bound reference.
What changes are included in this PR?
This patch adds the check of the index of bound reference. The aggregate expressions of final aggregation are not bound to the input of partial aggregation anymore but sent to native side as unbound expressions.
How are these changes tested?