fix: Final aggregation should not bind to the input of partial aggregation #155

viirya · 2024-03-03T08:56:24Z

Which issue does this PR close?

Closes #156.

Rationale for this change

We currently bind aggregate expressions to the input of the partial aggregate for HashAggregate regardless of the mode. However, the output of the partial aggregate is not the same as its input, so this binding is incorrect for the final aggregate and produces mismatched column indices. It does not surface as a failure today because the native side does not check the index of a bound reference, and the bound columns in the aggregate expressions are not used.

However, this is a latent bug, and it causes failures once we start checking the index of bound references.
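
To illustrate the mismatch, here is a minimal sketch in plain Scala. It does not use Comet or Spark classes: `Field`, `BoundRef`, `bind`, `inRange`, and the example schemas (for a query shaped like `SELECT k, SUM(v) FROM t GROUP BY k`) are all hypothetical names chosen for this illustration. It only shows why an ordinal resolved against the partial aggregate's input can fall outside the partial aggregate's output, which is what the final aggregate actually reads.

```scala
// Hypothetical, simplified model; these are not Comet or Spark classes.
final case class Field(name: String, dataType: String)
final case class BoundRef(index: Int, dataType: String)

object BindingMismatch {
  // Assumed input of the partial aggregate: three columns from the scan.
  val partialInput: Vector[Field] =
    Vector(Field("id", "int"), Field("k", "string"), Field("v", "bigint"))

  // Assumed output of the partial aggregate: grouping key plus one buffer column.
  val partialOutput: Vector[Field] =
    Vector(Field("k", "string"), Field("sum", "bigint"))

  // Resolve a column name to an ordinal in the given schema.
  def bind(column: String, schema: Vector[Field]): BoundRef = {
    val index = schema.indexWhere(_.name == column)
    require(index >= 0, s"column $column not found")
    BoundRef(index, schema(index).dataType)
  }

  // The argument of SUM(v), bound against the partial aggregate's input.
  val boundToPartialInput: BoundRef = bind("v", partialInput)

  // True when the ordinal points at an existing column of the schema.
  def inRange(ref: BoundRef, schema: Vector[Field]): Boolean =
    ref.index >= 0 && ref.index < schema.length
}
```

In this sketch the reference to `v` is valid for `partialInput` but out of range for `partialOutput`, so reusing it for the final aggregate only goes unnoticed as long as nothing validates the ordinal.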

What changes are included in this PR?

This patch adds a check on the index of bound references. The aggregate expressions of the final aggregation are no longer bound to the input of the partial aggregation; they are sent to the native side as unbound expressions.
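
The two changes can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual code in `QueryPlanSerde.scala` or on the native side: `Expr`, `Bound`, `Unbound`, `AggMode`, `prepare`, and `validate` are hypothetical names, and the real check lives in native code rather than in Scala.

```scala
// Hypothetical, simplified model; not the actual serde or native implementation.
sealed trait Expr
final case class Unbound(name: String) extends Expr
final case class Bound(index: Int) extends Expr

sealed trait AggMode
case object Partial extends AggMode
case object Final extends AggMode

object AggBinding {
  // Bind the aggregate argument only in Partial mode.
  // In Final mode the expression is passed along unbound.
  def prepare(column: String, mode: AggMode, childOutput: Vector[String]): Expr =
    mode match {
      case Partial =>
        val i = childOutput.indexOf(column)
        require(i >= 0, s"column $column not found in child output")
        Bound(i)
      case Final =>
        Unbound(column)
    }

  // Index check: reject a bound reference whose ordinal is outside the input.
  def validate(expr: Expr, inputWidth: Int): Either[String, Expr] =
    expr match {
      case Bound(i) if i < 0 || i >= inputWidth =>
        Left(s"bound reference index $i out of range for $inputWidth input columns")
      case other =>
        Right(other)
    }
}
```

With this shape, a final-mode expression never carries a stale ordinal, so a check like `validate` can only fail on a genuinely wrong binding.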

How are these changes tested?

viirya · 2024-03-03T09:02:58Z

cc @huaxingao

viirya · 2024-03-03T17:56:04Z

cc @sunchao

huaxingao

LGTM. Thanks for catching the problem.

sunchao · 2024-03-04T16:42:15Z

spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala

@@ -38,6 +38,25 @@ import org.apache.comet.CometSparkSessionExtensions.isSpark34Plus
 * Test suite dedicated to Comet native aggregate operator
 */
 class CometAggregateSuite extends CometTestBase with AdaptiveSparkPlanHelper {
+  import testImplicits._
+
+  test("Final aggregation should not bind to the input of partial aggregation") {


Hmm, does this test exercise the issue in this PR? It seems to pass on the main branch without the fix.

It fails once the bound index check is added. It passes on main only because that check does not exist yet. As I mentioned in the description, the binding is incorrect, but because we don't check it, the problem is not exposed.

By the way, we need to add the bound check because the SortMergeJoin work requires it. In that work, DataFusion internally checks the join key bindings, both the binding index and the binding column name, so we need to bind column references to the input schema.
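
As a rough illustration of a check on both index and name, here is a small Scala sketch. It is not DataFusion's implementation (which is in Rust); `ColumnRef` and `ColumnCheck.isValid` are hypothetical names, and the input schema is modeled as a plain list of column names.

```scala
// Hypothetical model of a column reference validated by both name and index.
final case class ColumnRef(name: String, index: Int)

object ColumnCheck {
  // Valid only if the ordinal exists in the schema and the column at that
  // ordinal has the expected name.
  def isValid(ref: ColumnRef, inputSchema: Vector[String]): Boolean =
    inputSchema.lift(ref.index).contains(ref.name)
}
```

Under such a check, a reference with an in-range ordinal but the wrong column name is rejected as well, which is why binding against the correct input schema matters.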

spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala

sunchao

LGTM

viirya · 2024-03-05T01:04:03Z

Merged. Thanks.

fix: Final aggregation should not bind to the input of partial aggreg…

f5c4758

…ation

viirya force-pushed the fix_final_agg_binding branch from 375f11c to f5c4758 Compare March 3, 2024 09:03

huaxingao approved these changes Mar 4, 2024

sunchao reviewed Mar 4, 2024

viirya added 3 commits March 4, 2024 10:17

For review

8ed3aa8

fix

96dd228

Merge remote-tracking branch 'upstream/main' into fix_final_agg_binding

5618148

sunchao reviewed Mar 4, 2024

spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala

Fix merging error

ad90fcf

sunchao approved these changes Mar 5, 2024

viirya merged commit a131c44 into apache:main Mar 5, 2024
19 checks passed

viirya deleted the fix_final_agg_binding branch March 5, 2024 01:04

sunchao mentioned this pull request Mar 5, 2024

minor: Remove unnecessary logic #169

Merged
