Enable Comet aggregation (partial + final) as a whole #223

viirya · 2024-03-21T17:36:04Z

What is the problem the feature request solves?

Currently, the Comet planner treats partial and final aggregation operators separately. So in theory a query could end up with a Comet partial aggregation followed by a Spark final aggregation.

The problem with this combination is that some aggregation functions in DataFusion may use unsigned integer types, which cannot be mapped properly to a Spark data type (e.g., UInt64 -> LongType). With a Comet partial aggregation followed by a Spark final aggregation, values could overflow at runtime.
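To make the overflow concrete, here is a minimal sketch in plain Python (standard library only, not Comet or DataFusion code). It reinterprets the 8 bytes of an unsigned 64-bit value as a signed 64-bit value, which is roughly what happens when a UInt64 buffer is read as if it were LongType. The helper name `reinterpret_u64_as_i64` is made up for illustration.

```python
import struct

INT64_MAX = 2**63 - 1


def reinterpret_u64_as_i64(value: int) -> int:
    """Read the 8 bytes of an unsigned 64-bit value as a signed 64-bit value.

    Hypothetical helper: it only mimics treating UInt64 storage as Int64.
    """
    return struct.unpack("<q", struct.pack("<Q", value))[0]


# A value that fits in both types survives the round trip unchanged.
small = 42
print(reinterpret_u64_as_i64(small))

# A value above INT64_MAX is valid as UInt64 but wraps to a negative
# number once it is interpreted as a signed 64-bit integer.
large = INT64_MAX + 1
print(reinterpret_u64_as_i64(large))
```

Values up to `INT64_MAX` are unaffected, so the mismatch only shows up for large states, which is why it would surface at runtime rather than at planning time.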

Actually, I think having only the partial aggregation in Comet doesn't help much, because it means Comet shuffle is not enabled. In such cases, only a partial aggregation directly on top of a Comet scan is transformed into a Comet partial aggregation. I think that is very limited.

I think we can treat partial + final aggregation as a whole and enable/disable Comet aggregation (partial + final) together.
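A rough sketch of what "as a whole" could mean, in plain Python rather than the actual planner code. Everything here (`AggStage`, `comet_supported`, `choose_engines`) is a hypothetical name used only to illustrate the all-or-nothing decision; it is not how the Comet planner is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggStage:
    """Hypothetical stand-in for one aggregation operator in a plan."""
    mode: str              # "partial" or "final"
    comet_supported: bool  # assumption: whether Comet could run this stage


def choose_engines(partial: AggStage, final: AggStage) -> tuple:
    """Pick one engine for both stages so they never get split.

    Comet is used only if both stages are supported; otherwise both
    stages stay on Spark.
    """
    use_comet = partial.comet_supported and final.comet_supported
    engine = "comet" if use_comet else "spark"
    return engine, engine


# The partial stage is supported but the final stage is not: both stay on
# Spark, so a Comet partial + Spark final pair is never produced.
print(choose_engines(AggStage("partial", True), AggStage("final", False)))
```

The point of the design is that the decision is made once per aggregation pair, so the mixed Comet partial + Spark final combination cannot occur.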

Describe the potential solution

No response

Additional context

No response

viirya · 2024-03-21T17:36:11Z

cc @huaxingao @sunchao

viirya · 2024-03-26T18:40:15Z

I've rethought this. Actually, if the DataFusion partial aggregation state contains any unsupported type such as UInt64, treating partial and final aggregation as a whole doesn't solve the issue, because we still need to shuffle such unsupported arrays. We cannot arbitrarily assign a UInt64 array to Spark LongType like #216 did. Spark LongType is supposed to be backed by an Int64 array. Wrongly binding a UInt64 array to LongType doesn't work with shuffle.
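As an illustration of the mismatch, a small plain-Python sketch follows. It assumes the shuffle path expects each column's physical Arrow type to match the one implied by its declared Spark type. The table `EXPECTED_ARROW_TYPE` and the function `check_shuffle_column` are hypothetical, not Comet APIs.

```python
# Physical Arrow type expected for a few Spark types (a partial table,
# listed here only for the sketch).
EXPECTED_ARROW_TYPE = {
    "IntegerType": "Int32",
    "LongType": "Int64",
    "DoubleType": "Float64",
}


def check_shuffle_column(spark_type: str, arrow_type: str) -> None:
    """Raise if the array's physical type does not match the declared type.

    Hypothetical check: it models the idea that a column declared as
    LongType has to be an Int64 array by the time it is shuffled.
    """
    expected = EXPECTED_ARROW_TYPE[spark_type]
    if arrow_type != expected:
        raise TypeError(
            f"{spark_type} expects an {expected} array, got {arrow_type}"
        )


# A LongType column backed by Int64 passes the check.
check_shuffle_column("LongType", "Int64")

# A partial aggregation state held as UInt64 but declared as LongType
# is rejected, regardless of which engine runs the final aggregation.
try:
    check_shuffle_column("LongType", "UInt64")
except TypeError as err:
    print(err)
```

Under that assumption, grouping partial and final aggregation together changes nothing: the state still crosses the shuffle as UInt64.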

viirya added the enhancement label Mar 21, 2024

viirya mentioned this issue Mar 21, 2024

feat: Support covar_samp and covar_pop #216

Closed

viirya closed this as completed Mar 26, 2024
