What is the problem the feature request solves?
Currently the Comet planner treats partial and final aggregation operators separately, so in theory you can end up with a Comet partial aggregation followed by a Spark final aggregation.
The problem with this combination is that some aggregation functions in DataFusion may use unsigned integer types in their intermediate state, and these cannot be mapped properly to a Spark data type (e.g., UInt64 -> LongType). With a Comet partial aggregation + Spark final aggregation, the result can overflow at runtime.
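To make the overflow concrete, here is a minimal sketch in plain Rust. It does not use Comet or DataFusion APIs; the function names are hypothetical, and it assumes that reading a UInt64 value as LongType amounts to a bit-preserving reinterpretation as a signed 64-bit integer.

```rust
/// Hypothetical helper: reinterpret an unsigned 64-bit value as signed,
/// which is assumed to be what happens when a UInt64 buffer is read as
/// Spark LongType. The cast keeps the bits, so any value above i64::MAX
/// turns into a negative number.
fn reinterpret_as_long(v: u64) -> i64 {
    v as i64
}

/// Hypothetical checked alternative: returns None when the unsigned value
/// does not fit into the signed 64-bit range.
fn checked_to_long(v: u64) -> Option<i64> {
    if v <= i64::MAX as u64 {
        Some(v as i64)
    } else {
        None
    }
}
```

Small values survive the reinterpretation unchanged, which is why the problem only shows up once a partial state exceeds `i64::MAX`.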
In fact, I think running only the partial aggregation in Comet doesn't help much, because that situation means Comet shuffle is not enabled. In such cases only a partial aggregation directly on top of a Comet Scan is transformed into a Comet partial aggregation, which I think is a very limited case.
I think we can treat partial + final aggregation as a whole and enable/disable Comet aggregation (partial + final) together.
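The all-or-nothing idea can be sketched as follows. This is only an illustration in plain Rust, not the Comet planner: `Engine`, `plan_aggregate_pair`, and the two boolean flags are hypothetical names standing in for whatever support checks the planner performs.

```rust
/// Hypothetical marker for which engine executes an operator.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Engine {
    Comet,
    Spark,
}

/// Hypothetical decision function: choose the engine for the
/// (partial, final) aggregation pair as a unit. Comet is used only when
/// both stages are supported; otherwise both stages stay in Spark, so a
/// mixed Comet partial + Spark final plan is never produced.
fn plan_aggregate_pair(partial_supported: bool, final_supported: bool) -> (Engine, Engine) {
    if partial_supported && final_supported {
        (Engine::Comet, Engine::Comet)
    } else {
        (Engine::Spark, Engine::Spark)
    }
}
```

For example, `plan_aggregate_pair(true, false)` yields `(Engine::Spark, Engine::Spark)` rather than a mixed plan.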
Describe the potential solution
No response
Additional context
No response
I've thought about this again. If the DataFusion partial aggregation state contains any unsupported type such as UInt64, treating partial and final aggregation as a whole doesn't solve the issue, because we still need to shuffle those unsupported arrays. We cannot arbitrarily assign a UInt64 array to Spark LongType as #216 did. Spark LongType is supposed to be backed by an Int64 array, and wrongly binding a UInt64 array to LongType doesn't work with shuffle.