Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Comet aggregation (partition + final) as a whole #223

Closed
viirya opened this issue Mar 21, 2024 · 2 comments
Closed

Enable Comet aggregation (partition + final) as a whole #223

viirya opened this issue Mar 21, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@viirya
Copy link
Member

viirya commented Mar 21, 2024

What is the problem the feature request solves?

Currently we treat partial and final aggregation operators separately during Comet planner. So theoretically you could get a Comet partial aggregation + Spark final aggregation.

The issue of this combination is that some aggregation functions in DataFusion may use unsigned integer types which cannot be properly mapped to Spark data type (e.g., Uint64 -> LongType). If we have a Comet partial aggregation + Spark final aggregation, it is possibly overflowing in runtime.

Actually I think only partial aggregation in Comet doesn't help too much. Because it means Comet shuffle is not enabled. Only partial aggregation directly on top of a Comet Scan will be transformed to Comet partial aggregation in such cases. I think it is very limited.

I think we can treat partial + final aggregation as a whole and enable/disalbe Comet aggregation (partition + final) together.

Describe the potential solution

No response

Additional context

No response

@viirya viirya added the enhancement New feature or request label Mar 21, 2024
@viirya
Copy link
Member Author

viirya commented Mar 21, 2024

cc @huaxingao @sunchao

@viirya
Copy link
Member Author

viirya commented Mar 26, 2024

I re-think about this. Actually if there is any unsupported types like UInt64 from DataFusion partial aggregation state. Even we make partial and final aggregation as a whole, it doesn't solve the issue. It is because we need shuffle such unsupported arrays like UInt64. But we cannot arbitrarily assign an UInt64 array to Spark LongType like #216 did. Spark LongType is supported to be an Int64 array. Wronly binding an Uint64 array with LongType doesn't work with shuffle.

@viirya viirya closed this as completed Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant