Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] Remove internal GroupBy implementation #14866

Closed
westonpace opened this issue Dec 7, 2022 · 2 comments · Fixed by #14867
Closed

[C++] Remove internal GroupBy implementation #14866

westonpace opened this issue Dec 7, 2022 · 2 comments · Fixed by #14867

Comments

@westonpace
Copy link
Member

Describe the enhancement requested

Currently there are two ways to compute a group by. The supported way is to use an aggregate node in an exec plan. The second (internal) way is to use the internal function arrow::internal::GroupBy. This internal function simulates, but does not actually use, an aggregate node.

The internal implementation has caused issues in the past where we did not notice an error in the aggregate node's invocation of aggregate kernels since we use the internal function for testing aggregates and it behaves slightly differently. The internal implementation also requires maintenance and significantly complicated #14352 .

I would like to remove the internal implementation. Unfortunately, the internal implementation is used by tests, benchmarks, and pyarrow. However, we should be able to update those bindings to a friendly wrapper around exec plans.

Component(s)

C++

westonpace added a commit to westonpace/arrow that referenced this issue Dec 9, 2022
… accepting arrays and returning table, with unit tests
westonpace added a commit to westonpace/arrow that referenced this issue Dec 23, 2022
…d of emulating one

apacheGH-14866: converted GroupBy into a proper convenience function, accepting arrays and returning table, with unit tests
westonpace added a commit to westonpace/arrow that referenced this issue Dec 23, 2022
…d of emulating one

apacheGH-14866: converted GroupBy into a proper convenience function, accepting arrays and returning table, with unit tests
westonpace added a commit to westonpace/arrow that referenced this issue Jan 4, 2023
…d of emulating one

apacheGH-14866: converted GroupBy into a proper convenience function, accepting arrays and returning table, with unit tests
westonpace added a commit to westonpace/arrow that referenced this issue Jan 19, 2023
…d of emulating one

apacheGH-14866: converted GroupBy into a proper convenience function, accepting arrays and returning table, with unit tests
westonpace added a commit to westonpace/arrow that referenced this issue Jan 25, 2023
…d of emulating one

apacheGH-14866: converted GroupBy into a proper convenience function, accepting arrays and returning table, with unit tests
westonpace added a commit to westonpace/arrow that referenced this issue Jan 25, 2023
… directly instead of emulating one

apacheGH-14866: converted GroupBy into a proper convenience function, accepting arrays and returning table, with unit tests
westonpace added a commit to westonpace/arrow that referenced this issue Jan 25, 2023
… directly instead of emulating one

apacheGH-14866: converted GroupBy into a proper convenience function, accepting arrays and returning table, with unit tests
@coady
Copy link

coady commented Feb 11, 2023

I think the current _group_by interface is awkward, because it requires target "names", which are only used for function arity. But it looks like #14867 will replace that anyway.

Also, I've been using _group_by for aggregating batches, with out-of-core data. Whereas #14867 looks like it will require a table. Just something to consider, but I guess I could always wrap a single record batch into a table.

Thanks.

@westonpace
Copy link
Member Author

Yes, wrapping a single batch in a table should be effective.

westonpace added a commit to westonpace/arrow that referenced this issue Feb 17, 2023
… directly instead of emulating one

apacheGH-14866: converted GroupBy into a proper convenience function, accepting arrays and returning table, with unit tests
westonpace added a commit that referenced this issue Feb 21, 2023
* Closes: #14866

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
@westonpace westonpace added this to the 12.0.0 milestone Feb 21, 2023
fatemehp pushed a commit to fatemehp/arrow that referenced this issue Feb 24, 2023
…14867)

* Closes: apache#14866

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants