-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-41183: [C++][Python] Expose recursive flatten for lists on list_flatten kernel function and pyarrow bindings #41295
GH-41183: [C++][Python] Expose recursive flatten for lists on list_flatten kernel function and pyarrow bindings #41295
Conversation
|
@felipecrv @mapleFU PTAL this if you're interested! |
8f01a3a
to
c400b46
Compare
@@ -57,8 +57,14 @@ Result<TypeHolder> LastType(KernelContext*, const std::vector<TypeHolder>& types | |||
} | |||
|
|||
Result<TypeHolder> ListValuesType(KernelContext*, const std::vector<TypeHolder>& args) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add a bool recursive
parameter (without default value) and skip the for loop if it's false
to match existing usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic here seems correct for the recursive = false
condition except one more value_kind
check on is_list(value_kind) || is_list_view(value_kind)
.
And it's the output resolver(a callback to identify the kernel functions's output type) which called only once in expression bind or CallFunction's, and have little impact on kernel path. So maybe the logic here could be retained?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe. I started another thread and asked for clarification #41295 (comment)
@ZhangHuiGui if you're dealing with bugs/failures in tests, it might be because of this |
b3fa84d
to
1298642
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before merge I will do a more detailed review (I don't think I will find issues).
@jorisvandenbossche what you think about the Python changes?
@@ -2747,69 +2824,8 @@ cdef class ListViewArray(Array): | |||
""" | |||
return pyarrow_wrap_array((<CListViewArray*> self.ap).sizes()) | |||
|
|||
def flatten(self, memory_pool=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pyarrow's mainly changes should be here. [Large]ListViewArray
does not seem to need to maintain an independent flatten interface separately, and can directly use the one provided by BaseListArray
.
b73fe44
to
d623dce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please accept the patch suggestions containing typos and grammar corrections.
After these changes, I will get a Python person to give a quick look at the Python parts, then I will merge.
…function 2. Support [Large]ListView for some kernel functions: list_flatten,list_value_length, list_element 3. Support recursive flatten for pyarrow bindinds and simplify [Large]ListView's pyarrow bindings 4. Refactor vector_nested_test.cc for better support [Large]ListView types.
8d1c0f3
to
ce9947a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (from python perspective)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I took a look through the Python changes, which seem reasonable except for possibly an explicit test to make sure the value of recursive
makes it all the way to C++.
I don't have much experience with the class hierarchy of Arrays and so I can't speak to that change (but it seems like a good idea to only maintain one version of that documentation/interface).
@@ -2876,6 +2879,7 @@ def test_fixed_size_list_array_flatten(): | |||
assert arr0.type.equals(typ0) | |||
assert arr1.flatten().equals(arr0) | |||
assert arr2.flatten().flatten().equals(arr0) | |||
assert arr2.flatten(True).equals(arr0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there also be a test here where flatten(True)
and flatten(False)
return different things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the recursive
parameter defaults to False
in the flatten interface, but some of the previous test cases do not include this test. We should add such a test case here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see: assert arr2.flatten(True).equals(arr0)
is the case that I had in mind and it was already added (I just missed the distinction between arr1
and arr0
). The case you added helps make that distinction clearer...thank you!
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 5e986be. There was 1 benchmark result with an error:
There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 11 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…ist_flatten kernel function and pyarrow bindings (apache#41295) ### Rationale for this change Expose recursive flatten for logical lists on list_flatten kernel function and pyarrow bindings. ### What changes are included in this PR? 1. Expose recursive flatten for logical lists on `list_flatten` kernel function 2. Support [Large]ListView for some kernel functions: `list_flatten`,`list_value_length`, `list_element` 3. Support recursive flatten for pyarrow bindinds and simplify [Large]ListView's pyarrow bindings 4. Refactor vector_nested_test.cc for better support [Large]ListView types. ### Are these changes tested? Yes ### Are there any user-facing changes? Yes. 1. Some kernel functions like: list_flatten, list_value_length, list_element would support [Large]ListView types 2. `list_flatten` and related pyarrow bindings could support flatten recursively with an ListFlattenOptions. * GitHub Issue: apache#41183 Lead-authored-by: ZhangHuiGui <hugo.zhang@openpie.com> Co-authored-by: ZhangHuiGui <2689496754@qq.com> Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
…ist_flatten kernel function and pyarrow bindings (apache#41295) ### Rationale for this change Expose recursive flatten for logical lists on list_flatten kernel function and pyarrow bindings. ### What changes are included in this PR? 1. Expose recursive flatten for logical lists on `list_flatten` kernel function 2. Support [Large]ListView for some kernel functions: `list_flatten`,`list_value_length`, `list_element` 3. Support recursive flatten for pyarrow bindinds and simplify [Large]ListView's pyarrow bindings 4. Refactor vector_nested_test.cc for better support [Large]ListView types. ### Are these changes tested? Yes ### Are there any user-facing changes? Yes. 1. Some kernel functions like: list_flatten, list_value_length, list_element would support [Large]ListView types 2. `list_flatten` and related pyarrow bindings could support flatten recursively with an ListFlattenOptions. * GitHub Issue: apache#41183 Lead-authored-by: ZhangHuiGui <hugo.zhang@openpie.com> Co-authored-by: ZhangHuiGui <2689496754@qq.com> Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Rationale for this change
Expose recursive flatten for logical lists on list_flatten kernel function and pyarrow bindings.
What changes are included in this PR?
list_flatten
kernel functionlist_flatten
,list_value_length
,list_element
Are these changes tested?
Yes
Are there any user-facing changes?
Yes.
list_flatten
and related pyarrow bindings could support flatten recursively with an ListFlattenOptions.