GH-41183: [C++][Python] Expose recursive flatten for lists on list_flatten kernel function and pyarrow bindings #41295

Collaborator

@ZhangHuiGui ZhangHuiGui commented Apr 18, 2024

Rationale for this change

Expose recursive flattening of logical lists through the list_flatten kernel function and the pyarrow bindings.

What changes are included in this PR?

  1. Expose recursive flattening of logical lists in the list_flatten kernel function.
  2. Support [Large]ListView in the list_flatten, list_value_length, and list_element kernel functions.
  3. Support recursive flattening in the pyarrow bindings and simplify the [Large]ListView pyarrow bindings.
  4. Refactor vector_nested_test.cc to better support [Large]ListView types.

Are these changes tested?

Yes

Are there any user-facing changes?

Yes.

  1. The list_flatten, list_value_length, and list_element kernel functions now support [Large]ListView types.
  2. list_flatten and the related pyarrow bindings can flatten recursively when given a ListFlattenOptions.
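
To make the difference between the two modes concrete, here is a minimal pure-Python sketch of the semantics. It is only a model on nested Python lists, not Arrow's implementation: the helper names flatten_once and list_flatten are hypothetical, and the stopping rule is an assumption (Arrow decides from the array's type, while this sketch inspects the values).

```python
def flatten_once(lists):
    """Remove one level of nesting. A null list (None) contributes no values."""
    out = []
    for item in lists:
        if item is not None:
            out.extend(item)
    return out


def list_flatten(lists, recursive=False):
    """Hypothetical model of flattening with an optional recursive mode."""
    values = flatten_once(lists)
    if recursive:
        # Assumption: with no type information available, keep flattening
        # while any remaining value is still a Python list.
        while any(isinstance(v, list) for v in values):
            values = flatten_once(values)
    return values


nested = [[[1, 2], [3]], None, [[4], None, [5, 6]]]
one_level = list_flatten(nested)                    # removes only the outer level
all_levels = list_flatten(nested, recursive=True)   # removes every list level
print(one_level)
print(all_levels)
```

With recursive left at its default, the result is still a list of lists; with recursive=True, only leaf values remain.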

⚠️ GitHub issue #41183 has been automatically assigned in GitHub to PR creator.

@ZhangHuiGui
Collaborator Author

@felipecrv @mapleFU PTAL this if you're interested!

@ZhangHuiGui ZhangHuiGui force-pushed the expose_recursive_flatten_on_list_related_kernel branch from 8f01a3a to c400b46 Compare April 18, 2024 15:09
@@ -57,8 +57,14 @@ Result<TypeHolder> LastType(KernelContext*, const std::vector<TypeHolder>& types
}

Result<TypeHolder> ListValuesType(KernelContext*, const std::vector<TypeHolder>& args) {
Contributor

You should add a bool recursive parameter (without default value) and skip the for loop if it's false to match existing usage.

Collaborator Author

The logic here seems correct for the recursive = false case, apart from one extra value_kind check on is_list(value_kind) || is_list_view(value_kind).

Also, this is the output resolver (a callback that determines the kernel function's output type). It is called only once, during expression binding or in CallFunction, so it has little impact on the kernel path. Maybe the logic here could be kept as is?
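
The point under discussion can be sketched as follows. This is a hedged Python model, not the C++ resolver from the PR: ListType and list_values_type are hypothetical stand-ins, and the sketch assumes the resolver unwraps nested list types only when recursive is true.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListType:
    """Hypothetical stand-in for a list-like data type."""
    value_type: object  # another ListType, or a leaf type name such as "int32"


def list_values_type(list_type, recursive):
    """Return the output type of flattening a value of `list_type`."""
    value_type = list_type.value_type
    if recursive:
        # Keep unwrapping until the value type is no longer list-like.
        while isinstance(value_type, ListType):
            value_type = value_type.value_type
    return value_type


nested = ListType(ListType(ListType("int32")))
shallow = list_values_type(nested, recursive=False)  # one level removed
deep = list_values_type(nested, recursive=True)      # all levels removed
```

Because a resolver like this runs once per call rather than once per value, the extra loop costs little either way, which is the argument made in the comment above.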

Contributor

Maybe. I started another thread and asked for clarification #41295 (comment)

@github-actions github-actions bot added awaiting changes and removed awaiting review labels Apr 18, 2024
@felipecrv
Contributor

@ZhangHuiGui if you're dealing with bugs/failures in tests, it might be because of this
#41295 (comment)

@github-actions github-actions bot added awaiting change review and removed awaiting changes labels Apr 19, 2024
@ZhangHuiGui ZhangHuiGui force-pushed the expose_recursive_flatten_on_list_related_kernel branch from b3fa84d to 1298642 Compare April 19, 2024 10:12
Contributor

@felipecrv felipecrv left a comment

Before merge I will do a more detailed review (I don't think I will find issues).

@jorisvandenbossche what do you think about the Python changes?

@github-actions github-actions bot added awaiting changes and removed awaiting change review labels Apr 19, 2024
@@ -2747,69 +2824,8 @@ cdef class ListViewArray(Array):
        """
        return pyarrow_wrap_array((<CListViewArray*> self.ap).sizes())

    def flatten(self, memory_pool=None):
Collaborator Author

The main PyArrow changes are here. [Large]ListViewArray does not seem to need its own separate flatten interface; it can use the one provided by BaseListArray directly.
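
A rough sketch of why a single shared flatten can serve both layouts, written in plain Python rather than Cython. All class names here are hypothetical and null handling is omitted; the only assumption taken from the Arrow format is that a list array locates each list by consecutive offsets, while a list-view array stores an offset and a size per list.

```python
class BaseListModel:
    """Hypothetical base class: subclasses only say where each list lives."""

    def __init__(self, values):
        self.values = values

    def _ranges(self):
        """Yield one (start, length) pair per list."""
        raise NotImplementedError

    def flatten(self):
        # Shared by every subclass: concatenate each list's slice in order.
        out = []
        for start, length in self._ranges():
            out.extend(self.values[start:start + length])
        return out


class ListModel(BaseListModel):
    def __init__(self, values, offsets):
        super().__init__(values)
        self.offsets = offsets  # n + 1 non-decreasing offsets for n lists

    def _ranges(self):
        for i in range(len(self.offsets) - 1):
            yield self.offsets[i], self.offsets[i + 1] - self.offsets[i]


class ListViewModel(BaseListModel):
    def __init__(self, values, offsets, sizes):
        super().__init__(values)
        self.offsets = offsets  # one offset per list, in any order
        self.sizes = sizes      # one size per list

    def _ranges(self):
        yield from zip(self.offsets, self.sizes)


list_flat = ListModel([1, 2, 3, 4], [0, 2, 2, 4]).flatten()
view_flat = ListViewModel([1, 2, 3, 4], [2, 0], [2, 1]).flatten()
```

Only _ranges differs between the two subclasses, so the flatten method and its documentation live in one place.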

@github-actions github-actions bot added awaiting change review and removed awaiting changes labels Apr 20, 2024
@ZhangHuiGui ZhangHuiGui force-pushed the expose_recursive_flatten_on_list_related_kernel branch 2 times, most recently from b73fe44 to d623dce Compare April 20, 2024 07:39
Contributor

@felipecrv felipecrv left a comment

Please accept the patch suggestions with typo and grammar fixes.

After these changes, I will get a Python person to give a quick look at the Python parts, then I will merge.

@github-actions github-actions bot added awaiting changes and removed awaiting change review labels Apr 29, 2024
ZhangHuiGui and others added 6 commits April 30, 2024 03:54
…function

2. Support [Large]ListView for some kernel functions: list_flatten, list_value_length, list_element
3. Support recursive flatten for pyarrow bindings and simplify [Large]ListView's pyarrow bindings
4. Refactor vector_nested_test.cc to better support [Large]ListView types.
@ZhangHuiGui ZhangHuiGui force-pushed the expose_recursive_flatten_on_list_related_kernel branch from 8d1c0f3 to ce9947a Compare April 29, 2024 19:55
@github-actions github-actions bot added awaiting change review and removed awaiting changes labels Apr 29, 2024
@github-actions github-actions bot added awaiting merge and removed awaiting change review labels Apr 29, 2024
Member

@danepitkin danepitkin left a comment

LGTM (from python perspective)

Member

@paleolimbot paleolimbot left a comment

Thanks!

I took a look through the Python changes, which seem reasonable except for possibly an explicit test to make sure the value of recursive makes it all the way to C++.

I don't have much experience with the class hierarchy of Arrays and so I can't speak to that change (but it seems like a good idea to only maintain one version of that documentation/interface).

@@ -2876,6 +2879,7 @@ def test_fixed_size_list_array_flatten():
    assert arr0.type.equals(typ0)
    assert arr1.flatten().equals(arr0)
    assert arr2.flatten().flatten().equals(arr0)
    assert arr2.flatten(True).equals(arr0)
Member

Should there also be a test here where flatten(True) and flatten(False) return different things?

Collaborator Author

Yes, the recursive parameter defaults to False in the flatten interface, but some of the previous test cases do not include this test. We should add such a test case here.

Member

I see: assert arr2.flatten(True).equals(arr0) is the case that I had in mind and it was already added (I just missed the distinction between arr1 and arr0). The case you added helps make that distinction clearer...thank you!
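
The distinction between arr0, arr1, and arr2 in that test can be mirrored with a small pure-Python model. The values and the helpers chunk and flatten are hypothetical; only the variable names follow the test, and plain nested lists stand in for fixed-size list arrays.

```python
def chunk(values, size):
    """Group a sequence into fixed-size lists (stand-in for a fixed-size list array)."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def flatten(lists, recursive=False):
    """Remove one list level, or every list level when recursive is True."""
    out = [v for item in lists for v in item]
    if recursive:
        while out and isinstance(out[0], list):
            out = [v for item in out for v in item]
    return out


arr0 = [1, 2, 3, 4, 5, 6, 7, 8]  # flat values
arr1 = chunk(arr0, 2)            # four lists of two values
arr2 = chunk(arr1, 2)            # two lists of two lists

assert flatten(arr2) == arr1                  # one level removed
assert flatten(flatten(arr2)) == arr0         # two explicit passes
assert flatten(arr2, recursive=True) == arr0  # same result in one call
assert flatten(arr2, recursive=False) != flatten(arr2, recursive=True)
```

The last assertion is the kind of check asked for above: on a doubly nested input, the two modes must return different results.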

@github-actions github-actions bot added awaiting review, awaiting changes, awaiting committer review and removed awaiting merge, awaiting review, awaiting changes labels Apr 30, 2024
@github-actions github-actions bot added awaiting changes and removed awaiting committer review labels Apr 30, 2024
@felipecrv felipecrv merged commit 5e986be into apache:main Apr 30, 2024
37 of 38 checks passed
@felipecrv felipecrv removed the awaiting changes label Apr 30, 2024

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 5e986be.

There was 1 benchmark result with an error:

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 11 possible false positives for unstable benchmarks that are known to sometimes produce them.

tolleybot pushed a commit to tmct/arrow that referenced this pull request May 2, 2024
…ist_flatten kernel function and pyarrow bindings (apache#41295)

### Rationale for this change
Expose recursive flatten for logical lists on list_flatten kernel function and pyarrow bindings.

### What changes are included in this PR?
1. Expose recursive flatten for logical lists on `list_flatten` kernel function
2. Support [Large]ListView for some kernel functions: `list_flatten`, `list_value_length`, `list_element`
3. Support recursive flatten for pyarrow bindings and simplify [Large]ListView's pyarrow bindings
4. Refactor vector_nested_test.cc to better support [Large]ListView types.

### Are these changes tested?
Yes

### Are there any user-facing changes?
Yes.
1. Some kernel functions like list_flatten, list_value_length, and list_element now support [Large]ListView types
2. `list_flatten` and related pyarrow bindings can flatten recursively with a ListFlattenOptions.

* GitHub Issue: apache#41183

Lead-authored-by: ZhangHuiGui <hugo.zhang@openpie.com>
Co-authored-by: ZhangHuiGui <2689496754@qq.com>
Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024