Skip to content

Conversation

@tadeja
Copy link
Contributor

@tadeja tadeja commented Nov 26, 2025

Rationale for this change

To close or discuss #48167
inverse_permutation and scatter functions got implemented via #44393, PR #44394.

What changes are included in this PR?

Python tests for scatter, inverse_permutation kernels and bindings for InversePermutationOptions and ScatterOptions.

Are these changes tested?

Yes, tests added in test_compute.py.

Are there any user-facing changes?

Bindings for InversePermutationOptions and ScatterOptions are added.

@github-actions
Copy link

⚠️ GitHub issue #48167 has been automatically assigned in GitHub to PR creator.

@tadeja
Copy link
Contributor Author

tadeja commented Nov 26, 2025

The background of inverse_permutation and scatter functions is nicely explained under this umbrella issue

Does it make sense to add Python wrappers/bindings for InversePermutationOptions and ScatterOptions already at this point in time? (Or at all?) What would you say, @felipecrv, @zanmato1984 ?

Currently both InversePermutationOptions and ScatterOptions do not seem to get accepted by inverse_permutation and scatter as demonstrated in added tests of test_compute.py.

@github-actions
Copy link

⚠️ GitHub issue #48167 has been automatically assigned in GitHub to PR creator.

1 similar comment
@github-actions
Copy link

⚠️ GitHub issue #48167 has been automatically assigned in GitHub to PR creator.

@zanmato1984
Copy link
Contributor

Hi @tadeja , thanks for working on this. Appreciate it. I'm glad to see we are adding python bindings for scatter and inverse_permutation. To your questions:

Does it make sense to add Python wrappers/bindings for InversePermutationOptions and ScatterOptions already at this point in time? (Or at all?) What would you say, @felipecrv, @zanmato1984 ?

I think it does make sense to have them.

Currently both InversePermutationOptions and ScatterOptions do not seem to get accepted by inverse_permutation and scatter as demonstrated in added tests of test_compute.py.

I'm not sure what you mean. The underlying C++ APIs will accept those options. So I would expect no problem for their python bindings. Are you saying that they are not working?

@tadeja tadeja force-pushed the bindings_for_inverse_permutation_and_scatter branch from a3b6fb1 to 47e3e8c Compare November 27, 2025 22:05
@tadeja
Copy link
Contributor Author

tadeja commented Nov 28, 2025

Thank you for the input, @zanmato1984, that helped.

I'm not sure what you mean. The underlying C++ APIs will accept those options. So I would expect no problem for their python bindings. Are you saying that they are not working?

It looks like Python bindings got fixed by adding option class names to FunctionDoc in vector_swizzle.cc.

Now test_option_class_equality is failing by default with
pyarrow.lib.ArrowInvalid: Could not serialize field output_type of options type InversePermutationOptions: shared_ptr<DataType> is nullptr
unless I pass output_type there, e.g. pc.InversePermutationOptions(output_type=pa.int32()).

If output_type is NULLPTR in C++ (which happens if output_type is None in Python) then InversePermutationOptions can't be serialized.
Any advice on the approach for fixing this? (Should Python users always specify output_type, or perhaps output_type be optional in C++? Or is it fine to leave it as it is?)

See example below:

import pyarrow.compute as pc
pc.InversePermutationOptions()

>>> InversePermutationOptions(max_index=-1, output_type=<NULLPTR>)
import pyarrow.compute as pc
pc.InversePermutationOptions().serialize()

>>> Traceback (most recent call last):
  File "<python-input-3>", line 1, in <module>
    pc.InversePermutationOptions().serialize()
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "pyarrow/_compute.pyx", line 623, in pyarrow._compute.FunctionOptions.serialize
    shared_ptr[CBuffer] c_buf = GetResultValue(res)
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
    return check_status(status)
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
    raise convert_status(status)
pyarrow.lib.ArrowInvalid: Could not serialize field output_type of options type InversePermutationOptions: shared_ptr<DataType> is nullptr

cc @pitrou

@tadeja tadeja marked this pull request as ready for review November 28, 2025 17:31
@zanmato1984
Copy link
Contributor

Thanks @tadeja for the further explanation! I now see the problem. Let me check the code and get back to you soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants