-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Python] Use PyCapsule for communicating C Data Interface pointers at the Python level #34031
Comments
It would also be nice to do this in the ADBC libraries (which currently use a handwritten Python wrapper for this purpose) |
Yes, certainly, if we decide to go this way, we should support it in the places we control that interacts with the C Data Interface like ADBC (we should also check with other produces/consumers if this is possible to use) |
This doesn't seem necessary in our case, as the consumer would typically move the C struct contents. |
But can a consumer move the C struct content multiple times (and potentially calling the release callback multiple times?) The other reason for renaming in the case of DLPack is that it checks for this in the capsule deleter, to only call the release callback of the struct in case it was not consumed/renamed (for example cupy's implementation). But that's in the end the same question I assume, because it also depends on whether it is OK to call the release callback multiple times. |
Ah, I suppose this is handled by setting the release callback to NULL when moving the C struct content. And then the capsule deleter should check for that (instead of checking for it being renamed) ? |
Indeed, you don't call the release callback when moving the structure:
|
I think this would still be useful. |
Yes, we would still keep it, but just using a different check to see when it should be called (checking if the release callback is not null, instead of the changed capsule name, as dlpack does and as I mentioned in the top post) |
…cating C Data Interface pointers
I created a PR on the adbc (apache/arrow-adbc#702) and pyarrow (#35739) side as a small experiment with this. |
…cating C Data Interface pointers
…cating C Data Interface pointers
…cating C Data Interface pointers
…cating C Data Interface pointers
Hi, I am creating the RC0 right now. I can add this in future RCs if they are created. |
…cating C Data Interface pointers
### Rationale for this change ### What changes are included in this PR? * A new specification for Arrow PyCapsules and related dunder methods * Implementing the dunder methods for `DataType`, `Field`, `Schema`, `Array`, `RecordBatch`, `Table`, and `RecordBatchReader`. ### Are these changes tested? Yes, I've added various roundtrip tests for each of the types. ### Are there any user-facing changes? This introduces some new APIs and documents them. * Closes: #34031 * Closes: #35531 Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
### Rationale for this change ### What changes are included in this PR? * A new specification for Arrow PyCapsules and related dunder methods * Implementing the dunder methods for `DataType`, `Field`, `Schema`, `Array`, `RecordBatch`, `Table`, and `RecordBatchReader`. ### Are these changes tested? Yes, I've added various roundtrip tests for each of the types. ### Are there any user-facing changes? This introduces some new APIs and documents them. * Closes: #34031 * Closes: #35531 Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…37797) ### Rationale for this change ### What changes are included in this PR? * A new specification for Arrow PyCapsules and related dunder methods * Implementing the dunder methods for `DataType`, `Field`, `Schema`, `Array`, `RecordBatch`, `Table`, and `RecordBatchReader`. ### Are these changes tested? Yes, I've added various roundtrip tests for each of the types. ### Are there any user-facing changes? This introduces some new APIs and documents them. * Closes: apache#34031 * Closes: apache#35531 Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…37797) ### Rationale for this change ### What changes are included in this PR? * A new specification for Arrow PyCapsules and related dunder methods * Implementing the dunder methods for `DataType`, `Field`, `Schema`, `Array`, `RecordBatch`, `Table`, and `RecordBatchReader`. ### Are these changes tested? Yes, I've added various roundtrip tests for each of the types. ### Are there any user-facing changes? This introduces some new APIs and documents them. * Closes: apache#34031 * Closes: apache#35531 Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…37797) ### Rationale for this change ### What changes are included in this PR? * A new specification for Arrow PyCapsules and related dunder methods * Implementing the dunder methods for `DataType`, `Field`, `Schema`, `Array`, `RecordBatch`, `Table`, and `RecordBatchReader`. ### Are these changes tested? Yes, I've added various roundtrip tests for each of the types. ### Are there any user-facing changes? This introduces some new APIs and documents them. * Closes: apache#34031 * Closes: apache#35531 Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
Describe the enhancement requested
Currently we have the various
_export_to_c
/_import_from_c
methods for working with the Arrow C Interface that expect integers as arguments for the struct pointers. We could also usePyCapsule
objects for this instead of integers (or (certainly initially) in addition to), inspired by a similar interface from DLPack.DLPack provides a stable in-memory data structure that allows exchanging array data between frameworks. It essentially plays the same role for ndarrays (tensors) as what the Arrow C interface does for arrow-compatible data (columnar data). It also defines a stable C ABI with a similar C struct definitions (header file).
In the DLPack project, apart from the stable C ABI struct, they also defined a python specification (including a method name to access the protocol, i.e.
__dlpack__
), see https://dmlc.github.io/dlpack/latest/python_spec.html#implementation for the details. And for that specification, they return not a raw pointer, but use aPyCapsule
object (a python object that represents an "opaque value", such as a pointer, and that can used by C extensions to pass such values through Python code to other C code, https://docs.python.org/3/c-api/capsule.html).Some details based on their implementation:
The proposal would be to mimic what DLPack does in places where we now expect or return a integer pointer (the interface needs to be different as the current
_export_to_c
, as we would now return a capsule, instead of having the return pointer as a parameter of the method).Component(s)
Python
The text was updated successfully, but these errors were encountered: