Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Python] Current assertion of CPU-accessible data in Array methods is specific to CPU device type #43511

Open
jorisvandenbossche opened this issue Aug 1, 2024 · 0 comments

Comments

@jorisvandenbossche
Copy link
Member

For #41665 (implemented for Array in #42112 / #42113), we currently use the following assertion to check if the data is on CPU (and thus supports the operation in question that access the data's address):

arrow/python/pyarrow/array.pxi

Lines 2035 to 2037 in d4d92e4

cdef void _assert_cpu(self) except *:
if self.sp_array.get().device_type() != CDeviceAllocationType_kCPU:
raise NotImplementedError("Implemented only for data on CPU device")

This checks explicitly for the CPU device allocation type.
However, this means that for example data with a CUDA_HOST device type, which is actually accessible from the CPU, will trigger this error:

import numpy as np
import pyarrow as pa
from pyarrow import cuda

# create Array with CudaHost buffer
buf = cuda.new_host_buffer(5*8)
np.frombuffer(buf, dtype=np.int64)[:] = range(5)
arr = pa.Array.from_buffers(pa.int64(), size, [None, buf])

# inspect the array
>>> arr
<pyarrow.lib.Int64Array object at 0x7f24b6e02e00>
[
  0,
  1,
  2,
  3,
  4
]
>>> arr.device_type
<DeviceAllocationType.CUDA_HOST: 3>

# calling a method that checks _assert_cpu errors
>>> arr.sum()
...
NotImplementedError: Implemented only for data on CPU device

# but the underlying buffer itself "is_cpu"
>>> arr.buffers()[1]
<pyarrow.Buffer address=0x7f24c1600400 size=80 is_cpu=True is_mutable=True>
>>> arr.buffers()[1].is_cpu
True
>>> arr.buffers()[1].device_type
<DeviceAllocationType.CUDA_HOST: 3>

At the buffer level we have this is_cpu attribute available, but currently on the Array level we only have device_type(). We could add CUDA_HOST device allocation type explicitly to the check above, but ideally we would use something more general?

(cc @danepitkin)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant