[Python] Current assertion of CPU-accessible data in Array methods is specific to CPU device type #43511

jorisvandenbossche · 2024-08-01T12:10:45Z

For #41665 (implemented for Array in #42112 / #42113), we currently use the following assertion to check whether the data is on the CPU (and thus supports the operation in question, which accesses the data's address):

arrow/python/pyarrow/array.pxi

Lines 2035 to 2037 in d4d92e4

    
    cdef void _assert_cpu(self) except *:
        if self.sp_array.get().device_type() != CDeviceAllocationType_kCPU:
            raise NotImplementedError("Implemented only for data on CPU device")

This checks explicitly for the CPU device allocation type.
However, this means that data with, for example, a CUDA_HOST device type, which is actually accessible from the CPU, will trigger this error:

import numpy as np
import pyarrow as pa
from pyarrow import cuda

# create Array with CudaHost buffer
buf = cuda.new_host_buffer(5*8)
np.frombuffer(buf, dtype=np.int64)[:] = range(5)
arr = pa.Array.from_buffers(pa.int64(), 5, [None, buf])

# inspect the array
>>> arr
<pyarrow.lib.Int64Array object at 0x7f24b6e02e00>
[
  0,
  1,
  2,
  3,
  4
]
>>> arr.device_type
<DeviceAllocationType.CUDA_HOST: 3>

# calling a method that goes through _assert_cpu raises an error
>>> arr.sum()
...
NotImplementedError: Implemented only for data on CPU device

# but the underlying buffer itself "is_cpu"
>>> arr.buffers()[1]
<pyarrow.Buffer address=0x7f24c1600400 size=80 is_cpu=True is_mutable=True>
>>> arr.buffers()[1].is_cpu
True
>>> arr.buffers()[1].device_type
<DeviceAllocationType.CUDA_HOST: 3>

At the buffer level we have the is_cpu attribute available, but at the Array level we currently only have device_type(). We could add the CUDA_HOST device allocation type explicitly to the check above, but ideally we would use something more general?
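As a rough illustration of what "more general" could mean, here is a minimal sketch that decides CPU accessibility from the per-buffer is_cpu flag instead of comparing the array's device type against CPU. The helper name buffers_cpu_accessible and the StubBuffer class are hypothetical (not part of pyarrow); the stub only models the is_cpu attribute so the snippet runs without a CUDA build. This is not a proposal for the actual implementation, which would more likely live on the C++/Cython side.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class StubBuffer:
    """Hypothetical stand-in for pyarrow.Buffer; only models `is_cpu`."""
    is_cpu: bool


def buffers_cpu_accessible(buffers: Iterable[Optional[object]]) -> bool:
    """Return True if every buffer that is present reports is_cpu.

    None entries (for example an absent validity bitmap) are skipped.
    """
    return all(buf.is_cpu for buf in buffers if buf is not None)


# Same layout as the example above: no validity bitmap, one data buffer.
host_like = [None, StubBuffer(is_cpu=True)]     # e.g. CUDA_HOST memory
device_like = [None, StubBuffer(is_cpu=False)]  # e.g. CUDA device memory

print(buffers_cpu_accessible(host_like))
print(buffers_cpu_accessible(device_like))
```

With a real array, the same helper could be called as buffers_cpu_accessible(arr.buffers()). Whether arr.buffers() covers every buffer that matters (for instance dictionary values) is something I have not checked, so treat that as an assumption.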

(cc @danepitkin)


jorisvandenbossche added Type: enhancement Component: Python Component: GPU labels Aug 1, 2024

felipecrv mentioned this issue Aug 2, 2024

[C++] Compute functions should fail gracefully when given non-CPU resident Arrays #43541

Open
