Support for memoryview-safe variable length arrays? #673

Open
@shz9

Hi there,

I'm using Zarr to store ragged arrays in a fashion that's similar to what's outlined in the documentation:

import numcodecs
import numpy as np
import zarr

# Object array whose elements are variable-length integer arrays
z = zarr.empty(4, dtype=object, object_codec=numcodecs.VLenArray(int))
z[0] = np.array([1, 3, 5])
z[1] = np.array([4])
z[2] = np.array([7, 9, 14])

In my case, I need to retrieve those arrays and then process them in a cython function:

cpdef process_ragged_arrays(int[:] r_arr):
    ...

for i in range(z.shape[0]):
    process_ragged_arrays(z[i])

However, this call fails with the error ValueError: buffer source array is read-only. The same error has been discussed and worked around elsewhere (e.g. Dask#1978, scikit-allel#208), typically by passing the array through a function like this (h/t @alimanfoo):

def memoryview_safe(x):
    """Make array safe to run in a Cython memoryview-based kernel. These
    kernels typically break down with the error ``ValueError: buffer source
    array is read-only`` when running in dask distributed.

    See Also
    --------
    https://github.com/dask/distributed/issues/1978
    https://github.com/cggh/scikit-allel/issues/206
    """
    if not x.flags.writeable:
        if not x.flags.owndata:
            x = x.copy(order='A')
        x.setflags(write=True)
    return x
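
To illustrate what this workaround does, here is a minimal sketch that does not involve Zarr. It builds a read-only array with np.frombuffer over an immutable bytes object, which is only assumed to be a reasonable stand-in for what z[i] returns, and then passes it through memoryview_safe. The names raw, ro and safe are made up for the example, and np.intc is used so the dtype matches the C int that an int[:] memoryview expects.

```python
import numpy as np


def memoryview_safe(x):
    """Return a writeable array, copying only if ``x`` does not own its data."""
    if not x.flags.writeable:
        if not x.flags.owndata:
            x = x.copy(order='A')
        x.setflags(write=True)
    return x


# Assumption for illustration: an array viewing an immutable bytes object
# stands in for the read-only array retrieved from the ragged Zarr array.
raw = np.array([1, 3, 5], dtype=np.intc).tobytes()
ro = np.frombuffer(raw, dtype=np.intc)
print(ro.flags.writeable, ro.flags.owndata)  # False False

# The array does not own its data, so memoryview_safe returns a copy.
safe = memoryview_safe(ro)
print(safe.flags.writeable)        # True
print(np.shares_memory(ro, safe))  # False
```

In this sketch the overhead comes from the copy: any read-only array that does not own its data is duplicated before it can be handed to the kernel.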

My question is: is it possible to make the ragged arrays returned by Zarr memoryview-safe natively? I can run memoryview_safe on each array I retrieve, but doing so incurs overhead that I would like to avoid in my program.
