Remove __array__ #69

Merged: 6 commits merged into data-apis:main on Nov 2, 2024
Conversation

asmeurer (Member)

This makes __array__ raise an exception, since it isn't supported by the standard (if
we just left it unimplemented, np.asarray() would return an object-dtype
array, which is not good).

There is one issue here from the test suite, which is that this breaks the
logic in asarray() for converting lists of array_api_strict 0-D arrays. I'm
not yet sure what to do about that.

Fixes #67.

This is built on #68, which can and should be merged separately (it isn't controversial).
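
For context, the core of the change is roughly the following (a simplified sketch of the method on the Array class; the exact signature in the PR may differ, though the error message matches the traceback shown later in this thread):

# Simplified sketch, not the exact code from the PR:
def __array__(self, dtype=None, copy=None):
    raise ValueError(
        "Conversion from an array_api_strict array to a NumPy ndarray is not supported"
    )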

@betatim (Member) commented Oct 21, 2024

In scikit-learn's tests we convert array API arrays to NumPy arrays in order to compare their values to some reference array. How would we do that if we remove __array__ here? Other libraries have .tonumpy() and the like, but I think array-api-strict doesn't? Should we add something like that as part of this removal?

@asmeurer (Member, Author)

Testing is a valid use-case. You could use DLPack if you wanted to stay within the standard. Although I'm still not clear if this is a good idea or not.
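
A minimal sketch of what that could look like in a test, assuming the values only need to be read back for comparison:

import numpy as np
import array_api_strict as xp

x = xp.asarray([1.0, 2.0, 3.0])
x_np = np.from_dlpack(x)  # goes through __dlpack__, which is part of the standard
np.testing.assert_allclose(x_np, [1.0, 2.0, 3.0])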

@betatim (Member) commented Oct 22, 2024

Good point. We already have custom code to convert from the various array libraries to NumPy. Using

x = xp.asarray([1, 2, 3], device=...)
np.from_dlpack(x)

would maybe simplify that code and remove the need for __array__.

@thomasjpfan left a comment

Although I'm still not clear if this is a good idea or not.

I think this is a good idea for array-api-strict. Libraries that need to do any array conversion should only depend on __dlpack__.

(Although this PR is breaking tests in this repo)

@asmeurer (Member, Author)

To clarify a note from the OP: this has stopped working

>>> import array_api_strict as xp
>>> xp.asarray([[xp.asarray(1), xp.asarray(2)]])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/aaronmeurer/Documents/array-api-strict/array_api_strict/_creation_functions.py", line 97, in asarray
    res = np.array(obj, dtype=_np_dtype, copy=copy)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/aaronmeurer/Documents/array-api-strict/array_api_strict/_array_object.py", line 138, in __array__
    raise ValueError("Conversion from an array_api_strict array to a NumPy ndarray is not supported")
ValueError: Conversion from an array_api_strict array to a NumPy ndarray is not supported

because it was implicitly relying on np.asarray() calling __array__ on nested array_api_strict arrays.
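
One user-level workaround (a sketch only, not necessarily how this gets resolved in the test suite) is to unwrap the 0-D arrays to Python scalars before re-wrapping:

import array_api_strict as xp

inner = [xp.asarray(1), xp.asarray(2)]
nested = xp.asarray([[int(a) for a in inner]])  # convert the 0-D arrays to Python ints first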

@asmeurer asmeurer enabled auto-merge November 1, 2024 22:01
@asmeurer asmeurer merged commit b9e64e5 into data-apis:main Nov 2, 2024
21 checks passed
@asmeurer (Member, Author) commented Nov 6, 2024

Thinking about dlpack, should we be doing anything with the device emulation in __dlpack__? https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack__.html#dlpack Maybe we should disable __dlpack__ when the device isn't the default CPU_DEVICE? @betatim
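
Something like the following, hypothetically (a sketch only: attribute names such as self._device and self._array are assumptions, and the real __dlpack__ signature takes additional keyword arguments):

def __dlpack__(self, *, stream=None):
    # Hypothetical guard: refuse to export arrays on emulated non-CPU devices
    if self._device != CPU_DEVICE:
        raise RuntimeError("__dlpack__ is only supported on the default CPU_DEVICE")
    return self._array.__dlpack__(stream=stream)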

@betatim (Member) commented Nov 7, 2024

I thought DLPack now allows you to move arrays from one device to another? For example, to support the common case of "I would like to convert this array from library foo to a NumPy array". Is that right? If yes, it could be cool to support this for the "not CPU devices" here, if only because (in scikit-learn) we convert things back to NumPy arrays during tests.

@rgommers (Member) commented Nov 8, 2024

#67 (comment) points out that this change broke a lot of code, and that there is no guidance on what to replace xp.asarray with.

In practice, using from_dlpack isn't a 1:1 replacement for SciPy et al. because that won't accept sequences.
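
For illustration, from_dlpack only accepts objects that implement the DLPack protocol, so it can't take over the sequence-handling part of asarray:

import numpy as np

np.asarray([1, 2, 3])           # fine: builds an array from a plain sequence
try:
    np.from_dlpack([1, 2, 3])   # fails: a list has no __dlpack__ / __dlpack_device__
except (AttributeError, TypeError) as exc:
    print(type(exc).__name__)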

Libraries that need to do any array conversion should only depend on __dlpack__.

I think it's probably a good idea in principle to nudge libraries in this direction. But we need to work out the replacement pattern here first. Unless someone has a ready-made suggestion that can be easily implemented in SciPy, scikit-learn, Xarray (pydata/xarray#9750) & co, we may want to yank the 2.1.1 and 2.1.2 releases. Actually probably a good idea to do so anyway, because this should not land in a bugfix release (xref gh-89).

@lucascolley (Contributor)

The short-term fix for SciPy shouldn't be so bad, unless I'm missing something? Something like:

def _asarray(x, xp_in=None, xp_out=None):
    # array_namespace is SciPy's helper (scipy._lib._array_api) for looking up an array's namespace
    xp_in = array_namespace(x) if xp_in is None else xp_in
    xp_out = array_namespace(x) if xp_out is None else xp_out
    x = xp_in.asarray(x)            # handle sequences/scalars in the input namespace
    return xp_out.from_dlpack(x)    # hand off to the output namespace via DLPack
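
For example, a hypothetical round trip with NumPy as the output namespace:

import numpy as np
import array_api_strict as xps

a = xps.asarray([1.0, 2.0, 3.0])
a_np = _asarray(a, xp_in=xps, xp_out=np)  # np.from_dlpack handles the final conversion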

@asmeurer (Member, Author) commented Nov 8, 2024

Just to be clear, this change and the change to pin NumPy are independent of one another (it appears that xarray had issues with the latter).

I'm happy to revert (not yank) one or both changes if they are causing issues upstream. I was unsure about the __array__ removal, but at the same time, it is the case that __array__ is not in the standard, so any issues caused by this could also be caused by another array API library that doesn't implement __array__. If it's hard to do things, maybe the standard should be updated accordingly?

Regarding depending on NumPy 2.0, this is primarily a maintenance thing, since a good chunk of the code in array-api-strict is just working around array API non-conformance in NumPy 1.26, and can be removed if we only support 2.0. I was under the impression it wouldn't be a huge issue since array-api-strict is a testing-only library, but if it's still too soon to do this, we can wait a bit.

@rgommers (Member) commented Nov 8, 2024

I'd for sure revert this change and release 2.1.3, to unbreak SciPy. Then we have actual time to investigate the fix, hopefully some variant of what @lucascolley suggested above.

it is the case that __array__ is not in the standard, so any issues caused by this could also be caused by another array API library that doesn't implement __array__. If it's hard to do things, maybe the standard should be updated accordingly?

Maybe it does need a tweak in the standard, maybe not - but let's think about it a bit more carefully first before reintroducing the removal of __array__ for a new feature release.

@mdhaber commented Nov 18, 2024

In principle, it looks like it should be about as simple as proposed above. In practice (scipy/scipy#21890), I'm finding that library support for from_dlpack is not quite ready to rely on. So unless I'm misunderstanding, it looks like we'd have to switch between from_dlpack and relying on xp.asarray working depending on the library.

import numpy as np  # numpy,  2.1.1, py312h49bc9c5_0, conda-forge
import cupy as cp  # cupy, 13.3.0, py312h584ea29_0, conda-forge (tested on Windows)
import array_api_strict as strict  # array-api-strict, 2.2, pyhd8ed1ab_0, conda-forge (tested on Windows)
import torch # pytorch, 2.4.1, cpu_generic_py313h43ebb1a_2, conda-forge (tested on Mac)
import jax.numpy as jnp  # jax, 0.4.34, pyhd8ed1ab_0, conda-forge (tested on Mac)
from scipy._lib._array_api import array_namespace  # SciPy main

xps = [cp, strict, torch, jnp]
a0 = [1, 2, 3]

# Test roundtrip use of `from_dlpack` between non-NumPy and NumPy
# (I had done all pairwise transfers originally, but SciPy only needs this roundtrip)
for xp in xps:
    a = xp.asarray(a0)  # a non-NumPy array
    xp_compat = array_namespace(a)

    try:  # transfer to NumPy
        a_np = np.from_dlpack(a, device='cpu')
        print(f'SUCCESS: {xp.__name__} to {np.__name__}')
    except Exception as e:
        a_np = np.asarray(a0)
        print(f'FAILURE: {xp.__name__} to {np.__name__}: {str(e)}')

    try:  # transfer back to original device
        a_xp = xp_compat.from_dlpack(a_np, device=a.device)
        print(f'SUCCESS: {np.__name__} to {xp.__name__}')
    except Exception as e:
        print(f'FAILURE: {np.__name__} to {xp.__name__}: {str(e)}')

# FAILURE: cupy to scipy._lib.array_api_compat.numpy: __dlpack__() got an unexpected keyword argument 'dl_device'
# FAILURE: scipy._lib.array_api_compat.numpy to cupy: cupy._core.dlpack.from_dlpack() takes no keyword arguments
# SUCCESS: array_api_strict to scipy._lib.array_api_compat.numpy
# SUCCESS: scipy._lib.array_api_compat.numpy to array_api_strict
# FAILURE: torch to numpy: Tensor.__dlpack__() got an unexpected keyword argument 'dl_device'
# FAILURE: numpy to torch: from_dlpack() got an unexpected keyword argument 'device'
# SUCCESS: jax.numpy to numpy
# FAILURE: numpy to jax.numpy: Cannot export readonly array since signalling readonly is unsupported by DLPack (supported by newer DLPack version).

Without specifying device:

FAILURE: cupy to numpy: Unsupported device in DLTensor.
FAILURE: numpy to cupy: CPU arrays cannot be directly imported to CuPy. Use `cupy.array(numpy.from_dlpack(input))` instead.
SUCCESS: array_api_strict to numpy
SUCCESS: numpy to array_api_strict
SUCCESS: torch to numpy
FAILURE: numpy to torch: Cannot export readonly array since signalling readonly is unsupported by DLPack (supported by newer DLPack version).
SUCCESS: jax.numpy to numpy
FAILURE: numpy to jax.numpy: Cannot export readonly array since signalling readonly is unsupported by DLPack (supported by newer DLPack version).

The last failure goes away if a_np is writeable (e.g. by copying), but this brings up another issue: from_dlpack will typically return a read-only array. I've seen errors because of this, and trying to set the write flag raises ValueError: cannot set WRITEABLE flag to True of this array. I'll investigate this a little more to see why we are trying to write to it.
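
A minimal sketch of the behaviour being described (whether the imported array is read-only depends on the producer and the DLPack version in play):

import numpy as np
import array_api_strict as xps

a_np = np.from_dlpack(xps.asarray([1, 2, 3]))
print(a_np.flags.writeable)       # frequently False for DLPack imports
a_np = np.array(a_np, copy=True)  # copying yields a writeable array
a_np[0] = 10                      # safe after the copy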

@asmeurer (Member, Author)

I'm not sure what's going on with the read-only stuff. It may be prudent to push the discussion about that to either the NumPy or the array-api repo if it is an issue, so that people who know more about it can chime in about whether something needs to be changed there.

@rgommers (Member)

I'm finding that library support for from_dlpack is not quite ready to rely on. So unless I'm misunderstanding, it looks like we'd have to switch between from_dlpack and relying on xp.asarray working depending on the library.

@mdhaber and I had a chat about this, but to summarize for other readers: from_dlpack without the device keyword is fine to use; it's broadly supported. That also serves the same purpose as __array__ and asarray (changing to another library without device transfers).

The device= keyword is new in the latest version of the standard and requires DLPack 1.0, which doesn't yet have broad support. It provides the new capability to switch to another library's array type while doing a device transfer. This isn't related to this issue directly and should not be needed for library code, although it may certainly be useful for both end user code and for testing purposes in libraries.
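
Concretely, the two call forms being contrasted (a small sketch; whether the second form works depends on DLPack 1.0 support in both libraries):

import numpy as np
import array_api_strict as xps

x = xps.asarray([1, 2, 3])
x_np = np.from_dlpack(x)                  # no device keyword: broadly supported today
# x_np = np.from_dlpack(x, device='cpu')  # device keyword: new in the 2023.12 standard, needs DLPack 1.0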
