introspecting nested duck arrays #843
Replies: 3 comments 8 replies
-
If I understand you correctly, for the I may be completely misunderstanding!
-
I'll note that @shoyer brought this up as the key issue making it hard for Xarray to fully implement the standard in gh-807, and the conclusion of that discussion is that we will allow scalars more broadly in all functions (in the next revision of the standard, v2024), as long as there is still a single input argument that's an array so that the output type/device/dtype/etc. can be determined. Does that address the problem well enough from your perspective @keewis?
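To make the rule concrete, here is a minimal sketch of a binary function that accepts Python scalars as long as at least one operand is an array. `ToyArray` and this `add` are hypothetical stand-ins written for illustration only; they are not part of the standard or of any library, and a real implementation would also take dtype and device from the array operand.

```python
class ToyArray:
    """Hypothetical stand-in for an array object (illustration only)."""

    def __init__(self, values):
        self.values = list(values)


def add(x1, x2):
    """Add two operands, allowing scalars if at least one operand is an array."""
    arrays = [a for a in (x1, x2) if isinstance(a, ToyArray)]
    if not arrays:
        # With only scalars there is nothing to determine the output type from.
        raise TypeError("at least one operand must be an array")

    # The array operand decides the shape of the result; scalars are repeated.
    n = len(arrays[0].values)
    left = x1.values if isinstance(x1, ToyArray) else [x1] * n
    right = x2.values if isinstance(x2, ToyArray) else [x2] * n
    return ToyArray(a + b for a, b in zip(left, right))


result = add(ToyArray([1, 2, 3]), 10)
print(result.values)
```

The point of the sketch is only that the array argument, not the scalar, carries the information needed to build the output.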
-
I agree with this notion. An advantage of the More generally, it seems to me that if Dask knows it is wrapping CuPy arrays, then it should be Dask's responsibility to make sure functions like As far as introspection, I would worry whether introspection APIs might limit the sorts of wrapping that can happen.
-
This is only related to the array API standard, but over the past year I've been thinking on and off about how to deal with `dask`-wrapped `cupy` arrays in `xarray`. While I am focusing on `dask+cupy` here, I believe that would also be helpful for any other type of (possibly deeper) nested arrays like a unit-aware, masked and chunked sparse array, which involves layering 4 types: the unit-aware type, the masked array type, the chunking type, and the sparse array type.

The main issue is that `cupy` itself absolutely refuses to interact with any kind of `numpy` array, even 0d arrays. This is an issue, because most functions in the array API standard are explicitly defined in terms of arrays, which means that for scalars `xarray` will have to figure out which type to convert to. For simple arrays this would simply defer to `xp.asarray` (as pointed out in the issue above by @rgommers), but as soon as we have a more deeply nested array this becomes impractical.

Instead, I believe the best way to resolve this is to figure out which of the layers of array type is responsible for the actual bytes in memory (usually, this will be the innermost array layer). For this purpose I've come up with a recursive protocol (tentatively named `__array_layers__`) that is only supposed to be defined by array-wrapping libraries.

Calling that protocol would return a `tuple` of type objects, one for each layer and with the outermost layer at index 0. For the example above, using `pint`, `dask`, `marray` and `sparse` (`marray`, I believe, is still experimental so this might not actually work):

This would allow any library to inspect the stack of array types, and would allow `xarray` to find `cupy` (and the `cupy` namespace) underneath other layers of duck arrays.

What I'm hoping for in opening this discussion is feedback, mostly on the protocol (the name and the mechanism used to define it), but maybe also whether you can see a way to resolve my issue without a new protocol.
cc @shoyer, @tomwhite, @TomNicholas, @rgommers