Specify shape behavior #289

kgryte · 2021-10-21T08:15:11Z

This PR

resolves gh-97 and follows the guidance discussed therein. Update: supporting unknown ranks is no longer included in this proposal (see comment).
allows x.shape to return a tuple with None elements to indicate that a dimension is unknown.
tightens guidance specifying that x.shape should return a tuple. A custom shape object is allowed so long as it behaves like a tuple.
removes __len__ from the specification. NumPy, TF, and others return the size of the first dimension when len() (which calls __len__) is invoked on an array. In Python, __len__ should return integer >=0. Given the above changes accommodating unknown shapes/dimensions, returning an integer cannot be guaranteed as the first dimension could be unknown and thus None. Accordingly, this PR elects to remove __len__ from the specification.

asmeurer · 2021-10-22T23:20:39Z

My worry about this is that shape is no longer type-safe. Any code that works with shape would need to guard against None to pass type checks, and even ignoring mypy, in practice I would expect TypeError: 'NoneType' object is not subscriptable and similar such errors would occur with a large amount of code when presented with such an array. And presumably most operations wouldn't actually produce unknown shapes, meaning such checks would often not actually be needed, although when this is and isn't the case isn't obvious.

Weren't there proposals to allow shape to be an array? Wouldn't this solve the problem for libraries that already allow representing arrays as undefined objects? This would seem more type safe since you can always index into an array, even if the resulting object is some uncomputed entity. That would also seem to allow more flexibility in terms of some part of the shape being known vs. it is either all known or completely unknown, i.e., you might know ndim or size but not the exact dimension sizes, or you might know one dimension but not another. I might be misunderstanding how graph libraries actually handle uncomputed arrays, so let me know if this is not as simple as I am making it out to be.

I know there have been some example codes people have written using array API compliant APIs. How many of those are making use of the shape, ndim, or size attributes? I know the test suite as currently written will completely fall apart if shape == None. The prospect of trying to fix it seems daunting, and I don't think I want to attempt it without an example library that actually does this to test against. The test suite may be a bit extreme as it is intentionally looking into the shapes of everything to make sure they are correct. But I feel like looking at shape is pretty common in real world code too, especially library code, which is one of the main potential consumers of the array API.

kgryte · 2021-11-04T08:41:19Z

For visibility, @shoyer @agarwal-ashish @jakirkham @alextp.

alextp · 2021-11-04T13:10:06Z

Re type safety, doesn't allowing None just mean that the type of array.shape becomes Optional[Shape] instead of just Shape? That does inform that the call site needs to check.

shoyer · 2021-11-04T16:22:45Z

TensorFlow currently alows arrays with unknown numbers of dimensions, but in my opinion the use-cases (beyond backwards compatibility) are pretty marginal. On the other hand, arrays with unknown shape are much more common (e.g., at least in TensorFlow, JAX and Dask).

So I would lean towards typing array.shape as Tuple[Optional[int], ...]].

I don't love the idea of allowing array.shape itself to return an array, because it's useful to be able to statically check for fully known dimension sizes, which allow for some types of optimizations.

Instead, we could consider a separate function like tf.shape for returning array shape as a 1D array.

kgryte · 2021-11-04T16:37:13Z

@shoyer Agreed. We've discussed adding a shape API (MXNet also has shape_array) to support returning an array, but have pushed that to the 2022 version of the standard to gather additional requirements, with the focus on nailing down x.shape expectations for 2021.

I'm certainly open to modifying the current proposal to always return a tuple (i.e., disallowing supporting unknown rank).

rgommers · 2021-11-04T16:48:27Z

TensorFlow currently alows arrays with unknown numbers of dimensions, but in my opinion the use-cases (beyond backwards compatibility) are pretty marginal. On the other hand, arrays with unknown shape are much more common (e.g., at least in TensorFlow, JAX and Dask).

So I would lean towards typing array.shape as Tuple[Optional[int], ...]].

This sounds about right to me - strikes a good balance between the needs of libraries using delayed evaluation and usability.

It's one thing to explain to users that x.shape may have None for one or more elements of the tuple, so they should check for that - this is a small-ish cost to pay, because algorithms using x.shape[1] == SPECIAL_LENGTH: ... are not common.
It's quite another thing to say x.ndim may be None, so code like x.ndim == 2: ... (quite common) is now invalid.

rgommers · 2021-11-04T16:50:00Z

IIRC we never really had any data on the "unknown rank" use cases. I thought our main remaining decision here was whether to represent the unknown dimension inside .shape by None or nan.

kgryte · 2021-11-04T19:23:37Z

Shape guidance has now been updated to remove support for unknown ranks, in alignment with @shoyer's proposal and as discussed on today's call.

…update-shape

kgryte · 2021-11-05T04:08:02Z

As this has been discussed and agreed upon during consortium meetings and both here and in #97, will merge. Any further changes can be addressed in follow-up PRs.

kgryte added 2 commits October 21, 2021 01:00

Specify shape behavior

0637d84

Update copy

1284dfd

kgryte added the API change Changes to existing functions or objects in the API. label Oct 21, 2021

kgryte added this to the v2021 milestone Oct 21, 2021

kgryte mentioned this pull request Nov 4, 2021

Some issues with type annotations #143

Closed

13 tasks

kgryte requested a review from rgommers November 4, 2021 08:35

Update guidance to remove support for arrays of unknown rank

ee41a10

kgryte added 2 commits November 4, 2021 21:03

Update copy

1ab47b2

Merge branch 'main' of https://github.com/pydata-apis/array-api into …

a4b0542

…update-shape

kgryte removed the request for review from rgommers November 5, 2021 04:06

kgryte merged commit 654d58d into main Nov 5, 2021

kgryte deleted the update-shape branch November 5, 2021 04:08

jakirkham mentioned this pull request Sep 23, 2022

Adding support for len/__len__ #481

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Specify shape behavior #289

Specify shape behavior #289

Uh oh!

kgryte commented Oct 21, 2021 •

edited

Loading

Uh oh!

asmeurer commented Oct 22, 2021

Uh oh!

kgryte commented Nov 4, 2021

Uh oh!

alextp commented Nov 4, 2021

Uh oh!

shoyer commented Nov 4, 2021

Uh oh!

kgryte commented Nov 4, 2021

Uh oh!

rgommers commented Nov 4, 2021

Uh oh!

rgommers commented Nov 4, 2021

Uh oh!

kgryte commented Nov 4, 2021

Uh oh!

kgryte commented Nov 5, 2021

Uh oh!

Uh oh!

Specify shape behavior #289

Specify shape behavior #289

Uh oh!

Conversation

kgryte commented Oct 21, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

asmeurer commented Oct 22, 2021

Uh oh!

kgryte commented Nov 4, 2021

Uh oh!

alextp commented Nov 4, 2021

Uh oh!

shoyer commented Nov 4, 2021

Uh oh!

kgryte commented Nov 4, 2021

Uh oh!

rgommers commented Nov 4, 2021

Uh oh!

rgommers commented Nov 4, 2021

Uh oh!

kgryte commented Nov 4, 2021

Uh oh!

kgryte commented Nov 5, 2021

Uh oh!

Uh oh!

kgryte commented Oct 21, 2021 •

edited

Loading