Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add array indexing specification #46

Merged
merged 27 commits into from
Nov 8, 2020
Merged

Add array indexing specification #46

merged 27 commits into from
Nov 8, 2020

Conversation

kgryte
Copy link
Contributor

@kgryte kgryte commented Sep 17, 2020

This PR

  • specifies array indexing semantics.

Notes

@kgryte kgryte changed the title WIP: add array indexing specification Add array indexing specification Sep 21, 2020

- Providing a slice must retain array dimensions (i.e., the array rank must remain the same; `rank(A) == rank(A[:])`).

- For each slice which attempts to select elements along a particular axis, but whose starting index is out-of-bounds, the axis (dimension) size of the result array must be `0`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For devices like GPUs where you might want to allocate the memory for the result of slicing before you know the indices (as the allocation happens on the host and the indices reside on the device) this type of shape dynamism can be problematic. We should discuss what to do here in the same way as we discussed mutability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alextp for reviewing and raising this concern! Would definitely be good to discuss. cc @rgommers

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this also be an issue for slices bounds that are beyond the array shape (e.g., a[0:1000] where a has shape (100,))?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think slice clipping could be omitted from the spec. In other words, the spec only specifies what happens for slices where the start or stop are in [-size, size] (where the ends of that interval may or may not be included depending on the different cases of start/stop and negative/positive step).

Slices on Python lists implement "clipping" behavior to the size, which NumPy matches, but this could be something that isn't specified in the spec, and libraries could do something else. It is also possible to "manually" implement clipping in user code, in much a similar way that you can "manually" implement bounds checking (clipping happens first when a slice is computed, see the a[-100::2] example here).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both slices and array shapes are usually kept on the host, so I think you should be able to figure out the necessary buffer sizes without any synchronization. Or am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apaszke reading the first comment from @alextp, I think he meant something like

ix_start = x.sum()  # computed index depends on values in GPU array `x`
z = y[ix_start : ix_start + 2]  # z will have size 2, unless ix_start is >= y.shape[0] - 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed any bounds checking requirements from the proposal, under the rationale that how an array implementation handles out-of-bounds indices (including within slices) is best left to the implementation.

Based on your example @rgommers and @alextp's comment, the gist of the problem is that I may want to allocate memory for an array having axis size 2, regardless of whether indices are valid (in-bounds) or not. And if we required that, in this case, out-of-bounds indices unconditionally must result in an array whose axis size is 0, then I won't be able to allocate memory without knowing the actual index values, thus triggering device syncs and triggering a perf cliff.

@kgryte
Copy link
Contributor Author

kgryte commented Nov 8, 2020

@rgommers This should be ready for review given removal of bounds checking requirements.

Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let's get this in. The rationale for lack of specifying clipping looks fine, there may be follow-up discussion - we can expand that paragraph if needed.

@rgommers rgommers merged commit b222f96 into master Nov 8, 2020
@rgommers rgommers deleted the indexing branch November 8, 2020 22:15
@rgommers
Copy link
Member

rgommers commented Nov 8, 2020

Thanks for the input everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants