Add array indexing specification #46
spec/API_specification/indexing.md
Outdated
- Providing a slice must retain array dimensions (i.e., the array rank must remain the same; `rank(A) == rank(A[:])`).
- For each slice which attempts to select elements along a particular axis, but whose starting index is out-of-bounds, the axis (dimension) size of the result array must be `0`.
For devices like GPUs, where you might want to allocate the memory for the result of slicing before you know the indices (the allocation happens on the host while the indices reside on the device), this type of shape dynamism can be problematic. We should discuss what to do here in the same way as we discussed mutability.
Would this also be an issue for slice bounds that are beyond the array shape (e.g., `a[0:1000]` where `a` has shape `(100,)`)?
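For reference, a minimal sketch of the clipping behavior in question, using a built-in Python list as a stand-in for a one-dimensional array of shape `(100,)`. This only shows what Python lists do; it is not a statement of what the specification requires.

```python
# Built-in list slicing clips out-of-range bounds instead of raising.
a = list(range(100))   # stand-in for an array of shape (100,)

clipped = a[0:1000]    # stop is beyond the end: clipped to len(a)
empty = a[200:300]     # start is beyond the end: the result is empty

print(len(clipped))    # 100
print(len(empty))      # 0
```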
I think slice clipping could be omitted from the spec. In other words, the spec would only specify what happens for slices where the start and stop are in `[-size, size]` (where the ends of that interval may or may not be included, depending on the different cases of start/stop and negative/positive step).
Slices on Python lists implement "clipping" to the size, which NumPy matches, but this could be left unspecified in the spec, and libraries could do something else. It is also possible to "manually" implement clipping in user code, in much the same way that you can "manually" implement bounds checking (clipping happens first when a slice is computed; see the `a[-100::2]` example here).
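A rough sketch of what "manual" clipping and "manual" bounds checking could look like in user code, using only the standard `slice.indices()` method. The helper names `clipped_indices` and `require_in_bounds` are hypothetical, and the check treats both ends of `[-size, size]` as allowed, which is a simplification of the cases mentioned above.

```python
def clipped_indices(s: slice, size: int) -> range:
    """Return the concrete indices a slice selects along an axis of `size`.

    slice.indices() applies the same clipping that list slicing uses.
    """
    return range(*s.indices(size))


def require_in_bounds(s: slice, size: int) -> None:
    """Hypothetical user-side check: reject explicit bounds outside [-size, size]."""
    for name, bound in (("start", s.start), ("stop", s.stop)):
        if bound is not None and not -size <= bound <= size:
            raise IndexError(f"slice {name} {bound} is out of bounds for size {size}")


a = list(range(10))
idx = clipped_indices(slice(-100, None, 2), len(a))
print(list(idx))                           # [0, 2, 4, 6, 8]
print([a[i] for i in idx] == a[-100::2])   # True
```

Returning a `range` rather than a new `slice` is deliberate: for negative steps `slice.indices()` can report a stop of `-1`, which would be reinterpreted as "last element" if fed back into a slice.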
Both slices and array shapes are usually kept on the host, so I think you should be able to figure out the necessary buffer sizes without any synchronization. Or am I missing something?
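To illustrate the point, here is a minimal sketch of computing a result shape on the host from nothing but the slices and the array shape. It assumes the slice bounds are plain Python integers and that Python-list clipping semantics apply; `sliced_shape` is a hypothetical helper, not part of any library. It does not cover the case where index values live on the device.

```python
from typing import Sequence, Tuple


def sliced_shape(shape: Sequence[int], slices: Sequence[slice]) -> Tuple[int, ...]:
    """Compute the result shape of basic slicing, one slice per axis."""
    if len(shape) != len(slices):
        raise ValueError("expected one slice per axis")
    # len(range(...)) counts the selected indices without touching any data.
    return tuple(len(range(*s.indices(n))) for s, n in zip(slices, shape))


print(sliced_shape((100,), (slice(0, 1000),)))                          # (100,)
print(sliced_shape((100, 8), (slice(200, 300), slice(None, None, 2))))  # (0, 4)
```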
I've removed any bounds checking requirements from the proposal, under the rationale that how an array implementation handles out-of-bounds indices (including within slices) is best left to the implementation.
Based on your example @rgommers and @alextp's comment, the gist of the problem is that I may want to allocate memory for an array having axis size `2`, regardless of whether indices are valid (in-bounds) or not. If we required that, in this case, out-of-bounds indices must unconditionally result in an array whose axis size is `0`, then I would not be able to allocate memory without knowing the actual index values, thus triggering device syncs and a perf cliff.
@rgommers This should be ready for review given removal of bounds checking requirements.
Okay, let's get this in. The rationale for not specifying clipping looks fine. There may be follow-up discussion; we can expand that paragraph if needed.
Thanks for the input everyone!
This PR
Notes
- Bounds checking: JAX/Numba don't seem to perform bounds checks for perf reasons (see jax-ml/jax#278, "jax.numpy array indexing has different out-of-bounds behavior to numpy"). Out-of-bounds index behavior is left unspecified.
- Left out `np.newaxis` in favor of adding `expand_dims` to the specification.
- Apart from boolean array indexing, NumPy's "advanced indexing" is not included and is not universally supported among array libraries (e.g., dask and TensorFlow do not support all of NumPy's indexing semantics).