Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove "use axis after broadcasting" and change it or unspecify it #724

Closed
seberg opened this issue Dec 18, 2023 · 11 comments · Fixed by #740
Closed

Remove "use axis after broadcasting" and change it or unspecify it #724

seberg opened this issue Dec 18, 2023 · 11 comments · Fixed by #740

Comments

@seberg
Copy link
Contributor

seberg commented Dec 18, 2023

@mhvk noticed here that the definitions here are incompatible with NumPy gufuncs which I think is an oversight that was never noted/discussed. The "after broadcasting" looks reasonable on first sight, but TBH, doesn't really seem reasonable to me:

Assume broadcasting does something, you are specifying an axes that by definition has length 1 along the array, and also by definition has little meaning for the array which gets broadcast.

Now, is the alternative of using the non-broadcast axes better? Maybe not, maybe it is also very unclear and any positive axes (i.e. where broadcasting matters) is just too strange.

But, whether you like what NumPy does or not after thinking about it, it does seem if anything at least (maybe moreso) logical than the alternative.

So, I propose to change any place where broadcasting is mentioned w.r.t. to axes handling to define it as unspecified. Negative axis remain allowed, as it doesn't interfere with broadcasting. Negative axis beyond the number of existing dimensions prior to broadcasting will also be unspecified.
The last point looks necessary to me and strengthens the argument for NumPy's (or unspecified/error) logic: matmul([1], [[2]], axis=(-2, -1)) should not be valid, since it might as well be a bug that the first array is not 2-D as it should be).

EDIT: The above example should use vecdot, matmul doesn't work: vecdot(1, [2], axis=-1)

@seberg
Copy link
Contributor Author

seberg commented Dec 18, 2023

I will note that this applies to vecdot. I have to see what other functions we have to be sure if it applies to other functions. In some cases, the axis might be more like an operation followed by a reduction and not like a core dimensions. For such cases the broadcast specification is fine.

@mhvk
Copy link

mhvk commented Dec 18, 2023

I agree with your general point about not interpreting axis as after broadcasting. For gufunc at least, I think we do not have to change anything, as axis is very much just a shortcut for axes=[axis]*npp and for axes it is for each operand separately, so there is no ambiguity.

@seberg
Copy link
Contributor Author

seberg commented Dec 18, 2023

I think we do not have to change anything

The point is changing the text here. It is a bit unclear, but does seem to suggest to first broadcast. And I suspect that was the intention when it was written.

EDIT: To be a bit clearer:

axis over which to compute the dot product. Must be an integer on the interval [-N, N), where N is the rank (number of dimensions) of the shape determined according to Broadcasting.

Notes "broadcasting", which to me suggests that the axis of the broadcasted arrays is taken much as if we were doing (x * y).sum(axis), which is not compatible with NumPy gufuncs.

@mhvk
Copy link

mhvk commented Dec 18, 2023

Ah, I see, yes, that text is inconsistent with the gufunc implementation. For that text, I agree that axis may be best defined as always negative - it really does not make any sense to even talk about dot if the axis dimension is broadcast (EDIT: which the gufunc definition automatically prevents)

@kgryte
Copy link
Contributor

kgryte commented Dec 18, 2023

@seberg I believe this issue is related to #617, correct?

@seberg
Copy link
Contributor Author

seberg commented Dec 18, 2023

It is the same issue yes. But to note that this needs resolution (or rather for the sake of NUmPy, I will assume it will be resolved by either aligning with NumPy or unspecifying it).

@asmeurer
Copy link
Member

asmeurer commented Feb 2, 2024

Looking at this and #617 again. I agree with @seberg here. I didn't realize that gufuncs use the axis before broadcasting. If gufuncs are doing this, then that makes this case stronger since the spec is disagreeing with gufunc behavior here. So I agree completely that we should change the spec to only specify axis when it is negative. Users can work around this by manually broadcasting first, but if it really becomes an issue, I would suggest adding axes (or allowing a list of tuples for axis), so that the axes of each array can be specified independently.

In short, specifying an axis uniformly for multiple arrays (i.e., axis as a single integer rather than a tuple or list of tuples of integers), is inherently ambiguous when the axis is nonnegative and the arrays are broadcasted. The most logical behavior (which numpy gufuncs follow) is for the axis to refer to the pre-broadcasted (i.e., actual) array shapes. But the dimensions referred to by axis are different than the one referred to in the final broadcasted shape. This results in a situation where func(x1, x2) is not the same as func(*broadcast_arrays(x1, x2)).

I will also note here that my implementation of vecdot in array-api-strict (née numpy.array_api) and array-api-compat follows the spec, and therefore is different from the new np.vecdot gufunc:

>>> np.vecdot(np.empty((1, 2, 3, 4, 5)), np.empty((3, 4, 5)), axis=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: vecdot: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n),(n)->() (size 5 is different from 3)
>>> import numpy.array_api as xp
>>> xp.vecdot(xp.empty((1, 2, 3, 4, 5)), xp.empty((3, 4, 5)), axis=2)
Array([[[[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]]]], dtype=np.array_api.float64)

(I also apparently never implemented broadcasting for cross in np.array_api. It works correctly in array-api-compat though because that just wraps np.cross)

As far as I can tell, this is only an issue for vecdot and cross. Those are the only functions in the linalg extension that have an axis argument, other than tensordot, which does not have this issue because the axes are specified for each array separately already in that function. There are no functions outside of linalg with more than one argument and an axis argument. This could in principle apply to more of the linear algebra functions in the future, however. For instance, there's no reason why matmul couldn't gain an axis/axes argument to allow contracting over different dimensions than the last two.

I think it would be good to get this fixed for 2023 if possible. I will make a pull request.

asmeurer added a commit to asmeurer/array-api that referenced this issue Feb 2, 2024
Nonnegative axes and negative axes less than the smaller of the two arrays are
unspecified.

This is because it is ambiguous in these cases whether the
dimension should refer to the axis before or after broadcasting. Preciously,
the spec stated it should refer to the dimension before broadcasting, but this
deviates from NumPy gufunc behavior, and results in ambiguous and confusing
situations, where, for instance, the result of a the function is different
when the inputs are manually broadcasted together.

Also clean up some of the cross text a little bit since the computed dimension
must be exactly size 3.

Fixes data-apis#724
Fixes data-apis#617

See the discussion in those issues for more details.
@asmeurer
Copy link
Member

asmeurer commented Feb 2, 2024

Fix at #740

@mhvk
Copy link

mhvk commented Feb 2, 2024

Thanks, that all makes sense. Just for completeness, np.matmul (like all gufuncs) does have an axes argument that allows specifying axes for each array separately (pre-broadcasting): https://numpy.org/doc/stable/reference/ufuncs.html

axes

New in version 1.15.

A list of tuples with indices of axes a generalized ufunc should operate on. For instance, for a signature of (i,j),(j,k)->(i,k) appropriate for matrix multiplication, the base elements are two-dimensional matrices and these are taken to be stored in the two last axes of each argument. The corresponding axes keyword would be [(-2, -1), (-2, -1), (-2, -1)]. For simplicity, for generalized ufuncs that operate on 1-dimensional arrays (vectors), a single integer is accepted instead of a single-element tuple, and for generalized ufuncs for which all outputs are scalars, the output tuples can be omitted.

@asmeurer
Copy link
Member

asmeurer commented Feb 2, 2024

Yes, I rather like that feature, although I'm not sure if it's appropriate for the array API. Maybe if a use-case comes up.

@seberg
Copy link
Contributor Author

seberg commented Feb 2, 2024

I think swapping axes is fine, for matmul it does allow for slightly nicer vector-matrix multiplication maybe (but not sure it is really nicer than inserting the dimension).
Overall: Not a particularly important feature, swapaxes moveaxis can get there when needed and it's probably not needed often.

kgryte pushed a commit that referenced this issue Feb 13, 2024
This commit updates specification guidance in `vecdot` and `cross` to no longer explicitly support positive `axis` kwarg values. Previous specification guidance conflicts with NumPy gufuncs and restricting to negative integers removes ambiguity in determining over which axis to perform computation. This commit uses `should`, not `must`, to allow conforming libraries to support nonnegative `axis` values for backward compatibility.

Closes: #724
Closes: #617
PR-URL: 	#740
Reviewed-by: Athan Reines <kgryte@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants