Add clip to the specification (#715)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I've updated this PR based on feedback from the 30 November 2023 workgroup meeting.
rgommers left a comment:
> makes `min` and `max` keyword-only arguments.
The majority of usage today is like this, with scalar values but, more importantly, positional use of min/max (not even named that in NumPy today): `clip(x, -1, 4)`. This doesn't seem unreasonable, so I'd prefer positional usage, I think. It makes introduction a lot less disruptive.
The discussion in gh-482 leans towards this I'd say:
```python
def clip(x, /, min=None, max=None):
```

Introducing that in NumPy would be fine in a minor release.
The rest of the semantics for min/max all LGTM.
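For concreteness, here is a minimal sketch of the positional-or-keyword signature in NumPy terms. This is my own illustration, not spec text or NumPy's implementation; it assumes the output keeps the input's dtype, per the guidance discussed elsewhere in the thread:

```python
import numpy as np

def clip(x, /, min=None, max=None):
    """Sketch of the proposed signature: min/max are optional and
    usable positionally, e.g. clip(x, -1, 4)."""
    result = np.asarray(x).copy()
    if min is not None:
        result = np.where(result < min, min, result)
    if max is not None:
        result = np.where(result > max, max, result)
    # Keep the input's dtype rather than promoting with min/max
    return result.astype(np.asarray(x).dtype)

x = np.array([-3, 0, 2, 7], dtype=np.int8)
print(clip(x, -1, 4))   # positional min/max
print(clip(x, max=4))   # keyword use, min omitted
print(clip(x))          # both None: essentially a no-op
```
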
I updated the guidance so that the output data type is the same as that of the input array x, rather than the result of type promotion. An argument could be made either way; however, as discussed during the workgroup meeting, users most likely expect the output data type to match x, and the specification has no precedent (TMK) for array kwargs influencing the output data type. As such, the specification guidance was updated accordingly. Note, however, that this differs from current behavior in, e.g., NumPy.
This seems reasonable. NumPy is already a little inconsistent with itself, sometimes scalars cause type promotion, sometimes not. E.g.:
```python
>>> x = np.arange(6).astype(np.int8)
>>> np.clip(x, -1, 4.5)  # yields float64
array([0. , 1. , 2. , 3. , 4. , 4.5])
>>> np.clip(x, -1, 4)  # no promotion, all fits in int8
>>> xu = x.astype(np.uint8)
>>> np.clip(xu, -1, 4)  # now min value doesn't cause upcasting to int16
...
OverflowError: Python integer -1 out of bounds for uint8
```

Mostly the result is upcast though:

```python
>>> x_f32 = np.linspace(0, 1, num=5, dtype=np.float32)
>>> np.clip(x_f32, 0.2, 0.8)
array([0.2 , 0.25, 0.5 , 0.75, 0.8 ], dtype=float32)
>>> np.clip(x_f32, np.array([0.2], dtype=np.float64), 0.8)
array([0.2 , 0.25, 0.5 , 0.75, 0.8 ])
```

The cross-kind promotion or comparison is problematic anyway. It's not entirely clear from the language here what the expected result is for `clip(x_int32, 0.6, 2.3)`: the minimum value in the output could be 1 or 0 depending on how you'd implement the function, because some internal casting will have to happen.
Probably there should be an additional note that cross-kind dtypes are unspecified behavior?
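To make the ambiguity concrete, here is a small demonstration (my own, not from the thread) of two internal casting strategies that produce different results for `clip(x_int32, 0.6, 2.3)`:

```python
import numpy as np

x = np.array([0, 1, 2, 3], dtype=np.int32)

# Strategy 1: compare in float64, then truncate the result back to int32.
# 0 clamps to 0.6, which truncates to 0.
res1 = np.clip(x.astype(np.float64), 0.6, 2.3).astype(np.int32)

# Strategy 2: round the bounds to the nearest int32 first (0.6 -> 1,
# 2.3 -> 2), then clip in integer arithmetic.
res2 = np.clip(x, np.int32(round(0.6)), np.int32(round(2.3)))

print(res1.min(), res2.min())  # 0 vs 1: same inputs, different minimum
```

Both strategies are defensible implementations, which is why leaving cross-kind dtypes unspecified seems prudent.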
@rgommers I've updated.
rgommers left a comment:
Thanks, LGTM now.
I'll leave it open for a bit in case more people have comments.
As this PR has received the OK and has not received any further comments, I'll go ahead and merge. Thanks all!
This PR

- resolves Add clip #482 by adding `clip` to the Array API specification for clamping each element of an input array to a specified range.
- specifies that the output array should have the same data type as the input array, rather than the result of type promotion across the operands (`x`, `min`, `max`). Applying type promotion would be in the spirit of earlier efforts to ensure that type promotion rules are applied consistently throughout the specification. However, in contrast to Update output dtypes for bitwise shift operations #201, `min` and `max` are kwargs, and we do not have, TMK, any precedent for array kwargs influencing the data type of the output array. Furthermore, when clamping, users are more likely to want an output array of the same data type as the input array (this was also raised on the NumPy issue tracker: ENH: Ensure that output of np.clip has the same dtype as the main array numpy/numpy#24976).
- allows `x` to be broadcast, thus allowing the output array to have a rank greater than the input array. This differs from TensorFlow, which requires that the output array shape be the same as the input array shape. NumPy, however, supports such broadcasting behavior. Note that allowing `x` to be broadcast is somewhat at odds with not allowing type promotion. For the output data type, I argued that `min` and `max` should not affect the output data type, but, in allowing `x` to be broadcast, `min` and `max` can affect the output array shape. This is likely fine and consistent with the rest of the specification, where we have plenty of kwargs which affect the output array shape, although this would be the first, TMK, involving broadcasting.
- specifies that, when `min > max`, behavior is unspecified. NumPy et al set output values to `max`; however, other implementations should be free to raise an exception or support alternative behavior.
- allows `min` and `max` to be optional. When both `min` and `max` are `None`, the function is essentially a no-op. This follows PyTorch, but differs from NumPy, which allows `min` and `max` to be `None`, but not at the same time.
- leaves behavior unspecified for mixed data type kinds (e.g., when `x` is an integer data type and `min` or `max` is a floating-point data type), which is consistent with elsewhere in the specification. TensorFlow raises an exception in such a scenario.
- makes `min` and `max` positional and keyword arguments.
- uses the name `clip`. TensorFlow uses the name `clip_by_value`. PyTorch also includes `clip`, but this aliases to `clamp`.

Note that this PR would introduce changes to existing `clip` functionality in NumPy et al. Namely,

- `min` and `max` are positional and keyword arguments; whereas, in NumPy, `a_min` and `a_max` are positional.
- the arguments are named `min` and `max`, rather than `a_min` and `a_max`.
- both `min` and `max` are allowed to be `None` at the same time.
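As an illustration of the broadcasting point above (my own example, not from the PR; NumPy's current `np.clip` already supports this behavior):

```python
import numpy as np

x = np.array([0, 5, 10], dtype=np.int64)  # shape (3,)
min_ = np.array([[1], [2]])               # shape (2, 1)

# x broadcasts against min_, so the output has a higher rank than x.
out = np.clip(x, min_, 8)
print(out.shape)  # (2, 3)
print(out)
```

Here the `min` argument changes the output shape but, per the guidance above, would not change the output dtype.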