Add `clip` to the specification #715
Conversation
I've updated this PR based on feedback from the 30 November 2023 workgroup meeting. Namely,
> makes `min` and `max` keyword-only arguments.
The majority of usage today is like this, with scalar values but, more importantly, positional use of min/max (and not even named that in NumPy today): `clip(x, -1, 4)`. This doesn't seem unreasonable, so I'd prefer positional usage, I think. It makes the introduction a lot less disruptive.

The discussion in gh-482 leans towards this, I'd say:

```python
def clip(x, /, min=None, max=None):
    ...
```

Introducing that in NumPy would be fine in a minor release.
The rest of the semantics for min/max all LGTM.
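As a sketch of how the suggested signature could behave, here is an illustrative wrapper around NumPy (this is not the specification's reference implementation; the no-op/copy behavior for omitted bounds is an assumption based on the discussion below):

```python
import numpy as np

def clip(x, /, min=None, max=None):
    # x is positional-only; min/max can be passed positionally or by keyword.
    if min is None and max is None:
        # Both bounds omitted: essentially a no-op (here, return a copy).
        return np.asarray(x).copy()
    return np.clip(x, min, max)

print(clip(np.array([-3, 0, 7]), -1, 4))   # positional use of min/max
print(clip(np.array([-3, 0, 7]), max=4))   # keyword use, min omitted
```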
> I updated the guidance regarding the output data type to be the same as the input array `x` and not the result of type promotion. An argument could be made either way; however, as discussed during the workgroup meeting, user expectation is most likely to be that the output data type matches `x`, and the specification does not have precedent (TMK) for array kwargs influencing the output data type. As such, the specification guidance was updated accordingly. Note, however, that this differs from current behavior in, e.g., NumPy.
This seems reasonable. NumPy is already a little inconsistent with itself, sometimes scalars cause type promotion, sometimes not. E.g.:
```python
>>> x = np.arange(6).astype(np.int8)
>>> np.clip(x, -1, 4.5)  # yields float64
array([0. , 1. , 2. , 3. , 4. , 4.5])
>>> np.clip(x, -1, 4)  # no promotion, all fits in int8
>>> xu = x.astype(np.uint8)
>>> np.clip(xu, -1, 4)  # now the min value doesn't cause upcasting to int16
...
OverflowError: Python integer -1 out of bounds for uint8
```
Mostly, though, the result is upcast:

```python
>>> x_f32 = np.linspace(0, 1, num=5, dtype=np.float32)
>>> np.clip(x_f32, 0.2, 0.8)
array([0.2 , 0.25, 0.5 , 0.75, 0.8 ], dtype=float32)
>>> np.clip(x_f32, np.array([0.2], dtype=np.float64), 0.8)
array([0.2 , 0.25, 0.5 , 0.75, 0.8 ])
```
The cross-kind promotion or comparison is problematic anyway. It's not entirely clear from the language here what the expected result is for `clip(x_int32, 0.6, 2.3)`: the minimum value in the output could be `1` or `0` depending on how you implement the function, because some internal casting will have to happen. Probably there should be an additional note that cross-kind dtypes are unspecified behavior?
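The ambiguity for `clip(x_int32, 0.6, 2.3)` can be made concrete with two plausible implementation strategies; this is an illustrative sketch (the casting choices are hypothetical, not taken from the specification):

```python
import numpy as np

x = np.array([0, 1, 2, 3], dtype=np.int32)

# Strategy A: promote to float64, clip, then truncate back to int32.
a = np.clip(x.astype(np.float64), 0.6, 2.3).astype(np.int32)
print(a)  # [0 1 2 2] -- the clipped value 0.6 truncates to 0

# Strategy B: promote to float64, clip, round to nearest, then cast back.
b = np.rint(np.clip(x.astype(np.float64), 0.6, 2.3)).astype(np.int32)
print(b)  # [1 1 2 2] -- 0.6 rounds to 1
```

Both are reasonable readings of "clamp to [0.6, 2.3] with an int32 output", yet they disagree on the minimum output value, which is exactly the concern raised above.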
@rgommers I've updated.
Thanks, LGTM now.
I'll leave it open for a bit in case more people have comments.
As this PR has received the OK and has not received any further comments, I'll go ahead and merge. Thanks all!
This PR

- resolves gh-482 by adding `clip` to the Array API specification for clamping each element of an input array to a specified range.
- specifies that the output array must have the same data type as the input array `x`, rather than the data type resulting from type promotion applied to (`x`, `min`, `max`). Promoting would be in the spirit of earlier efforts to ensure that type promotion rules are applied consistently throughout the specification. However, in contrast to Update output dtypes for bitwise shift operations #201, `min` and `max` are kwargs, and we do not have, TMK, any precedent for array kwargs influencing the data type of the output array. Furthermore, when clamping, users are more likely to want an output array of the same data type as the input array (this was also raised on the NumPy issue tracker: ENH: Ensure that output of np.clip has the same dtype as the main array numpy/numpy#24976).
- allows `min` and `max` to be broadcast against `x`, thus allowing the output array to have a rank greater than the input array. This differs from TensorFlow, which requires that the output array shape be the same as the input array shape. NumPy, however, supports such broadcasting behavior. Note that allowing broadcasting against `x` is somewhat at odds with not allowing type promotion. For the output data type, I argued that `min` and `max` should not affect the output data type; but, in allowing broadcasting, this means that `min` and `max` can affect the output array shape. This is likely fine and consistent with the rest of the specification, where we have plenty of kwargs which affect the output array shape, although this would be the first, TMK, involving broadcasting.
- specifies that, when `min > max`, behavior is unspecified. NumPy et al. set output values to `max`; however, other implementations should be free to raise an exception or support alternative behavior.
- allows `min` and `max` to be optional. When both `min` and `max` are `None`, the function is essentially a no-op. This follows PyTorch, but differs from NumPy, which allows either `min` or `max` to be `None`, but not both at the same time.
- leaves behavior unspecified for mixed data type kinds (e.g., when `x` is an integer data type and `min` or `max` is a floating-point data type), which is consistent with elsewhere in the specification. TensorFlow raises an exception in such a scenario.
- makes `min` and `max` positional and keyword arguments.
- uses the name `clip`, following NumPy et al. TensorFlow uses the name `clip_by_value`. PyTorch also includes `clip`, but this aliases to `clamp`.

Note that this PR would introduce changes to existing `clip` functionality in NumPy et al. Namely, `min` and `max` are positional and keyword arguments, whereas, in NumPy, `a_min` and `a_max` are positional; adopting the specification would thus require renaming `a_min` and `a_max`. Furthermore, NumPy does not currently allow both `min` and `max` to be `None` at the same time.
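The broadcasting and `min > max` points can be illustrated with current NumPy behavior (a small sketch; the array values are illustrative):

```python
import numpy as np

# min/max broadcast against x, so the output can have a higher rank than x.
x = np.array([0, 5, 10])              # shape (3,)
lo = np.array([[1], [2]])             # shape (2, 1)
out = np.clip(x, lo, 8)               # broadcasts to shape (2, 3)
print(out)
# [[1 5 8]
#  [2 5 8]]

# When min > max, NumPy sets output values to max; under this PR such
# behavior would be unspecified, and other implementations may differ.
print(np.clip(np.arange(5), 3, 1))    # [1 1 1 1 1]
```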