Add clip to the specification (#715)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I've updated this PR based on feedback from the 30 November 2023 workgroup meeting.
rgommers left a comment:
> makes `min` and `max` keyword-only arguments.
The majority of usage today is like this, with scalar values but, more importantly, positional use of min/max (not even named that in NumPy today): `clip(x, -1, 4)`. This doesn't seem unreasonable, so I'd prefer positional usage, I think. It makes introduction a lot less disruptive.
The discussion in gh-482 leans towards this I'd say:
```python
def clip(x, /, min=None, max=None):
```

Introducing that in NumPy would be fine in a minor release.
The rest of the semantics for min/max all LGTM.
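For concreteness, here is a minimal sketch of the positional-or-keyword signature in NumPy terms. This is my own illustration, not spec text or NumPy's implementation; it assumes the output keeps the input's dtype, per the guidance discussed elsewhere in the thread:

```python
import numpy as np

def clip(x, /, min=None, max=None):
    """Sketch of the proposed signature: min/max are optional and
    usable positionally, e.g. clip(x, -1, 4)."""
    result = np.asarray(x).copy()
    if min is not None:
        result = np.where(result < min, min, result)
    if max is not None:
        result = np.where(result > max, max, result)
    # Keep the input's dtype rather than promoting with min/max
    return result.astype(np.asarray(x).dtype)

x = np.array([-3, 0, 2, 7], dtype=np.int8)
print(clip(x, -1, 4))   # positional min/max
print(clip(x, max=4))   # keyword use, min omitted
print(clip(x))          # both None: essentially a no-op
```
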
I updated the guidance so that the output data type is the same as that of the input array x, rather than the result of type promotion. An argument could be made either way; however, as discussed during the workgroup meeting, users most likely expect the output data type to match x, and the specification has no precedent (TMK) for array kwargs influencing the output data type. As such, the specification guidance was updated accordingly. Note, however, that this differs from current behavior in, e.g., NumPy.
This seems reasonable. NumPy is already a little inconsistent with itself, sometimes scalars cause type promotion, sometimes not. E.g.:
```python
>>> x = np.arange(6).astype(np.int8)
>>> np.clip(x, -1, 4.5)  # yields float64
array([0. , 1. , 2. , 3. , 4. , 4.5])
>>> np.clip(x, -1, 4)  # no promotion, all fits in int8
>>> xu = x.astype(np.uint8)
>>> np.clip(xu, -1, 4)  # now min value doesn't cause upcasting to int16
...
OverflowError: Python integer -1 out of bounds for uint8
```

Mostly the result is upcast though:

```python
>>> x_f32 = np.linspace(0, 1, num=5, dtype=np.float32)
>>> np.clip(x_f32, 0.2, 0.8)
array([0.2 , 0.25, 0.5 , 0.75, 0.8 ], dtype=float32)
>>> np.clip(x_f32, np.array([0.2], dtype=np.float64), 0.8)
array([0.2 , 0.25, 0.5 , 0.75, 0.8 ])
```

The cross-kind promotion or comparison is problematic anyway. It's not entirely clear from the language here what the expected result is for `clip(x_int32, 0.6, 2.3)`: the minimum value in the output could be 1 or 0 depending on how you'd implement the function, because some internal casting will have to happen.
Probably there should be an additional note that cross-kind dtypes are unspecified behavior?
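To make the ambiguity concrete, here is a small demonstration (my own, not from the thread) of two internal casting strategies that produce different results for `clip(x_int32, 0.6, 2.3)`:

```python
import numpy as np

x = np.array([0, 1, 2, 3], dtype=np.int32)

# Strategy 1: compare in float64, then truncate the result back to int32.
# 0 clamps to 0.6, which truncates to 0.
res1 = np.clip(x.astype(np.float64), 0.6, 2.3).astype(np.int32)

# Strategy 2: round the bounds to the nearest int32 first (0.6 -> 1,
# 2.3 -> 2), then clip in integer arithmetic.
res2 = np.clip(x, np.int32(round(0.6)), np.int32(round(2.3)))

print(res1.min(), res2.min())  # 0 vs 1: same inputs, different minimum
```

Both strategies are defensible implementations, which is why leaving cross-kind dtypes unspecified seems prudent.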
@rgommers I've updated.
rgommers left a comment:
Thanks, LGTM now.
I'll leave it open for a bit in case more people have comments.
As this PR has received the OK and has not received any further comments, I'll go ahead and merge. Thanks all!
This PR

- resolves Add clip #482 by adding `clip` to the Array API specification for clamping each element of an input array to a specified range.
- specifies that the output array should have the same data type as the input array, rather than the result of type promotion across the operands (`x`, `min`, `max`). Applying type promotion would be in the spirit of earlier efforts to ensure that type promotion rules are applied consistently throughout the specification. However, in contrast to Update output dtypes for bitwise shift operations #201, `min` and `max` are kwargs, and we do not have, TMK, any precedent for array kwargs influencing the data type of the output array. Furthermore, when clamping, users are more likely to want an output array of the same data type as the input array (this was also raised on the NumPy issue tracker: ENH: Ensure that output of np.clip has the same dtype as the main array numpy/numpy#24976).
- allows `x` to be broadcast, thus allowing the output array to have a rank greater than the input array. This differs from TensorFlow, which requires that the output array shape be the same as the input array shape. NumPy, however, supports such broadcasting behavior. Note that allowing `x` to be broadcast is somewhat at odds with not allowing type promotion. For the output data type, I argued that `min` and `max` should not affect the output data type, but, in allowing `x` to be broadcast, `min` and `max` can affect the output array shape. This is likely fine and consistent with the rest of the specification, where we have plenty of kwargs which affect the output array shape, although this would be the first, TMK, involving broadcasting.
- specifies that, when `min > max`, behavior is unspecified. NumPy et al set output values to `max`; however, other implementations should be free to raise an exception or support alternative behavior.
- allows `min` and `max` to be optional. When both `min` and `max` are `None`, the function is essentially a no-op. This follows PyTorch, but differs from NumPy, which allows `min` and `max` to be `None`, but not at the same time.
- leaves behavior unspecified for mixed data type kinds (e.g., when `x` is an integer data type and `min` or `max` is a floating-point data type), which is consistent with elsewhere in the specification. TensorFlow raises an exception in such a scenario.
- makes `min` and `max` positional and keyword arguments.
- uses the name `clip`. TensorFlow uses the name `clip_by_value`. PyTorch also includes `clip`, but this aliases to `clamp`.

Note that this PR would introduce changes to existing `clip` functionality in NumPy et al. Namely,

- `min` and `max` are positional and keyword arguments; whereas, in NumPy, `a_min` and `a_max` are positional.
- the arguments are named `min` and `max`, rather than `a_min` and `a_max`.
- both `min` and `max` are allowed to be `None` at the same time.
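As an illustration of the broadcasting point above (my own example, not from the PR; NumPy's current `np.clip` already supports this behavior):

```python
import numpy as np

x = np.array([0, 5, 10], dtype=np.int64)  # shape (3,)
min_ = np.array([[1], [2]])               # shape (2, 1)

# x broadcasts against min_, so the output has a higher rank than x.
out = np.clip(x, min_, 8)
print(out.shape)  # (2, 3)
print(out)
```

Here the `min` argument changes the output shape but, per the guidance above, would not change the output dtype.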