
ENH: a simple cov #10

Merged: 1 commit merged into data-apis:main on Oct 26, 2024
Conversation

lucascolley (Collaborator):
This would replace a private utility in SciPy.

@lucascolley lucascolley changed the title ENH: a simple cov ENH: a simple cov and complex mean Sep 25, 2024
@lucascolley lucascolley added enhancement New feature or request new function labels Sep 25, 2024
@@ -46,3 +47,48 @@ def atleast_nd(x: Array, *, ndim: int, xp: ModuleType) -> Array:
x = xp.expand_dims(x, axis=0)
x = atleast_nd(x, ndim=ndim, xp=xp)
return x


def cov(x: Array, *, xp: ModuleType) -> Array:
kgryte commented on `def cov`:
While this implementation and API are constrained enough, I would exercise caution when adding such APIs, to ensure that the design of whatever is included in this library is consistent with Array API design principles and does not impose limitations on future standardization.

While not applicable in this specific instance, I would like to avoid a scenario where array_api_extra provides a backdoor to de facto standardized APIs which have design choices that we wouldn't want to endorse were we standardizing directly in the standard. Meaning, I'd like to avoid getting backed into a corner.

In this case, the particular design aspect which comes to mind is the behavior of a correction argument, similar to what we have in std and var.

In general, my current thinking is that, so long as this is under the Consortium umbrella, potential APIs need to go through a somewhat similar "standardization" proposal and inclusion process. That means comparative library analysis, discussion of design choices, and whether inclusion presents an opportunity to address limitations in prior art.

Reply (Member):

This is a very good point, and we need to think more about API stability here. At the same time, this could to some degree defeat the purpose of the library if we have to do a full standardization process for every function. I wonder if there's a way we could organize things here so that API stability isn't as big of a concern. For instance, we could make it so that array-api-extra is only supported as a vendored library (or require people to just copy-paste the implementations). Or we could organize the code in a way so that different versions of APIs are possible.

lucascolley (Collaborator, Author) commented Sep 25, 2024:

Thanks for the feedback @kgryte ! Agreed that we want to avoid creating de facto standardized APIs here with less care than would be put into the standard. I can see how it would be bad to guide consumer libraries towards whatever APIs are here, just to tell them that they need to change again when a more considered API is standardized.

I do think that this library should still be able to move more quickly than the standard though - the primary motivation being providing functionality to consumer libraries via some reasonable API, rather than convincing array libraries that this is the "correct" API (there is no need since everything is implemented in terms of the standard). Having a non-ideal API here is (arguably) still better than making consumer library maintainers search through private modules of other libraries like SciPy or implement the function themselves, provided that we can sufficiently communicate that an API being a part of this library does not mean that it is standardized.

If we were to go through a very similar process to the standard, and the result was consensus on an API, then it should probably just be added to the standard itself!

I think one way we could try to avoid creating de facto standardized APIs is being quite liberal in adding kwargs to vary behaviour when there are competing ideas (i.e. competing use-cases in consumer libraries). Then, while functions live in this library, consumer libraries can pick and choose the functionality they need without any particular mode being "standard". If at some point some "extra" functionality is considered for standardization, then the normal process would apply, but with the additional information of how the API from this library has been used.

> That means comparative library analysis, discussion of design choices, and whether inclusion presents an opportunity to address limitations in prior art.

This would be fantastic to see, but given that I'm heading back to university soon, I am unlikely to be able to contribute the necessary time to carry out that process rigorously for the foreseeable future. So development here may be quite slow. Hopefully you can get others to fill that role while I'm too busy! I do think that this library will be an important part of making adoption of the standard easier for consumer libraries.

EDIT: The same point about time applies to other potential contributors. I think we may be able to create some nice collaboration where consumer libraries take from and give back to xpx by writing a function or two, which in turn will help other consumer libraries. I think we are less likely to convince consumer library maintainers to enter discussions about an ideal API (they may well not really care about the API at all, they just need something that works).

Reply:

> I think one way we could try to avoid creating de facto standardized APIs is being quite liberal in adding kwargs to vary behaviour when there are competing ideas (i.e. competing use-cases in consumer libraries).

I'd go the opposite way: namely, being very conservative in terms of what behavior is supported.

For some background on Array API "extras", this idea has been discussed during workgroup meetings at various points for different reasons. One of those reasons is a potential place for APIs which have broad, but not ubiquitous, consensus among ecosystem libraries. E.g., a number of ecosystem libraries have other special functions, such as gamma, erf, beta, etc. Some of these are unlikely to be standardized directly in the standard due to the NumPy/SciPy split. Other libraries have APIs associated with deep learning applications (e.g., various loss functions), but these are unlikely to ever be directly added to more general purpose libraries such as NumPy. "Extras" was batted around as a way to provide consensus APIs outside of the standard, provided such APIs could either be implemented in terms of the standard or delegated to libraries exposing a compliant API.

With this in mind, similar to the standard, we'd be best served to be biased toward parsimony, not abundance.


This stated, this isn't to say that "extras" cannot have non-consensus APIs (i.e., APIs which have no precedent in array libraries). Those can be in scope. However, for APIs which do have precedent in array libraries, IMO, we should tread a bit more carefully to ensure what's implemented is aligned behavior.

lucascolley (Collaborator, Author) commented Sep 28, 2024:

cov

This PR implements a small subset of `numpy.cov`, namely `cov(m)`, to replace a private helper in SciPy.

  • NumPy, JAX and PyTorch all agree on the same default behaviour for single-argument cov (unbiased estimate)
  • No particular design choices for this API.
  • As mentioned above by @kgryte, if this function is extended to support more arguments of the existing implementations, there is potential to address some prior limitations of the NumPy API by using a `correction` argument, as PyTorch and the standard's other statistical functions do. The need for that hasn't arisen yet, though, so it isn't attempted here.
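The single-argument subset described above can be sketched directly in terms of standard operations. This is an editor's illustration, not the PR's actual implementation; it assumes NumPy as the backing namespace and 2-D (or 1-D) input with rows as variables and columns as observations:

```python
import numpy as np


def cov_sketch(x: np.ndarray, *, xp=np) -> np.ndarray:
    """Single-argument covariance matching the NumPy/JAX/PyTorch default.

    Rows of ``x`` are variables, columns are observations; the unbiased
    N - 1 normalization is applied.
    """
    if x.ndim == 1:
        # One variable observed N times -> promote to shape (1, N).
        x = x[None, :]
    n_obs = x.shape[1]
    # Center each variable, then form the outer product of deviations.
    centered = x - xp.mean(x, axis=1, keepdims=True)
    return (centered @ xp.conj(centered).T) / (n_obs - 1)
```

Note that for 1-D input this sketch returns a 1x1 matrix where `np.cov` returns a 0-d array; the real `xpx.cov` may differ in such details.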

Good to proceed? EDIT: edited to make mean private.

rgommers (Member):
Why add mean? It seems from data-apis/array-api#846 that it's only an oversight that complex support wasn't added, and that can be done. I would just finish that discussion, and then likely nothing else has to change (except perhaps deleting a check in array-api-compat). Moving usages of mean over to this package and then back a little later sounds like quite a bit of churn for no real reason.

lucascolley (Collaborator, Author):
We could keep complex mean private here until there is a reported use case for it, and remove it once it is supported in array-api-compat - I think that sounds like a better idea than exposing it just because we can.

rgommers (Member):

That sounds good to me.

@lucascolley lucascolley changed the title ENH: a simple cov and complex mean ENH: a simple cov Sep 29, 2024
@lucascolley lucascolley marked this pull request as ready for review September 29, 2024 14:48
asmeurer (Member):

> and then likely nothing else has to change (except perhaps deleting a check in array-api-compat)

I guess you mean array-api-strict. Complex mean already works with the upstream libraries (at least NumPy and PyTorch, I didn't check the others), and will work with array-api-compat.

Also, since array-api-strict now supports multiple versions, we can easily add a preliminary 2024.12 version and disable the complex check in mean for that version, even before that version of the standard is released.
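As a quick spot-check of the claim that complex mean already works upstream (NumPy shown here; other libraries would need their own verification):

```python
import numpy as np

# Complex mean needs no special handling in NumPy: the real and
# imaginary parts are averaged component-wise.
z = np.array([1 + 1j, 3 - 1j])
print(np.mean(z))  # (2+0j)
```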

TST: cov: add another test

DOC: cov, mean: WIP docstrings

API: make `mean` private

DOC: cov: finish docstring
lucascolley (Collaborator, Author) left a comment:

this is ready to go in, I think

@lucascolley lucascolley merged commit 548103c into data-apis:main Oct 26, 2024
7 checks passed
kgryte commented Oct 26, 2024:

> NumPy, JAX and PyTorch all agree on the same default behaviour for single-argument cov (unbiased estimate)

FWIW, TensorFlow disagrees here.

This aside, it is somewhat unfortunate that NumPy's default behavior for cov (`bias=False`) diverges from the default behavior of var and std. The `bias` argument appears to be legacy, having subsequently been supplanted by `ddof`. If we were designing from first principles and considering adding to the standard, I'd argue for being consistent with existing precedent in the standard rather than having diverging conventions, especially as N - 1, while common, is by no means the universally appropriate normalization factor; my inclination would be to force users to be explicit about the normalization factor they believe suitable.
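The divergence in defaults described here is easy to demonstrate with plain NumPy (an editor's illustration, not part of the PR):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# np.cov defaults to the unbiased N-1 normalization (the legacy bias=False)...
print(float(np.cov(x)))   # 1.666... (divides by N - 1 = 3)

# ...while np.var (and np.std) default to the biased N normalization (ddof=0).
print(np.var(x))          # 1.25 (divides by N = 4)

# The modern spelling makes the choice explicit and brings the two in line:
print(np.var(x, ddof=1))  # 1.666...
```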

rgommers (Member):

> If we were designing from first principles and considering adding to the standard, I'd argue for being consistent with existing precedent in the standard

But we aren't here, so I think matching what NumPy, JAX and PyTorch do was fine for this PR. Making changes just makes it harder to use, so let's not cross that bridge until we have to for standardization (which may well be never).

kgryte commented Oct 27, 2024:

> But we aren't here... Making changes just makes it harder to use

I'm not sure that is the case. To my knowledge, array-api-extra is not restricted to being a compat layer; as an independent library, it is free to make design choices which (mildly, in this case) diverge from existing libraries such as NumPy that carry historical context.

The notion of "harder to use" is somewhat in the eye of the beholder here. Yes, following existing NumPy practice (based on a `bias` boolean kwarg which would be highly unlikely to be standardized, given the standardized var/std convention) would mean that downstream libraries such as SciPy could simply swap np.cov for xpx.cov. But it also means persisting the cognitive burden on future code readers of recalling that default conventions differ between the univariate and multivariate cases. Personally, I place greater value on the latter than the former, and more so in this case, as array-api-extra has some freedom to evolve the story a bit.

@lucascolley lucascolley deleted the cov branch October 27, 2024 07:57
rgommers commented Nov 3, 2024:

> but this also means persisting the additional cognitive burden among future code readers in recalling that default conventions differ between the univariate and multivariate cases

I don't think this is an extra burden; I'm reasonably confident that matching cov in other libraries is, on balance, more important and less confusing than matching std/var would be. You'd be right for a reader who knows the math, doesn't know or use any other array library, and comes at this fresh using only the standard and array-api-extra, but that will be rare.

@lucascolley lucascolley mentioned this pull request Nov 5, 2024