-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: finfo_object
, iinfo_object
, _array
to typing.Protocol
#857
base: main
Are you sure you want to change the base?
Conversation
@34j thanks a lot for working on improving static typing. Did you see gh-589 though? This PR seems to reinvent parts of that PR. It's nontrivial to review a large PR and get static typing right, but I prefer to push gh-589 forward if there is energy for that. Would you be interested in checking if that addresses your issue and needs? |
7175de0
to
ac20ed5
Compare
788ba04
to
f3c9eb4
Compare
@jorenham you may have opinions on the activity here |
I certainly do. But before I share them, would you mind telling me what the exact purpose is of these "stubs" (which usually refer to |
Not sure about the exact purpose, but they aren't (just) type stubs - they're primarily for housing the (sort of typed) signatures and docstrings from which the docs pages for individual functions are generated. I suppose they are "stubs" in the sense of having no function implementations. |
Then I think that the first step should be to figure that out, and document it.
|
Disclaimer: I'm sure the array API maintainers 'know what they are building' - I've contributed to the repo, but not majorly. |
Perhaps the maintainers are mostly interested in discussing the contents of array-api and this may not be so important. However, I have noticed a number of issues while working on this implementation and the #858 implementation, and would like to offer some observations.
Thanks for taking a look anyway |
Big 👍 to @jorenham suggestion of answering the question: "What problem does this solve (for users of the array API standard)?" For me "users of the standard" are both people creating array providing libraries like Numpy or array-api-strict, as well as people using such a library to build something else (e.g. scikit-learn). |
Just to be clear, the primary function of these "stubs" is so that they can be automatically included in the standard via Sphinx autodoc. We used to just have everything in RST files, but using autodoc is nicer because it lets us write the signatures in pure Python. It also allows people to just copy-paste a signature when implementing a function (e.g., I do this all the time when adding something to array-api-strict or array-api-compat). The test suite also uses this package to automatically generate some tests, although that's something that can be refactored concurrently. Reusing them for typing is probably fine, as long as that particular usage is maintained. This module isn't really ever "imported" or anything. This package could be refactored into something installable from PyPI (this was discussed previously at #472). It's not clear to me how that would or should work since it's not a runnable package. And there's also complexity there since there are multiple versions of the standard. |
@@ -147,3 +145,1229 @@ def dtypes( | |||
"max rank": Optional[int], | |||
}, | |||
) | |||
|
|||
|
|||
class Array(Protocol[array, "dtype", "device", PyCapsule]): # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a good reason to move this from array_object.py.
Also, since this is primarily used for documentation, the other thing we need to make sure is that the type signatures always remain readable. That means that we should always spell out types exactly (e.g., it would not be preferred to split out common union types into variables since those would be opaque in the documentation), and ideally the types used in the signatures should always match the types spelled out in the docstring. |
@@ -90,7 +88,7 @@ def __len__(self, /) -> int: | |||
... | |||
|
|||
|
|||
class Info(Protocol): | |||
class Info(Protocol[device]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain what the [device]
here means after Protocol? The Info namespace object itself does not depend on the device (__array_namespace_info__
does not take a device
parameter).
Then it's probably a good idea to conform to the official typing specification, and validate this using static type checkers like basedpyright (a stricter pyright fork) and basedmypy (a mypy fork that is less broken)
In an ideal scenario, it could be used like this: # roughly based on the `scipy.fft._helper` stubs in `scipy-stubs`:
# https://github.com/jorenham/scipy-stubs/blob/master/scipy-stubs/fft/_helper.pyi
from typing import Unpack, overload
from array_api.typing.v2023 as xpt
### utility type aliases
# generic dtype protocol (requires a non-trivial spec), with an
# (invariant) type parameter for its "kind"
type NumericDType = xpt.DType[int] | xpt.DType[float] | xpt.DType[complex]
# shape-types are integer tuples, which is what numpy currently
# uses to annotate the shape-type of `ndarray`
type Size = int # positive integer
type AtLeast0D = tuple[Size, ...]
type AtLeast1D = tuple[Size, *AtLeast0D] # rejects `()`
# runtime-checkable generic array protocol, so that instances
# of e.g. `numpy.ndarray` are assignable to it
type NumericTensor[DeviceT: xpt.Device = xpt.Device] = xpt.Array[
AtLeast1D, # shape-type argument
NumericDType, # dtype-type argument
DeviceT, # device-type _parameter_ (optional)
]
### public signatures
@overload
def fftshift[T: NumericTensor](x: T, axes: Axes | None = None) -> T: ...
# <four other overloads for numpy array-likes>
# currently impossible; requires higher-kinded-typing
# https://github.com/python/typing/issues/548
@overload
def fftfreq[
SizeT: Size,
ArrayT: xpt.Array,
DTypeT: xpt.DType,
DeviceT: xpt.DeviceT,
](
n: SizeT,
d: float,
*,
# obtain the array- dtype-, and device-types by matching
# against the generic array-namespace Protocol, and binding
# to its (hypothetically generic) type-parameters
xp: xpt.Namespace[ArrayT, DTypeT, DeviceT],
device: DeviceT,
) -> ArrayT[ # the array-type from `xp`, that is
tuple[SizeT], # 1-dimensional and of size `n`,
DTypeT[float], # with a real floating-point dtype, and
DeviceT[None], # allocated on the default device
]: ...
# <two other overloads for the default `xp=None` case> caution: completely off-topic rant ahead So you could probably already tell; but I've actually been thinking quite a lot about this 😅. I've been planning on building something like this in But unfortunately, my first attempt at building this has kinda failed, mostly because I didn't realize that the current spec doesn't include a way to distinguish the different kinds of dtypes from one another using static typing. So there's no way to type something like the Either way, I'll probably give it another go in a couple of months or so. And if this time it'll actually succeed, I'll make sure to open an issue or PR so that we can figure out the next steps. TLDR-ish So I guess I'm trying to say that typing the array-api is very difficult, and that I think it would help if the spec itself would be more aware of the static typing challenges. So in that sense, these "stubs" might actually be a very good way to go achieve this. For example by trying to use these "stubs" to annotate a simple toy example (like the one in my example), and in such a way that static type-checkers understand it. I realize that this might come over as if I'm trying to make this into some school exercise or something. But the reason I'm suggesting this, is because there have been way too many times where I thought that I understood some python-typing concept after reading about it, followed by a realization that I completely misunderstood it (often after introducing several bugs along the way) 😅. |
We should consider whether it would more sense to put the installable typing stuff in array-api-strict (or even in a completely new separate package). That way it remains separate from the "stubs" in the standard, which, as I noted, exist primarily for documentation. It would imply some manual copying from the standard into whatever package, but that is already what we have been doing to create array-api-strict and array-api-compat (and even the test suite), and it has worked reasonably well. The advantage of keeping them separate is we wouldn't have to worry about potential conflicts between typing correctness and readability of the stubs as standard documentation, or potential issues that might arise with Sphinx. It also keeps the stubs here very simple (just functions with basic docstrings), and keeps the code complexity implied by typing stuff elsewhere. It would also mean that the (English) text of the standard remains the single source of truth about what is specified. If the Python typing stubs end up disagreeing with that somehow, either because of a bug or because Python typing doesn't support some feature, the standard would be the thing that is correct. I think this is important because if there are two sources of truth (the text and typing "implementation" of the standard), there should be one final one in the case of any ambiguity, and I strongly feel that the final source of truth should be English text, not some reference code. Having it in a separate package would also allow people who are the experts wrt typing to be able to maintain that package more effectively, without necessarily having to have their changes go through the review process of this repo, which tends to be a little more stringent/slower. |
I totally agree with you on that @asmeurer. And having independent release cycles seems also like a good idea in this case 😛. |
Oh and while we're still off-topic: |
👍 to this. We should not craft the spec in a certain way just because we can't write it down with Python's static type system. |
Not necessarily, no. But it could help to actively take it into consideration when making design decisions. |
Closes #856