pyright: operator not supported for NDArray #41

swarn opened this issue Jan 2, 2025 · 6 comments

swarn commented Jan 2, 2025

Given this code:

# pyright: reportInvalidTypeArguments=false

from numpydantic import NDArray, Shape


def f(
    x: NDArray[Shape["2"], float],  # noqa: F722
    y: NDArray[Shape["2"], float],  # noqa: F722
):
    return x + y

Pyright warns on the return x + y statement:

Diagnostics:
1. Operator "+" not supported for types "NDArray" and "NDArray"
     Operator "+" not supported for types "H5ArrayPath" and "H5Proxy"
     Operator "+" not supported for types "Tuple[Path | str, str]" and "H5Proxy"
     Operator "+" not supported for types "H5Proxy" and "H5ArrayPath"
     Operator "+" not supported for types "H5Proxy" and "Tuple[Path | str, str]"
     Operator "+" not supported for types "H5Proxy" and "H5Proxy" [reportOperatorIssue]

Have I missed some obvious documentation?

numpydantic 1.6.6
pyright 1.1.391
pydantic 2.9.2
python 3.12.6


swarn commented Jan 3, 2025

Ah, I see here that:

This class is not intended to be instantiated or used for type checking

I will post back here with a simple example (once I've figured it out) in case someone else with the same misunderstanding comes along.


swarn commented Jan 3, 2025

And further reading leads me to understand that this is a planned feature.

swarn closed this as completed Jan 3, 2025

swarn commented Jan 3, 2025

Sorry for the noise — spinning my gears a bit here.

It seemed like this should be easy to fix with Annotated:

from typing import Annotated

import numpy.typing as npt
import numpydantic as npd
from pydantic import BaseModel

class TwoArrays(BaseModel):
    x: Annotated[npt.NDArray, ...]
    y: Annotated[npt.NDArray, ...]

    def f(self):
        return self.x.T + self.y.T

This satisfies type checkers, which treat x and y as numpy arrays. But I can't for the life of me figure out how to add npd.NDArray to the Annotated arguments to let pydantic take advantage of the serialization and validation implemented in numpydantic.

swarn reopened this Jan 3, 2025
sneakers-the-rat (Collaborator) commented

Apologies for the delay! I was on holiday and just catching up on issues.

Yes! So there is still a bit of awkwardness before the planned 2.0 switch to typing.Protocol, which eliminates the old nptyping-style syntax in favor of types that take fuller advantage of the Python typing system. Using nptyping was mostly a bridge, since it already existed, but as this project split off into its own package it became clear that it needed to go for a number of reasons.

The basic problem is that we want to allow a bunch of different types of input in order to support a bunch of different types of arrays, but pydantic doesn't have an elegant way of handling differences between the input type (what is passed on creation) and the field type (the main thing that pydantic annotations specify: the thing that is actually contained by the model instance).

So, for example, we want to support using arrays on disk, or in formats like videos or images, like this:

class MyVideo(BaseModel):
    video: NDArray[Shape["1920, 1080, 3"], np.uint8]
    
instance = MyVideo(video=Path("./my_video_file.mp4"))

Pydantic will complain like "hey that's a string! that's not an array at all!" even if we convert it to an arraylike proxy class, so numpydantic tries to hack around that by autogenerating the .pyi stub file using the input_type of each of the interfaces. So that's where the original error is coming from - those are all the possible types that the NDArray object could be.
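For reference, that generated stub amounts to declaring NDArray as a union of every interface's input type. Reconstructed from the diagnostics above (the class names come from the error message, the import path is a guess, and this is not the literal generated file), it looks roughly like:

from pathlib import Path
from typing import Tuple, Union

import numpy as np

from numpydantic.interface.hdf5 import H5ArrayPath, H5Proxy  # assumed path

NDArray = Union[
    np.ndarray,
    H5ArrayPath,                   # an array addressed inside an HDF5 file
    Tuple[Union[Path, str], str],  # (file, dataset path) shorthand for the same
    H5Proxy,
    # ... plus the input types of every other registered interface
]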

Re: support for +, that would be super simple to add. There is already precedent here with __len__, where this test enforces that all interfaces must implement it:

def test_dunder_len(interface_cases, tmp_output_dir_func):

So we would do something similar with __add__, because it should be the case that all the interfaces do in fact support it; it's just not implemented explicitly on some of the proxy classes. They should be able to do it because they implement a __getattr__ that forwards method lookups to the array types they proxy for (e.g. for hdf5).
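As a minimal sketch of that pattern (a hypothetical class, not the actual H5Proxy): __getattr__ covers ordinary method lookups, but Python resolves operators through special-method lookup on the type, which is why + needs an explicit __add__:

import numpy as np


class ArrayProxy:
    def __init__(self, array):
        self._array = array

    def __getattr__(self, name):
        # Only called when normal lookup fails: forward to the wrapped
        # array, so e.g. proxy.mean() or proxy.T just work.
        return getattr(self._array, name)

    def __add__(self, other):
        # Implicit special-method lookup bypasses __getattr__, so the
        # + operator needs an explicit forwarding definition.
        return self._array + other


proxy = ArrayProxy(np.ones(3))
print(proxy + np.ones(3))  # [2. 2. 2.]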

I'm planning on getting the 2.0 work done sometime this spring; there are a few blockers upstream in pydantic, but I think I understand how it works well enough to monkeypatch them here (although pydantic is a bit of a moving target).


Another thing that might be useful here is to provide specific annotations for specific interfaces. The initial purpose of this package was an abstract array specification that could be provided by arbitrary array backends, and that's the main implementation challenge. But if you don't care about that and just want to use numpy arrays, it wouldn't be too hard to provide a specific NPArray or NumpyArray type annotation tied to the numpy interface.

The NDArray class is really just a very thin wrapper to forward validation onto a matching interface, provide a serializer, and generate json schema:

get_validate_interface(shape, dtype),
serialization=core_schema.plain_serializer_function_ser_schema(
    jsonize_array, when_used="json", info_arg=True
),
metadata=json_schema,

So you could do something like this:

# ...
from numpydantic.interface import NumpyInterface
# ...

class NumpyArray(NPTypingType, metaclass=NDArrayMeta):
    @classmethod
    def __get_pydantic_core_schema__(
        cls,
        _source_type: npt.NDArray,
        _handler: "CallbackGetCoreSchemaHandler",
    ) -> core_schema.CoreSchema:

        # ...
        interface = NumpyInterface(shape, dtype)
        # ...

        return core_schema.with_info_plain_validator_function(
            interface.validate,
            serialization=core_schema.plain_serializer_function_ser_schema(
                jsonize_array, when_used="json", info_arg=True
            ),
            metadata=json_schema,
        )
        

with a .pyi stub that indicates that NumpyArray is an npt.NDArray. Pretty inelegant, but those are just the constraints of not yet being able to use Annotated with all the features we want (though see below). The ugliest part is the stuff we inherit from nptyping, like that metaclass business and all the rest - like I said, we're eager to be rid of it ;P
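The stub itself would only be a couple of lines, e.g. in a hypothetical numpy_array.pyi:

# numpy_array.pyi (hypothetical): type checkers treat NumpyArray as a plain
# numpy array, while the real module provides the validating class above.
import numpy.typing as npt

NumpyArray = npt.NDArray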


The way you could do this on your own with Annotated would be something like:

from typing import Annotated

import numpy.typing as npt
from numpydantic import NDArray
from pydantic import BaseModel, GetPydanticSchema


class MyModel(BaseModel):
    array: Annotated[npt.NDArray, GetPydanticSchema(NDArray.__get_pydantic_core_schema__)]

haven't tested it yet but that should work, and it's the direction we're going for 2.0 anyway. That won't carry the JSON schema with it, because of some quirks in how pydantic handles JSON schema generation that we had to hack around (it's split off into a separate method, and when schemas are passed as annotation callables like this they behave differently than when defined with __get_pydantic_core_schema__ and __get_pydantic_json_schema__ on a class).
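For contrast, the class-based protocol where both hooks do get picked up looks roughly like this (a sketch against pydantic's documented handler types, not numpydantic's actual code):

from pydantic import GetCoreSchemaHandler, GetJsonSchemaHandler
from pydantic.json_schema import JsonSchemaValue
from pydantic_core import core_schema


class MyArrayType:
    @classmethod
    def __get_pydantic_core_schema__(
        cls, source_type, handler: GetCoreSchemaHandler
    ) -> core_schema.CoreSchema:
        # build validation + serialization, as in the snippets above
        ...

    @classmethod
    def __get_pydantic_json_schema__(
        cls, schema: core_schema.CoreSchema, handler: GetJsonSchemaHandler
    ) -> JsonSchemaValue:
        # JSON schema generation is split into its own hook here; this is
        # the part the GetPydanticSchema annotation shortcut doesn't carry.
        ...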

PRs welcome on all this if you get it to work <3 lmk if you have further troubles


swarn commented Jan 8, 2025

Wow! Thanks for this fantastic response!

That last section is exactly what I was trying to figure out. I think it's not quite there:

from typing import Annotated

import numpy as np
import numpy.typing as npt
from numpydantic import NDArray
from pydantic import BaseModel, GetPydanticSchema

type Array = Annotated[npt.NDArray, GetPydanticSchema(NDArray.__get_pydantic_core_schema__)]

class TwoArrays(BaseModel):
    x: Array
    y: Array

    def f(self) -> Array:
        return self.x.T + self.y.T

Throws an exception at class definition:

[... many lines cut ...]
  File "/Users/seth/.local/share/conda/envs/py312/lib/python3.12/site-packages/numpydantic/schema.py", line 83, in _lol_dtype
    elif issubclass(dtype, BaseModel):
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen abc>", line 123, in __subclasscheck__
TypeError: issubclass() arg 1 must be a class

In the meantime I hacked together a minimal serialization module that is fragile, has few features, and is hacky, but it does 1) handle dataclasses with ndarray fields and 2) use tagged unions for abstract types. You can tell how fragile it is from one function:

from types import GenericAlias
from typing import Any, TypeAliasType, get_origin


def get_actual_type(obj: Any) -> Any:
    # Strip off all type aliases.
    while isinstance(obj, TypeAliasType):
        obj = obj.__value__

    # Go from a generic type to the underlying type.
    if isinstance(obj, GenericAlias):
        return get_origin(obj)

    return obj

This will fail for Annotated types, and to get the class for those you can't use get_origin; it's... something else I forget right now. Your explanation above and my own little effort have revealed a lot to me about Python's annotation system, and I have enormous respect for what you've done here :)
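For anyone who hits the same wall, the standard helpers behave like this for Annotated (per the typing docs):

from typing import Annotated, get_args, get_origin

alias = Annotated[int, "metadata"]
assert get_origin(alias) is Annotated  # not the underlying class
assert get_args(alias)[0] is int       # the class is the first element of get_args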

I imagine I'll throw my code away and switch once you've got 2.0 running. My feeling is that the whole Python typing ecosystem is moving very fast right now; I'm hesitant to spend much more effort on it while things are still in flux. E.g., pyright is OK with NDArray, but basedpyright complains about missing type arguments for the generic type.
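That particular complaint goes away once the generic is parameterized, e.g.:

import numpy as np
import numpy.typing as npt

Array = npt.NDArray[np.float64]  # explicit type argument for the generic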


swarn commented Jan 8, 2025

... pyright is OK with, but ...

I was wrong; they'll both flag it, it's just that they have different defaults.
