-
-
Notifications
You must be signed in to change notification settings - Fork 326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend SchemaModel so that it's a valid type in a pydantic model #453
Comments
That said, we need to use the same annotation as the pandera.check_types decorator, e.g . Example of the syntax: import pandera as pa
from pandera.typing import Index, DataFrame
class InputSchema(pa.SchemaModel):
year: Series[int]
class OutputSchema(InputSchema):
revenue: Series[float]
@pa.check_types
def transform(df: DataFrame[InputSchema]) -> DataFrame[OutputSchema]:
return df.assign(revenue=100.0) We'd need to add Would you be up to submit a pull request? |
sounds good to me, thanks @tfwillems ! |
@jeffzi @cosmicBboy I started working on a simple prototype for this that would replace DataFrame in pandera.typing but unfortunately ran into an issue. Currently, pydantic supports Generic classes as types, but if type argument(s) are supplied, it seems that they must also implement Here's how I envisioned the prototype could potentially function: import inspect
import pandas as pd
import pandera as pa
import pydantic
from typing import Generic, TypeVar, Tuple
Schema = TypeVar("Schema", bound="SchemaModel") # type: ignore
class DataFrame(pd.DataFrame, Generic[Schema]):
"""
Representation of pandas.DataFrame, only used for type annotation.
*new in 0.5.0*
"""
@classmethod
def __get_validators__(cls):
yield cls.validate
@classmethod
def validate(cls, v, field: pydantic.fields.ModelField):
if not isinstance(v, pd.DataFrame):
raise ValueError(f"Expected an instance of pandas.DataFrame but found an instance of '{type(v)}'")
# Generic parameter was not provided so don't validate schema
if field.sub_fields is None:
return v
# Verify that the generic argument is a subclass of SchemaModel
generic_type = field.sub_fields[0].outer_type_
if not inspect.isclass(generic_type) or not issubclass(generic_type, pa.SchemaModel):
raise TypeError(
f"Invalid generic argument for {cls.__name__}."
f" Expected a subclass of SchemaModel but found '{generic_type}'"
)
try:
return generic_type.validate(df)
except pa.errors.SchemaError as e:
raise ValueError(str(e)) But unfortunately when I test it as below, I receive a slew of pydantic errors that ultimately suggest you need to provide a from pydantic import BaseModel
df_1 = pd.DataFrame({"x": [1,2,3]})
class MySchemaModel(pa.SchemaModel):
x: pa.typing.Series[int] = pa.Field()
class MyModel(BaseModel):
x: DataFrame[MySchemaModel]
Traceback (most recent call last):
File "/cluster/home/willems/MethodsDev/pandera-extensions/pandera/pandera/testing123.py", line 74, in <module>
class MyModel(BaseModel):
File "pydantic/main.py", line 293, in pydantic.main.ModelMetaclass.__new__
File "pydantic/fields.py", line 410, in pydantic.fields.ModelField.infer
File "pydantic/fields.py", line 342, in pydantic.fields.ModelField.__init__
File "pydantic/fields.py", line 450, in pydantic.fields.ModelField.prepare
File "pydantic/fields.py", line 622, in pydantic.fields.ModelField._type_analysis
File "pydantic/fields.py", line 648, in pydantic.fields.ModelField._create_sub_type
File "pydantic/fields.py", line 342, in pydantic.fields.ModelField.__init__
File "pydantic/fields.py", line 456, in pydantic.fields.ModelField.prepare
File "pydantic/fields.py", line 670, in pydantic.fields.ModelField.populate_validators
File "pydantic/validators.py", line 715, in find_validators
RuntimeError: no validator found for <class '__main__.MySchemaModel'>, see `arbitrary_types_allowed` in Config @jeffzi To verify this wasn't solely due to an obvious implementation issue on my part, I reproduced this using the generic class example in the pydantic docs involving AgedType and QualityType. from pydantic import BaseModel, ValidationError
from pydantic.fields import ModelField
from typing import TypeVar, Generic
AgedType = TypeVar('AgedType')
QualityType = TypeVar('QualityType')
# This is not a pydantic model, it's an arbitrary generic class
class TastingModel(Generic[AgedType, QualityType]):
def __init__(self, name: str, aged: AgedType, quality: QualityType):
self.name = name
self.aged = aged
self.quality = quality
@classmethod
def __get_validators__(cls):
yield cls.validate
@classmethod
# You don't need to add the "ModelField", but it will help your
# editor give you completion and catch errors
def validate(cls, v, field: ModelField):
if not isinstance(v, cls):
# The value is not even a TastingModel
raise TypeError('Invalid value')
if not field.sub_fields:
# Generic parameters were not provided so we don't try to validate
# them and just return the value as is
return v
aged_f = field.sub_fields[0]
quality_f = field.sub_fields[1]
errors = []
# Here we don't need the validated value, but we want the errors
valid_value, error = aged_f.validate(v.aged, {}, loc='aged')
if error:
errors.append(error)
# Here we don't need the validated value, but we want the errors
valid_value, error = quality_f.validate(v.quality, {}, loc='quality')
if error:
errors.append(error)
if errors:
raise ValidationError(errors, cls)
# Validation passed without errors, return the same instance received
return v
# My code starts here
class OkModel(BaseModel):
x: TastingModel[int, float]
class CustomAgedType:
pass
class FailModel(BaseModel):
x: TastingModel[CustomAgedType, float]
Traceback (most recent call last):
File "/cluster/home/willems/MethodsDev/pandera-extensions/pandera/pandera/testing123.py", line 137, in <module>
class FailModel(BaseModel):
File "pydantic/main.py", line 293, in pydantic.main.ModelMetaclass.__new__
File "pydantic/fields.py", line 410, in pydantic.fields.ModelField.infer
File "pydantic/fields.py", line 342, in pydantic.fields.ModelField.__init__
File "pydantic/fields.py", line 450, in pydantic.fields.ModelField.prepare
File "pydantic/fields.py", line 622, in pydantic.fields.ModelField._type_analysis
File "pydantic/fields.py", line 648, in pydantic.fields.ModelField._create_sub_type
File "pydantic/fields.py", line 342, in pydantic.fields.ModelField.__init__
File "pydantic/fields.py", line 456, in pydantic.fields.ModelField.prepare
File "pydantic/fields.py", line 670, in pydantic.fields.ModelField.populate_validators
File "pydantic/validators.py", line 715, in find_validators
RuntimeError: no validator found for <class '__main__.CustomAgedType'>, see `arbitrary_types_allowed` in Config This leaves use with a few options:
Sorry for the long-winded post but would love to hear your thoughts! |
Pydantic wants to "recursively" validate attributes, which makes sense in most cases. I agree 3. is the way to go. In our case, we could attempt a Just a detail, but I think Feel free to submit a PR, that would give you proper credit for the work you've done so far 🚀. You can assign the review to me. The contributing section can help you run the test suite/linters locally. |
Hey @tfwillems just wanted to ping you about this feature, which I think would be very useful! Let me know if you're still interested in making a PR and if there's anything we can do to help out! |
hey @tfwillems friendly ping on this issue! Let me know if you're still interested in making a PR for this, or let @jeffzi or I know if you want to pass the baton, you can point us to a dev branch/fork with your most up-to-date prototype and perhaps one of us can take it to the finish line! 🏁 |
@cosmicBboy Sorry I've made so little progress here. I don't think I have the bandwidth at the moment to make a full-fledged PR, but here is all I think is really necessary: import pandera as pa
from pandera.errors import SchemaInitError
class ExtendedSchemaModel(pa.SchemaModel):
"""Simple extension to pandera's SchemaModel that enables its use as a pydantic type"""
@classmethod
def __get_validators__(cls):
yield cls._pydantic_validate
@classmethod
def _pydantic_validate(cls, df: Any) -> pd.DataFrame:
"""Verify that the input is a pandas data frame that meets all schema requirements"""
if not isinstance(df, pd.DataFrame):
raise ValueError("Expected a pandas data frame")
try:
cls.to_schema()
except SchemaInitError as e:
raise ValueError(
f"Cannot use {cls.__name__} as a pydantic type as its SchemaModel cannot be converted to a DataFrameSchema.\n"
f"Please revisit the model to address the following errors:\n{str(e)}"
)
try:
return cls.validate(df)
except pa.errors.SchemaError as e:
raise ValueError(str(e)) Obviously the additional class is unnecessary and these functions would just be added to pa.SchemaModel's definition |
Hi @tfwillems, no worries 👍 ! Thanks for sharing your work in progress, I'll take it from there. |
This feature has been released with 0.8.0 |
Is your feature request related to a problem? Please describe.
I'd like to include pandera SchemaModels as a type within a pydantic model. pandera's SchemaModel best encapsulates a valid "type" for pandas data frames (as far as I'm aware), but it's currently missing a single classmethod that would make this possible. Ideally, we'd be able to do the following:
Describe the solution you'd like
Extend pandera's SchemaModel so that it's a valid type within pydantic. Nearly all of the heavy lifting has already been done, as the validate function essentially performs all validation. pydantic naturally supports custom types, as long as they implement the get_validators classmethod see here. We could simply extend SchemaModel by adding this classmethod or some variant thereof:
Describe alternatives you've considered
This may be outside the scope of the project, but I envision others may be interested. pydantic supports a wide range of types, but is currently missing one that naturally extends to data frames. I think adding this feature would mean that pandera naturally fills this void. Of course one option would be for me to implement my own subclass of SchemaModel that provides this classmethod, but if others might find it useful it'd be great to add it to original parent class.
The text was updated successfully, but these errors were encountered: