Pydantic for data validation? #502
Comments
Is it common for model parameters to be incorrectly specified? If not I think
Actually I think you can add custom checks across the fields (e.g. the data frame). In the example I shared above, I have something like:

```python
import pandas as pd
from pydantic import BaseModel, Field, field_validator


class Region(BaseModel):
    id: int = Field(..., ge=0)
    # `Store` is defined elsewhere in the shared example (not shown here).
    stores: list[Store] = Field(..., min_length=1)  # `min_items` in pydantic v1
    median_income: float = Field(..., gt=0)

    @field_validator("stores")
    @classmethod
    def validate_store_ids(cls, value):
        # Reject regions in which two stores share the same id.
        if len({store.id for store in value}) != len(value):
            raise ValueError("stores must have unique ids")
        return value

    def to_dataframe(self) -> pd.DataFrame:
        # Stack the per-store frames and attach the region-level columns.
        df = pd.concat([store.to_dataframe() for store in self.stores], axis=0)
        df["region_id"] = self.id
        df["median_income"] = self.median_income
        return df.reset_index(drop=True)
```

Which is a custom check :)
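To see how such a custom check behaves, here is a minimal, self-contained sketch. It assumes pydantic v2, and the `Store` model and the `try_build_region` helper are hypothetical stand-ins added for illustration, not part of the shared example:

```python
from pydantic import BaseModel, Field, ValidationError, field_validator


class Store(BaseModel):
    # Hypothetical minimal stand-in for the Store model referenced above.
    id: int = Field(..., ge=0)


class Region(BaseModel):
    id: int = Field(..., ge=0)
    stores: list[Store] = Field(..., min_length=1)
    median_income: float = Field(..., gt=0)

    @field_validator("stores")
    @classmethod
    def validate_store_ids(cls, value: list[Store]) -> list[Store]:
        # Custom check: store ids must be unique within a region.
        if len({store.id for store in value}) != len(value):
            raise ValueError("stores must have unique ids")
        return value


def try_build_region(store_ids: list[int]) -> str:
    """Return "ok" if the region validates, else the first error message."""
    try:
        Region(id=0, stores=[{"id": i} for i in store_ids], median_income=50_000.0)
    except ValidationError as exc:
        return exc.errors()[0]["msg"]
    return "ok"


unique_result = try_build_region([1, 2])      # passes validation
duplicate_result = try_build_region([1, 1])   # rejected by the custom check
```

The error raised inside the validator is collected into a `ValidationError`, so the caller gets the custom message alongside any failures from the declarative constraints.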
I did look at it, and abandoned editing my previous post when you replied haha. I've used
On a related note, I created an issue to add a data validation utility method to the CLV module for users who provide their own RFM data, but I have other priorities at the moment.
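As a rough idea of what such a utility could look like, here is a plain-pandas sketch. The function name `validate_rfm_data`, the column names (`customer_id`, `frequency`, `recency`, `T`), and the specific checks are assumptions for illustration only, not the CLV module's actual API:

```python
import pandas as pd

# Assumed column names, following a common RFM convention.
REQUIRED_COLUMNS = ("customer_id", "frequency", "recency", "T")


def validate_rfm_data(df: pd.DataFrame) -> list[str]:
    """Collect human-readable problems instead of failing on the first one."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        # Without the required columns, the remaining checks cannot run.
        return [f"missing columns: {missing}"]

    problems = []
    if df["customer_id"].duplicated().any():
        problems.append("customer_id must be unique")
    for col in ("frequency", "recency", "T"):
        if (df[col] < 0).any():
            problems.append(f"{col} must be non-negative")
    # A customer's last purchase cannot be later than their observation window.
    if (df["recency"] > df["T"]).any():
        problems.append("recency must not exceed T")
    return problems


good = pd.DataFrame(
    {
        "customer_id": [1, 2],
        "frequency": [3, 0],
        "recency": [10.0, 0.0],
        "T": [20.0, 15.0],
    }
)
bad = good.assign(recency=[30.0, 0.0])  # first customer violates recency <= T
```

Returning a list of problems rather than raising immediately lets users see every issue with their data in one pass.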
I also agree with this in general. I think I think the problem we want to solve is to have a unified way for data and parameter validation. There is nothing wrong with how we are using it now, it is more about a nicer API. Still, I do not have a very strong opinion. I will investigate more and see if Thanks for the feedback :)
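For the parameter side of that unified validation, one possible shape is pydantic's `validate_call` decorator (available in pydantic v2). This is only a sketch: `build_sampler_config`, its arguments, and the `is_valid` helper are hypothetical names made up for illustration, not anything in this project:

```python
from typing import Annotated

from pydantic import Field, ValidationError, validate_call


@validate_call
def build_sampler_config(
    draws: Annotated[int, Field(ge=1)],
    target_accept: Annotated[float, Field(gt=0, lt=1)] = 0.8,
) -> dict:
    """Hypothetical function whose arguments are validated on every call."""
    return {"draws": draws, "target_accept": target_accept}


def is_valid(**kwargs) -> bool:
    """Report whether the given arguments pass validation."""
    try:
        build_sampler_config(**kwargs)
    except ValidationError:
        return False
    return True


config = build_sampler_config(draws=1000)
```

The constraints live in the signature, so there is no hand-written `if`/`raise` block at the top of the function.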
Pandera is great, it is as actively developed as pydantic
Where does this issue stand? @juanitorduz
we can close this one IMO
At the end of #498, we touched on a point that has been on my mind for a while now.
Shall we use `pydantic` for data validation?
I have worked with Pydantic on many projects, and I love it! It is super fast and actively maintained! See for example the data generation process in https://juanitorduz.github.io/multilevel_elasticities_single_sku/
This would provide a modern and elegant way to validate data (input data and parameters). If we agree on doing it, I would be happy to kick off this initiative 😄.