FWIW, an alternative version of this that avoids the mypy dynamic class issue brought up in #433 would look something like:

```python
from typing import List, Optional, Tuple

import pandas as pd
import pandera as pa
from pandera.typing import DateTime, Series


class PrimaryKeyMixin(pa.SchemaModel):
    __primary_key__: Tuple[str, ...] = ()

    def __init_subclass__(cls, primary_key: Optional[List[str]] = None, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.__primary_key__ = tuple(primary_key or [])

    @pa.dataframe_check
    @classmethod
    def check_primary_key(cls, df: pd.DataFrame) -> bool:
        return (df.groupby(list(cls.__primary_key__)).size() <= 1).all()


class Transaction(PrimaryKeyMixin, primary_key=['user_id', 'occurred_at']):
    user_id: Series[int]
    occurred_at: Series[DateTime]
    value: Series[float]
```
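For anyone unfamiliar with the mechanism, the version above relies on `__init_subclass__` receiving keyword arguments from the class statement. Here is a minimal sketch of just that part in plain Python, with no pandas or pandera involved. `PrimaryKeyBase`, `duplicate_keys`, and the sample rows are hypothetical names made up for illustration, and the duplicate check over a list of dicts is only a stand-in for the `groupby(...).size()` test, not how pandera evaluates checks.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


class PrimaryKeyBase:
    """Plain-Python stand-in for the mixin (hypothetical, not a pandera class)."""

    __primary_key__: Tuple[str, ...] = ()

    def __init_subclass__(cls, primary_key: Optional[List[str]] = None, **kwargs):
        # Keyword arguments in the class statement (other than `metaclass`)
        # are forwarded here when a subclass is created.
        super().__init_subclass__(**kwargs)
        cls.__primary_key__ = tuple(primary_key or [])

    @classmethod
    def duplicate_keys(cls, rows: List[Dict[str, object]]) -> List[Tuple[object, ...]]:
        # Count each combination of key values and report those seen more than once.
        counts = Counter(tuple(row[col] for col in cls.__primary_key__) for row in rows)
        return [key for key, n in counts.items() if n > 1]


class Transaction(PrimaryKeyBase, primary_key=["user_id", "occurred_at"]):
    pass


rows = [
    {"user_id": 1, "occurred_at": "2021-01-01", "value": 9.5},
    {"user_id": 1, "occurred_at": "2021-01-02", "value": 3.0},
    {"user_id": 1, "occurred_at": "2021-01-01", "value": 7.25},
]

print(Transaction.__primary_key__)        # ('user_id', 'occurred_at')
print(Transaction.duplicate_keys(rows))   # [(1, '2021-01-01')]
```

One thing to be aware of with this pattern: a subclass that omits `primary_key` gets an empty tuple instead of inheriting its parent's key, because `__init_subclass__` runs again with the default `None`.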
Following on from the discussion in #433 and #386: mixins are quite common in the Python ecosystem, e.g., in werkzeug and SQLAlchemy. With the new `SchemaModel` syntax made available in pandera 0.5.x, it may make sense to bring mixins formally into the pandera ecosystem, either as recipes in the documentation or as explicit utility classes in the package itself.

I think the most natural place for them in the pandera workflow is constraint specification, especially wide constraints. For example, in tidy data it is common for every table to have a primary key: a collection of columns whose combined values occur only once in a given table and so effectively serve as a unique identifier for each row. Currently, specifying this constraint in pandera looks something like this:
But this could be pulled out into a mixin, e.g.,