Feature Request: Visualize Data Transformations Between Pydantic Models #129

Open
lucas-nelson-uiuc opened this issue Oct 13, 2024 · 1 comment


lucas-nelson-uiuc commented Oct 13, 2024

Hey! Just found out about erdantic through Python Bytes. Looks great and love messing with it so far.

I work with a lot of Pydantic models to facilitate PySpark transformations - given a model, you can read, transform, and validate a raw file or loaded DataFrame against a model. Since it's built on Pydantic, it allows some nice features (nesting models, ease of documentation, etc.) and encourages declarative/composable pipelines.

However, instead of composing fields as collections of other models, I convert data from one model to the next. Most examples look like this:

import datetime
import decimal

from pydantic import BaseModel, Field
from pyspark.sql import functions as F


# describe raw data as model to facilitate read and preprocessing steps
class RawFinancialStatement(BaseModel):
    acct: str = Field(pattern=r"\d{5}")
    descr: str
    posted: datetime.date = Field(
        ge=datetime.date(2024, 1, 1), le=datetime.date(2024, 12, 31)
    )
    amount: decimal.Decimal

# for all files, read-in using model's schema, union together, then transform and validate against model
raw_data = RawFinancialStatement.read(
    source=["path/to/file.csv", "path/to/another_file.csv"]
)


# convert intermediate model to expected model for analytical workflows
class CommonFinancialStatement(BaseModel):  # or inherits from a defined business model
    account_number: str = Field(alias="acct")
    account_description: str = Field(alias="descr")
    date_effective: datetime.date = Field(alias="posted")
    date_posted: datetime.date = Field(alias="posted")
    net_amount: decimal.Decimal = Field(alias="amount")
    user_posted: str = Field(
        default=F.when(F.col("acct").startswith("A"), "USER1").otherwise("USER2")
    )

# transform and validate data against model
processed_data = CommonFinancialStatement.transform(data=raw_data).validate()

Using erdantic, would it be possible to construct an ER diagram between multiple models that simply describe how data is transformed? Please let me know if I need to explain my use case some more - thank you!

@lucas-nelson-uiuc lucas-nelson-uiuc changed the title Using erdantic to Display Custom Data Transformations Feature Request: Using erdantic to Display Custom Data Transformations Oct 13, 2024
@lucas-nelson-uiuc lucas-nelson-uiuc changed the title Feature Request: Using erdantic to Display Custom Data Transformations Feature Request: Visualize Data Transformations Between Pydantic Models Oct 13, 2024
Member

jayqi commented Oct 14, 2024

Hi @lucas-nelson-uiuc,

Thanks for trying out erdantic!

I'd like to better understand your use case. Some questions here:

  • Where does the CommonFinancialStatement.transform come from? Is this a custom factory method on the CommonFinancialStatement that you've written?
  • Where is the metadata that explicitly links the RawFinancialStatement and CommonFinancialStatement models? As a practical consideration, we need this metadata to know how the two models relate before we can build the diagram.
  • Can you sketch what you think this diagram would look like?
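
For illustration only, here is one way such linking metadata could be read off the example above. This is a minimal sketch, assuming Pydantic v2 and assuming that each field alias on the target model names the source field it is derived from. The helpers `alias_edges` and `to_dot` are hypothetical names, not part of erdantic or Pydantic, and the models are trimmed copies of the ones in the issue (the PySpark-based default and the custom `read`/`transform` methods are left out).

```python
import datetime
import decimal

from pydantic import BaseModel, Field


# Trimmed copies of the models from the issue, kept only for the sketch.
class RawFinancialStatement(BaseModel):
    acct: str = Field(pattern=r"\d{5}")
    descr: str
    posted: datetime.date
    amount: decimal.Decimal


class CommonFinancialStatement(BaseModel):
    account_number: str = Field(alias="acct")
    account_description: str = Field(alias="descr")
    date_posted: datetime.date = Field(alias="posted")
    net_amount: decimal.Decimal = Field(alias="amount")


def alias_edges(source, target):
    """Hypothetical helper: pair each aliased target field with its source field.

    Assumes an alias on the target model is the name of a source field.
    """
    edges = []
    for name, info in target.model_fields.items():
        if info.alias is not None and info.alias in source.model_fields:
            edges.append((info.alias, name))
    return edges


def to_dot(source, target):
    """Hypothetical helper: render the field-level edges as Graphviz DOT text."""
    lines = ["digraph transform {", "  rankdir=LR;"]
    for src, dst in alias_edges(source, target):
        lines.append(f'  "{source.__name__}.{src}" -> "{target.__name__}.{dst}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(RawFinancialStatement, CommonFinancialStatement))
```

Aliases only capture straight renames. A derived field such as `user_posted` would need some other explicit metadata to show up as an edge, which is the gap the second question above points at.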
