Rowexpander requires key to be set in return dict #70

Open
fromm1990 opened this issue Oct 1, 2024 · 0 comments

When the key is part of the lookupatts, the rowexpander is expected to include the key column in the dict it returns.
If the key is missing from that dict, duplicate rows are created. Having the key among the lookupatts is not uncommon in a smart-key use case.

The following code results in duplicate rows with incrementing date_ids instead of a single row with date_id 20241001:

CREATE TABLE date_dim(
    date_id    INT         PRIMARY KEY,
    year_no    SMALLINT    NOT NULL,
    month_no   SMALLINT    NOT NULL,
    day_no     SMALLINT    NOT NULL,
    week_no    SMALLINT    NOT NULL,
    weekday_no SMALLINT    NOT NULL,
    quarter_no SMALLINT    NOT NULL,
    iso_date   VARCHAR(10) NOT NULL,
    month_name VARCHAR(9)  NOT NULL
);

from pygrametl import ConnectionWrapper
from typing import Any
from datetime import datetime
from pygrametl.tables import CachedDimension

cw = ConnectionWrapper(...)
key = "date_id"

def expand(row: dict[str, Any], name_map: dict[str, str]):
    col = name_map.get(key, key)
    date_key = row[col]
    dt = datetime.strptime(str(date_key), "%Y%m%d")
    calendar = dt.isocalendar()
    return {
        "year_no": dt.year,
        "month_no": dt.month,
        "day_no": dt.day,
        "quarter_no": (dt.month - 1) // 3 + 1,
        "iso_date": dt.strftime("%Y-%m-%d"),
        "month_name": dt.strftime("%B"),
        "week_no": calendar.week,
        "weekday_no": calendar.weekday,
    }


dim = CachedDimension(
    name="date_dim",
    key=key,
    attributes=[
        "year_no",
        "month_no",
        "day_no",
        "week_no",
        "weekday_no",
        "quarter_no",
        "iso_date",
        "month_name",
    ],
    lookupatts=[key],
    targetconnection=cw,
    rowexpander=expand,
)

rows = [
    {"date_id": 20241001},
    {"date_id": 20241001},
    {"date_id": 20241001},
    {"date_id": 20241001},
    {"date_id": 20241001},
]


for row in rows:
    dim.ensure(row)
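
A possible workaround until this is addressed, sketched under the assumption that ensure passes the dict returned by the rowexpander on to insert: start from a copy of the incoming row so the key column survives. The name expand_with_key is hypothetical, and the function is kept free of pygrametl imports so it can be run on its own.

```python
from datetime import datetime
from typing import Any

KEY = "date_id"


def expand_with_key(row: dict[str, Any], name_map: dict[str, str]) -> dict[str, Any]:
    # Resolve the column that holds the smart key in the incoming row.
    col = name_map.get(KEY, KEY)
    dt = datetime.strptime(str(row[col]), "%Y%m%d")
    calendar = dt.isocalendar()

    # Copy the input row first so the key column is kept in the result
    # (assumption: a missing key is what triggers a newly generated id).
    expanded = dict(row)
    expanded.update(
        {
            "year_no": dt.year,
            "month_no": dt.month,
            "day_no": dt.day,
            "quarter_no": (dt.month - 1) // 3 + 1,
            "iso_date": dt.strftime("%Y-%m-%d"),
            "month_name": dt.strftime("%B"),
            "week_no": calendar.week,
            "weekday_no": calendar.weekday,
        }
    )
    return expanded
```

Passing rowexpander=expand_with_key to the CachedDimension above should then keep date_id in the inserted row, so the later lookups on date_id can find it. I have not checked this against every pygrametl version.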