Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: cast NULLs in VALUES clause (df->values) to avoid coercion issues #2856

Merged
merged 1 commit into from
Jul 4, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions sqlmesh/core/dialect.py
Original file line number Diff line number Diff line change
Expand Up @@ -897,6 +897,14 @@ def select_from_values_for_batch_range(
]

values_exp = exp.values(expressions, alias=alias, columns=columns_to_types)
if values:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we cast values, directly, do we need to also cast the select?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you referring to the projections? because we already cast them earlier in this function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea but now it's a double cast, cast in the values, and cast in the projection, should we just do one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah gotcha.. I guess we can try? I was being conservative because I'm not sure what the implications are in case we remove those casts, maybe we'll fail to infer the type of a projection down the line somewhere? On the other hand, we seem to have the columns_to_types mapping already, so perhaps it's fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no big deal

# BigQuery crashes on `SELECT CAST(x AS TIMESTAMP) FROM UNNEST([NULL]) AS x`, but not
# on `SELECT CAST(x AS TIMESTAMP) FROM UNNEST([CAST(NULL AS TIMESTAMP)]) AS x`. This
# ensures nulls under the `Values` expression are cast to avoid similar issues.
for value, kind in zip(values_exp.expressions[0].expressions, columns_to_types.values()):
if isinstance(value, exp.Null):
value.replace(exp.cast(value, to=kind))

return exp.select(*casted_columns).from_(values_exp, copy=False).where(where, copy=False)


Expand Down
14 changes: 14 additions & 0 deletions tests/core/test_dialect.py
Original file line number Diff line number Diff line change
Expand Up @@ -375,6 +375,20 @@ def test_select_from_values_for_batch_range_json():
)


def test_select_from_values_that_include_null():
values = [(1, exp.null())]
columns_to_types = {
"id": exp.DataType.build("int", dialect="bigquery"),
"ts": exp.DataType.build("timestamp", dialect="bigquery"),
}

values_expr = select_from_values_for_batch_range(values, columns_to_types, 0, len(values))
assert values_expr.sql(dialect="bigquery") == (
"SELECT CAST(id AS INT64) AS id, CAST(ts AS TIMESTAMP) AS ts FROM "
"UNNEST([STRUCT(1 AS id, CAST(NULL AS TIMESTAMP) AS ts)]) AS t"
)


@pytest.fixture(params=["mysql", "duckdb", "postgres", "snowflake"])
def normalization_dialect(request):
if request.param == "duckdb":
Expand Down