[Bug] Microbatch timestamps incorrectly offset for system timezone in Snowflake #10976

bthomson22 · 2024-11-05T20:31:26Z

Is this a new bug in dbt-core?

I believe this is a new bug in dbt-core
I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Currently, microbatch timestamps are being recognized as TIMESTAMP_NTZ in Snowflake, instead of TIMESTAMP_TZ. As a result, if user's system timezone is in anything but UTC, Snowflake will offset the timezone, resulting in incorrect and duplicative rows loaded.

For example, if you ran a microbatch model with the --event-end-time of today (2024-11-05) it would produce the following code:

and [timestamp_field] < TIMESTAMP '2024-11-05 00:00:00+00:00'

When you look at this same timestamp in Snowflake you can see it's stored as timestamp_ntz:

When you convert this to a timestamp_tz then you'll see the system offset will takeover and adjust accordingly:

According to the docs, microbatch expects input data to be in UTC. Therefore, if your source column is of type timestamp_tz and in UTC, the two timestamps would be off by 8 hours in the above example - causing duplicative rows being inserted.

Instead of using the invalid TIMESTAMP function (which I believe is the cause for data being stored as UTC) - such as the below:

TIMESTAMP '2024-11-05 00:00:00+00:00'

dbt Core should instead use the to_timestamp_tz function that correctly stores this information as a timestamp_tz with the correct UTC offset and data type:

Expected Behavior

Microbatch DDL statements should be operating off of timestamp_tz data types with a UTC offset of 0. Running a model with a microbatch materialization over and over again should not duplicate rows of data in the resulting dataset.

Insert/Delete statements should be operating off of the same datatype & UTC offset as source data.

Steps To Reproduce

Connect dbt Cloud to a Snowflake project
Ensure the Snowflake system timezone has a timezone offset broadly applied.
Create a source model with event_time configured on a timestamp_tz column in UTC.
Run the initial microbatch model from start to end (begins_on through now).
Re-run the microbatch model using a lookback of 1
Observe duplicate rows in the resulting data.

Relevant log output

No response

Environment

- Python: 3.8.12
- dbt: dbt Cloud CLI - 0.38.21

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

The text was updated successfully, but these errors were encountered:

bthomson22 added bug Something isn't working triage labels Nov 5, 2024

graciegoheen mentioned this issue Nov 5, 2024

[EPIC] Incremental Model Improvements - Microbatch #10624

Open

dbeatty10 added the microbatch label Nov 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] Microbatch timestamps incorrectly offset for system timezone in Snowflake #10976

[Bug] Microbatch timestamps incorrectly offset for system timezone in Snowflake #10976

bthomson22 commented Nov 5, 2024

[Bug] Microbatch timestamps incorrectly offset for system timezone in Snowflake #10976

[Bug] Microbatch timestamps incorrectly offset for system timezone in Snowflake #10976

Comments

bthomson22 commented Nov 5, 2024

Is this a new bug in dbt-core?

Current Behavior

Expected Behavior

Steps To Reproduce

Relevant log output

Environment

Which database adapter are you using with dbt?

Additional Context