You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
Currently, microbatch timestamps are being recognized as TIMESTAMP_NTZ in Snowflake, instead of TIMESTAMP_TZ. As a result, if user's system timezone is in anything but UTC, Snowflake will offset the timezone, resulting in incorrect and duplicative rows loaded.
For example, if you ran a microbatch model with the --event-end-time of today (2024-11-05) it would produce the following code:
and [timestamp_field] < TIMESTAMP '2024-11-05 00:00:00+00:00'
When you look at this same timestamp in Snowflake you can see it's stored as timestamp_ntz:
When you convert this to a timestamp_tz then you'll see the system offset will takeover and adjust accordingly:
According to the docs, microbatch expects input data to be in UTC. Therefore, if your source column is of type timestamp_tz and in UTC, the two timestamps would be off by 8 hours in the above example - causing duplicative rows being inserted.
Instead of using the invalid TIMESTAMP function (which I believe is the cause for data being stored as UTC) - such as the below:
TIMESTAMP '2024-11-05 00:00:00+00:00'
dbt Core should instead use the to_timestamp_tz function that correctly stores this information as a timestamp_tz with the correct UTC offset and data type:
Expected Behavior
Microbatch DDL statements should be operating off of timestamp_tz data types with a UTC offset of 0. Running a model with a microbatch materialization over and over again should not duplicate rows of data in the resulting dataset.
Insert/Delete statements should be operating off of the same datatype & UTC offset as source data.
Steps To Reproduce
Connect dbt Cloud to a Snowflake project
Ensure the Snowflake system timezone has a timezone offset broadly applied.
Create a source model with event_time configured on a timestamp_tz column in UTC.
Run the initial microbatch model from start to end (begins_on through now).
Re-run the microbatch model using a lookback of 1
Observe duplicate rows in the resulting data.
Relevant log output
No response
Environment
- Python: 3.8.12
- dbt: dbt Cloud CLI - 0.38.21
Which database adapter are you using with dbt?
snowflake
Additional Context
No response
The text was updated successfully, but these errors were encountered:
Is this a new bug in dbt-core?
Current Behavior
Currently, microbatch timestamps are being recognized as
TIMESTAMP_NTZ
in Snowflake, instead ofTIMESTAMP_TZ
. As a result, if user's system timezone is in anything but UTC, Snowflake will offset the timezone, resulting in incorrect and duplicative rows loaded.For example, if you ran a microbatch model with the
--event-end-time
of today (2024-11-05) it would produce the following code:When you look at this same timestamp in Snowflake you can see it's stored as
timestamp_ntz
:When you convert this to a
timestamp_tz
then you'll see the system offset will takeover and adjust accordingly:According to the docs, microbatch expects input data to be in UTC. Therefore, if your source column is of type
timestamp_tz
and in UTC, the two timestamps would be off by 8 hours in the above example - causing duplicative rows being inserted.Instead of using the invalid
TIMESTAMP
function (which I believe is the cause for data being stored as UTC) - such as the below:dbt Core should instead use the
to_timestamp_tz
function that correctly stores this information as atimestamp_tz
with the correct UTC offset and data type:Expected Behavior
Microbatch DDL statements should be operating off of
timestamp_tz
data types with a UTC offset of 0. Running a model with a microbatch materialization over and over again should not duplicate rows of data in the resulting dataset.Insert/Delete statements should be operating off of the same datatype & UTC offset as source data.
Steps To Reproduce
event_time
configured on atimestamp_tz
column in UTC.Relevant log output
No response
Environment
Which database adapter are you using with dbt?
snowflake
Additional Context
No response
The text was updated successfully, but these errors were encountered: