
Standardize timestamp column types in the Archival materialization #938

Closed
foundinblank opened this issue Aug 17, 2018 · 5 comments · Fixed by #1478
Labels
snapshots Issues related to dbt's snapshot functionality

Comments

@foundinblank

Issue description

Running dbt archive (on Snowflake) throws a cryptic error related to timezone types.

Results

16:38:45 | Concurrency: 2 threads (target='dev')
16:38:45 |
16:38:45 | 1 of 1 START archive dbt_adam.locations --> dbt_adam_archive.locations_archived [RUN]
16:38:56 | 1 of 1 ERROR archiving dbt_adam.locations --> dbt_adam_archive.locations_archived [ERROR in 11.08s]
16:38:56 |
16:38:56 | Finished running 1 archives in 14.58s.

Completed with 1 errors:

Database Error in archive locations_archived (dbt_project.yml)
  001790 (42601): 1d165133-4c2a-4587-98a4-995034f1a2e0: SQL compilation error:
  inconsistent data type for result columns for set operator input branches, expected TIMESTAMP_LTZ(9), got TIMESTAMP_NTZ(9) for expression [{2}] branch {3}

System information

The output of dbt --version:

installed version: 0.10.2
   latest version: 0.10.2

Up to date!

The operating system you're running on: OS X High Sierra 10.13.6

The python version you're using: Python 3.5.4

Steps to reproduce

Slack discussion here: https://getdbt.slack.com/archives/C2JRRQDTL/p1534508532000100

When I first got the error, I cast the created_at field in the source table as timestamp_ltz and verified that Snowflake returned that data type when querying the source table. That did not fix the error, though. The problem could be related to the null::timestamp expression, which appears in the last query that failed; however, changing null::timestamp to null::timestamp_ltz (and other variants) doesn't fix the error either.
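For reference, the cast described above amounted to something like the following. This is a hypothetical sketch of the attempted workaround, not the actual model SQL; the table and column names are taken from the failing query below:

```sql
-- Cast the source timestamps to TIMESTAMP_LTZ so the archive query
-- would (in theory) see a consistent type on both sides of the union.
select
    id,
    to_timestamp_ltz(created_at) as created_at,
    to_timestamp_ltz(updated_at) as updated_at
from "DBT_ADAM"."LOCATIONS";
```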

This is the query that failed:

2018-08-17 16:38:55,632 (Thread-1): On locations_archived: create temporary table
    locations_archived__dbt_archival_tmp
  as (
    with dbt_archive_sbq as (

    with current_data as (

        select
            "ID",
            "UUID",
            "PERSON_ID",
            "NAME",
            "ADDRESS_ID",
            "CREATED_AT",
            "UPDATED_AT",
            "PHONE_NUMBER",
            "USED_FOR",
            "LANGUAGE_PREFERENCE",
            "DEVICE_TYPE",
            "DEFAULT_LANG_PREF",
            updated_at as "dbt_updated_at",
            id as "dbt_pk",
            updated_at as "valid_from",
            null::TIMESTAMP as "tmp_valid_to"
        from "DBT_ADAM"."LOCATIONS"

    ),

    archived_data as (

        select
            "ID",
            "UUID",
            "PERSON_ID",
            "NAME",
            "ADDRESS_ID",
            "CREATED_AT",
            "UPDATED_AT",
            "PHONE_NUMBER",
            "USED_FOR",
            "LANGUAGE_PREFERENCE",
            "DEVICE_TYPE",
            "DEFAULT_LANG_PREF",
            updated_at as "dbt_updated_at",
            id as "dbt_pk",
            "valid_from",
            "valid_to" as "tmp_valid_to"
        from "DBT_ADAM_ARCHIVE"."LOCATIONS_ARCHIVED"

    ),

    insertions as (

        select
            current_data.*,
            null::TIMESTAMP as "valid_to"
        from current_data
        left outer join archived_data
          on archived_data."dbt_pk" = current_data."dbt_pk"
        where archived_data."dbt_pk" is null or (
          archived_data."dbt_pk" is not null and
          current_data."dbt_updated_at" > archived_data."dbt_updated_at" and
          archived_data."tmp_valid_to" is null
        )
    ),

    updates as (

        select
            archived_data.*,
            current_data."dbt_updated_at" as "valid_to"
        from current_data
        left outer join archived_data
          on archived_data."dbt_pk" = current_data."dbt_pk"
        where archived_data."dbt_pk" is not null
          and archived_data."dbt_updated_at" < current_data."dbt_updated_at"
          and archived_data."tmp_valid_to" is null
    ),

    merged as (

      select *, 'update' as "change_type" from updates
      union all
      select *, 'insert' as "change_type" from insertions

    )

    select *,
        md5("dbt_pk" || '|' || "dbt_updated_at") as "scd_id"
    from merged

      )
      select * from dbt_archive_sbq
  );
2018-08-17 16:38:56,228 (Thread-1): Snowflake error: 001790 (42601): 1d165133-4c2a-4587-98a4-995034f1a2e0: SQL compilation error:
inconsistent data type for result columns for set operator input branches, expected TIMESTAMP_LTZ(9), got TIMESTAMP_NTZ(9) for expression [{2}] branch {3}
2018-08-17 16:38:56,228 (Thread-1): Snowflake error: 001790 (42601): 1d165133-4c2a-4587-98a4-995034f1a2e0: SQL compilation error:
inconsistent data type for result columns for set operator input branches, expected TIMESTAMP_LTZ(9), got TIMESTAMP_NTZ(9) for expression [{2}] branch {3}
@foundinblank foundinblank changed the title Archive doesn't work Archive throws a timestamp-related error on Snowflake Aug 17, 2018
@drewbanin
Contributor

Thanks for the writeup @foundinblank. You indicated that changing null::timestamp to null::timestamp_ltz doesn't fix this error. There are two instances of null::timestamp in the SQL you sent over -- were you able to try changing both of them? I think that should work, so I'd be very curious to know if that still fails for you.

@foundinblank
Author

Changing both to null::timestamp_ltz didn't work.

@foundinblank
Author

Changing the source tables from to_timestamp_ltz to to_timestamp_ntz and dropping the archive schema together resolved the error.

@foundinblank
Author

Updating: _ltz definitely doesn't work with this. After dropping the archive schema and trying _ltz, errors were thrown. After dropping the archive again and using _ntz, it works.
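The working sequence described above amounts to roughly the following. The exact statements are an assumption (the issue doesn't show the model SQL); the schema and table names come from the failing query:

```sql
-- 1. In the models that build "DBT_ADAM"."LOCATIONS", construct the
--    timestamp columns with to_timestamp_ntz(...) instead of
--    to_timestamp_ltz(...).
-- 2. Drop the archive schema so dbt recreates it with the new types:
drop schema if exists "DBT_ADAM_ARCHIVE" cascade;
-- 3. Re-run: dbt archive
```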

@drewbanin drewbanin added this to the Wilt Chamberlain milestone Nov 28, 2018
@drewbanin drewbanin added the snapshots Issues related to dbt's snapshot functionality label Dec 5, 2018
@drewbanin drewbanin changed the title Archive throws a timestamp-related error on Snowflake Standardize timestamp column types in the Archival materialization Mar 23, 2019
@drewbanin
Contributor

This is prioritized for the Wilt Chamberlain release. Databases like Redshift and Postgres have only a single type to represent timestamps, whereas Snowflake and BigQuery (and presumably other pluggable databases) have more nuanced ways of representing time.

This particular issue occurs in archival when dbt tries to union together "archived" data (with a known valid_to timestamp) and "current" data (with a null valid_to timestamp).

Instead of supplying a timestamp type for the initially null valid_to field, dbt should trust the database to perform the union intelligently. It may be necessary to union the tables in an order which places the valid_to with an explicit type before the phony valid_to value.
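Concretely, the proposal is to drop the explicit cast and let the set operator infer the type from the branch that already carries one. A sketch of what the merged CTE might look like under that approach (this is illustrative, not the eventual implementation; column lists are elided):

```sql
-- The updates branch comes first because its "valid_to" has an
-- explicit timestamp type; the bare null in the insertions branch
-- is then coerced to match it.
select ..., "valid_to", 'update' as "change_type" from updates
union all
select ..., null as "valid_to", 'insert' as "change_type" from insertions;
```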

Note: dbt should not try to coerce types if a source table updated_at field changes from one timestamp type to another. If this happens, we should let the query fail in the database.

Test cases:

  • Redshift (confirm that Redshift can infer the implicit type correctly)
  • Postgres (confirm that Postgres can infer the implicit type correctly)
  • Snowflake
    • Archive a table with a timestamp_ntz updated_at field
    • Archive a table with a timestamp_ltz updated_at field
  • BigQuery
    • Archive a table with a timestamp updated_at field
    • Archive a table with a datetime updated_at field
    • Archive a table with a date updated_at field
