Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-1669] [Bug] type_timestamp() returns incorrect value for Postgres and Snowflake #6453

Closed
2 tasks done
clausherther opened this issue Dec 17, 2022 · 6 comments
Closed
2 tasks done
Labels
bug Something isn't working stale Issues that have gone stale Team:Adapters Issues designated for the adapter area of the code

Comments

@clausherther
Copy link
Contributor

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

  • Create a column in Postgres or Snowflake with type type_timestamp(),
select 
    cast(date_col as {{ type_timestamp() }}) as date_timestamp,
...

This renders to

cast(date_col as TIMESTAMP) as date_timestamp,
  • Then during testing, retrieve column types from
{%- set columns_in_relation = adapter.get_columns_in_relation(model) -%}

However,

{% for column in columns_in_relation %}
   {{ log((column.dtype | upper), info=True) }}
{% endfor %}

for this column returns:

TIMESTAMP WITHOUT TIME ZONE

This makes testing data types using the built in types impossible.

For context, in dbt-expectations, we've had to override type_timestamp() for that reason and would like to deprecate that in favor of the dbt-core implementation.

Expected Behavior

type_timestamp() should return the native data type, e.g. TIMESTAMP WITHOUT TIME ZONE

In dbt-expectations, we use this right now:

{# timestamp  -------------------------------------------------     #}
{%- macro type_timestamp() -%}
  {{ return(adapter.dispatch('type_timestamp', 'dbt_expectations')()) }}
{%- endmacro -%}

{% macro default__type_timestamp() -%}
    timestamp
{%- endmacro %}

{% macro snowflake__type_timestamp() -%}
    timestamp_ntz
{%- endmacro %}

{% macro postgres__type_timestamp() -%}
    timestamp without time zone
{%- endmacro %}

Steps To Reproduce

See steps above. Tested on Postgres, BigQuery and Snowflake.

Relevant log output

No response

Environment

- OS: MacOS 13
- Python: 3.9.9
- dbt: 1.3.1

Which database adapter are you using with dbt?

postgres, snowflake, bigquery

Additional Context

No response

@clausherther clausherther added bug Something isn't working triage labels Dec 17, 2022
@github-actions github-actions bot changed the title [Bug] type_timestamp() returns incorrect value for Postgres and Snowflake [CT-1669] [Bug] type_timestamp() returns incorrect value for Postgres and Snowflake Dec 17, 2022
@clausherther
Copy link
Contributor Author

I realize the above matches the docs, however I'm not sure why the described behavior is useful or desired.

@dbeatty10
Copy link
Contributor

dbeatty10 commented Dec 24, 2022

@clausherther Thanks for reaching out about this!

To make sure I'm following with you, does the following capture the difference between what you're actually seeing versus your expectations?

Adapter Macro Actual Expected
postgres type_timestamp() TIMESTAMP timestamp without time zone
snowflake type_timestamp() TIMESTAMP timestamp_ntz
bigquery type_timestamp() TIMESTAMP timestamp

Assuming so, here's some more info:

Migration of cross-database type_* macros from dbt-utils to dbt-core

I think that the last implementation of type_timestamp() in dbt_utils actually aligned with your expectations. It looks nearly identical to what you have in dbt-expectations here.

We tried to do some consolidation of data type logic in this PR (by taking advantage of api.Column.translate_type), and it changed the literal strings being rendered in some cases:

Here's a chain of references explaining why type_timestamp() consistently yields TIMESTAMP for all the adapters you mentioned:

Next steps

I'm actually a bit surprised we haven't heard problems about this! I wonder if it's because TIMESTAMP somehow still ended up with the same final data types in each database?

We've got a few options:

  1. Do nothing and accept things as-is
  2. Change the implementation of type_timestamp() via an update to TYPE_LABELS
  3. Change the implementation of type_timestamp() directly by moving away from api.Column.translate_type
  4. Something else not listed above.

I don't know what the consequences would be if we update the TYPE_LABELS in the relevant databases -- it might break something. Moving away from api.Column.translate_type for type_timestamp() might be fully backwards compatible 🤷 ?

As next steps, we should try out each of the options above to determine the consequences of each.

Heads-up: we all be out of the office until Jan 3rd, so my responses will be delayed until after that.

@dbeatty10 dbeatty10 added awaiting_response Refinement Maintainer input needed and removed triage labels Dec 24, 2022
@clausherther
Copy link
Contributor Author

@dbeatty10 thanks for the detailed reply! Let's chat more after the New Year.
I think the reason you probably haven't heard much about this is that when you use type_timestamp() to cast a column, TIMESTAMP works just fine across those databases and results in the correct type being created. You only run into issues if you compare what type_timestamp returns and what the db stores as the more explicit version of that. I think the most common way people that today is by using the dbt-expectations tests and in dbt-expectations we override that type_timestamp macro today so we might have been hiding that issue ;)

@dbeatty10 dbeatty10 self-assigned this Jan 3, 2023
@jtcohen6 jtcohen6 added the Team:Adapters Issues designated for the adapter area of the code label Feb 7, 2023
@dbeatty10
Copy link
Contributor

@clausherther I gave this some more thought, and I think we should implement Option 2 listed above:

  • Change the implementation of type_timestamp() via an update to TYPE_LABELS

Let me know what you think!

Pros

Cons

  • If anyone has been relying on the literal string value of type_timestamp() macro to always be TIMESTAMP, then it will be a breaking change for them.
    • But this would be an odd thing to do -- I'd expect most/all people to want to use it in the manner you described instead.

Implementation hints

The proposed implementation would involve coordinated PRs in 5 different repos: dbt-core, dbt-bigquery, dbt-redshift, dbt-snowflake, and dbt-spark.

I were to implement this, here's how I might go about it:

  1. Copy TYPE_LABELS from here into:
  2. Update the 2nd part of "TIMESTAMP": "TIMESTAMP" within each adapter to be the desired data type. e.g., something like:
    • PostgresColumn - "TIMESTAMP": "TIMESTAMP WITHOUT TIME ZONE"
    • RedshiftColumn - "TIMESTAMP": "TIMESTAMP"
    • SnowflakeColumn - "TIMESTAMP": "TIMESTAMP_NTZ"
    • SparkColumn - "TIMESTAMP": "TIMESTAMP"
  3. Include functional tests to establish the expected behavior going forward
    • See the schema test from here and here for inspiration

dbt-bigquery is already has TIMESTAMP here, so I don't think its TYPE_LABELS needs any changes. But any new functional test should be added within that adapter too in order to verify!

@dbeatty10 dbeatty10 removed triage Refinement Maintainer input needed labels Feb 26, 2023
@dbeatty10 dbeatty10 removed their assignment Feb 26, 2023
@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Aug 26, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Sep 2, 2023

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working stale Issues that have gone stale Team:Adapters Issues designated for the adapter area of the code
Projects
None yet
Development

No branches or pull requests

3 participants