Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that snapshot valid_from and valid_to timestamps are non-overlapping #1736

Closed
drewbanin opened this issue Sep 10, 2019 · 0 comments · Fixed by #1994
Closed

Ensure that snapshot valid_from and valid_to timestamps are non-overlapping #1736

drewbanin opened this issue Sep 10, 2019 · 0 comments · Fixed by #1994
Labels
bug Something isn't working snapshots Issues related to dbt's snapshot functionality
Milestone

Comments

@drewbanin
Copy link
Contributor

From Josh P and Claus H in Slack:

we were using the dbt_valid_from and dbt_valid_to ranges to determine which row represents the correct values at a point in time, but now it means 2 rows would be selected for finding a point-in-time between the 3 seconds overlap of the example i posted

dbt should instead ensure that snapshot change records for the check strategy are non-overlapping.

Sample code:

{% macro snapshot_check_fixed_strategy(node, snapshotted_rel, current_rel, config, target_exists) %}
    {% set check_cols_config = config['check_cols'] %}
    {% set primary_key = config['unique_key'] %}

    {# -------------- NEW ------------------ #}
    {% set select_current_time -%}
        select {{ snapshot_get_time() }} as snapshot_start
    {%- endset %}

    {% set now = run_query(select_current_time)[0].snapshot_start %}
    {% set updated_at = "'" ~ now ~ "'::timestamp" %}
    {# -------------- NEW ------------------ #}

    {% if check_cols_config == 'all' %}
        {% set check_cols = get_columns_in_query(node['injected_sql']) %}
    {% elif check_cols_config is iterable and (check_cols_config | length) > 0 %}
        {% set check_cols = check_cols_config %}
    {% else %}
        {% do exceptions.raise_compiler_error("Invalid value for 'check_cols': " ~ check_cols_config) %}
    {% endif %}

    {% set row_changed_expr -%}
        (
        {% for col in check_cols %}
            {{ snapshotted_rel }}.{{ col }} != {{ current_rel }}.{{ col }}
            or
            ({{ snapshotted_rel }}.{{ col }} is null) != ({{ current_rel }}.{{ col }} is null)
            {%- if not loop.last %} or {% endif %}

        {% endfor %}
        )
    {%- endset %}

    {% set scd_id_expr = snapshot_hash_arguments([primary_key, updated_at]) %}

    {% do return({
        "unique_key": primary_key,
        "updated_at": updated_at,
        "row_changed": row_changed_expr,
        "scd_id": scd_id_expr
    }) %}

{% endmacro %}

The sample code shown above will need to be updated to:

  1. use database-specific quotes (not ' everywhere)
  2. use a database-specific timestamp type
@drewbanin drewbanin added the bug Something isn't working label Sep 10, 2019
@drewbanin drewbanin added this to the Louisa May Alcott milestone Sep 10, 2019
@drewbanin drewbanin added the snapshots Issues related to dbt's snapshot functionality label Sep 10, 2019
@drewbanin drewbanin modified the milestones: Louisa May Alcott, 0.15.1 Nov 4, 2019
beckjake added a commit that referenced this issue Dec 13, 2019
…d-timestamps

Avoid overlapping validity timestamps (#1736)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working snapshots Issues related to dbt's snapshot functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant