[CT-602] [Feature] Support for more complex MERGE update rules #5215

Closed
cwelton opened this issue May 4, 2022 · 3 comments
Labels
enhancement (New feature or request), Team:Adapters (Issues designated for the adapter area of the code), wontfix (Not a bug or out of scope for dbt-core)

Comments

cwelton commented May 4, 2022

Is this your first time opening an issue?

Describe the Feature

Current incremental model MERGE semantics are limited.

Using merge_update_columns allows for statements of this form:

WHEN MATCHED THEN UPDATE SET 
   balance = NEW.balance

But it does not allow for statements of this form:

WHEN MATCHED THEN UPDATE SET 
   balance = OLD.balance + NEW.balance_delta

The latter can be important for many types of MERGE statements.
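
For illustration, here is roughly what the desired statement would look like in full, written as a Snowflake/BigQuery-style MERGE (the account_balances and ledger_deltas names, and the OLD/NEW aliases, are just placeholders for this example):

MERGE INTO account_balances AS OLD
USING ledger_deltas AS NEW
    ON OLD.account_id = NEW.account_id
WHEN MATCHED THEN UPDATE SET
    -- the update expression references both the existing row (OLD) and the incoming row (NEW)
    balance = OLD.balance + NEW.balance_delta
WHEN NOT MATCHED THEN
    INSERT (account_id, balance)
    VALUES (NEW.account_id, NEW.balance_delta);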

Describe alternatives you've considered

Option 1: Add an extra join to the definition of the incremental model.

This works, but it is substantially less efficient than producing an appropriate MERGE statement.
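
A rough sketch of this workaround, assuming a hypothetical ledger_deltas staging model and a loaded_at watermark column (names are placeholders, not the author's actual code):

{{ config(materialized='incremental', unique_key='account_id', incremental_strategy='merge') }}

with new_deltas as (
    select
        account_id,
        sum(balance_delta) as balance_delta,
        max(loaded_at)     as loaded_at
    from {{ ref('ledger_deltas') }}
    {% if is_incremental() %}
    -- only the entries that arrived since the last run
    where loaded_at > (select max(loaded_at) from {{ this }})
    {% endif %}
    group by account_id
)

select
    d.account_id,
    -- compute the new balance in the model query by joining back to the existing
    -- table, so the generated MERGE can stay a naive "replace the whole row"
    {% if is_incremental() -%}
    coalesce(existing.balance, 0) + d.balance_delta as balance,
    {%- else -%}
    d.balance_delta as balance,
    {%- endif %}
    d.loaded_at
from new_deltas as d
{% if is_incremental() %}
left join {{ this }} as existing
    on existing.account_id = d.account_id
{% endif %}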

Option 2: Create my own get_merge_sql macro

This also works, and is efficient, but the functionality is broadly useful and would be valuable to support natively in dbt.
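
The override itself would live in the project's macros/ directory, roughly along these lines (a skeleton only; the exact argument list of get_merge_sql depends on the dbt-core version, and project-level macros of the same name take precedence over the built-in one):

{% macro get_merge_sql(target, source, unique_key, dest_columns) %}
    {# custom merge SQL goes here, e.g. emitting
       balance = DBT_INTERNAL_DEST.balance + DBT_INTERNAL_SOURCE.balance_delta
       for configured columns, and the standard column list otherwise #}
{% endmacro %}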

Who will this benefit?

Incremental model builders who need more complicated MERGE behavior.

An example would be maintaining an account balance table from a double-entry bookkeeping ledger.

Are you interested in contributing this feature?

I have some working code, but I'm not sure whether there are better ways of doing it.

Anything else?

Example syntax of MERGE:

postgres: https://www.postgresql.org/message-id/attachment/23520/sql-merge.html
bigquery: https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_statement
snowflake: https://docs.snowflake.com/en/sql-reference/sql/merge.html

All of them support column_name = expression, where the expression can reference both the source query and the target table.

cwelton added the enhancement (New feature or request) and triage labels May 4, 2022
github-actions bot changed the title from [Feature] Support for more complex MERGE update rules to [CT-602] [Feature] Support for more complex MERGE update rules May 4, 2022
jtcohen6 self-assigned this May 9, 2022
jtcohen6 added the Team:Adapters (Issues designated for the adapter area of the code) label May 9, 2022
jtcohen6 (Contributor) commented

@cwelton Thanks for opening the issue!

I do think the right answer here is option 2, creating your own get_merge_sql macro. I'll say a bit more about this below.

dbt holds a pretty strong belief that DDL + DML is boilerplate. It should be as "naive" as possible, and it should not be the place where business logic is defined and executed. It's next-to-impossible to "preview" your query results before actually updating the existing table. This is why I've opted for the join-based approach you mention, or a longer lookback window: I can actually run the SQL that returns the results, as a read-only query, and verify that they're correct before they are upserted/merged into the table. Even if that lookback/join is more expensive than performing the calculation within the update statement, it's much less expensive to reason about, for the human reading the code.
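
As a concrete (and entirely hypothetical) sketch of the lookback-window pattern: the business logic stays in the SELECT, and the whole thing can be run as a read-only query to preview results before the merge replaces the affected rows. Model and column names, and the Snowflake-style dateadd, are placeholders:

{{ config(materialized='incremental', unique_key='event_id', incremental_strategy='merge') }}

select
    event_id,
    account_id,
    amount,
    event_ts
from {{ ref('stg_events') }}
{% if is_incremental() %}
-- reprocess a trailing three-day window, so late-arriving or corrected rows
-- are recalculated here and the merge just replaces the affected rows
where event_ts >= dateadd('day', -3, (select max(event_ts) from {{ this }}))
{% endif %}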

The separation between transformation/business logic and materialization logic is, IMO, one of dbt-core's most powerful abstractions. It means that anyone who can write SQL that returns the right dataset, can ship that dataset into production, reliably and consistently. For that reason, I'm going to close the issue, as it's not the pattern I want to see us encouraging. But, as promised, I'll say more below :)


As a broader matter, we are not going to be able to support every possible configuration for every incremental processing situation with our built-in macros and configurations. That's okay, IMO. We should aim to:

  • support the most common and straightforward ones out of the box ("make the easy things easy")
  • enable users with advanced merging needs to have a clear override path

In the first category, we do support merge_update_columns as a config on BigQuery/Snowflake/Databricks—though even that pushes the limits of complexity a bit. Lots of people wanted it, it was fairly trivial to configure and to implement, and it solved a real need for many many folks.
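
For reference, that config looks like this on the supported adapters (model and column names are placeholders):

{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='account_id',
        merge_update_columns=['balance', 'updated_at']
    )
}}

Only the listed columns appear in the generated "when matched then update set" clause; other columns on matched rows are left untouched.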

The good news on the second bullet: this is work we've scoped to pick up very soon, and which we hope to ship in the next minor release of dbt-core (v1.2). As detailed in the "desirable outcomes" of #5089:

All "merge" macros (get_merge_sql, get_insert_delete_merge_sql, etc) share a function signature. They should support a common set of arguments, including predicates for custom extensions (see: #4546).

This should make it possible to get what you're after. I'm sure you've already written code in this vein:

models:
  - name: my_complex_incremental_model
    config:
      materialized: incremental
      incremental_strategy: merge
      custom_merge_update_behavior:
        - column_name: balance
          # "DEST" is "OLD", "SOURCE" is "NEW"
          expression: "DBT_INTERNAL_DEST.balance + DBT_INTERNAL_SOURCE.balance_delta"

...and then, inside your reimplementation of the merge macro, something like:

    {% set custom_merge_update_behavior = config.get('custom_merge_update_behavior') %}

    {% if unique_key %}
    when matched then update set
        {% for column_name in update_columns -%}
          {% if column_name in ... %}  {# pseudo code: this column has a custom expression configured #}
            {{ column_name }} = {{ custom_merge_update_behavior.expression }}
          {% else %}  {# standard behavior #}
            {{ column_name }} = DBT_INTERNAL_SOURCE.{{ column_name }}
          {% endif %}
          {%- if not loop.last %}, {%- endif %}
        {%- endfor %}
    {% endif %}

Knowing that the macro has a solidified signature, and predictable inputs and outputs, should give you that much more confidence about the custom reimplementation over the long run.


Aside: I don't think Postgres actually supports the merge statement!

The doc you linked is the one I find when I google for it, but it's for PostgreSQL 8.4devel, and I can't find any newer docs. Postgres 9.5 did add support for insert ... on conflict update, which would allow for the kind of custom merge-y behavior you're describing, where an old row is not directly replaced (set to new value, or deleted + re-inserted), but whereby the update itself recalculates the value from both the old + the new.
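
For example (table and column names are illustrative only):

INSERT INTO account_balances (account_id, balance)
VALUES (42, 100)
ON CONFLICT (account_id) DO UPDATE SET
    -- combine the existing row's value with the proposed row's value
    balance = account_balances.balance + EXCLUDED.balance;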


jtcohen6 added the wontfix (Not a bug or out of scope for dbt-core) label and removed the triage label May 13, 2022

ferki commented May 20, 2022

Aside: I don't think Postgres actually supports the merge statement!

@jtcohen6: at least not yet :) MERGE is expected to be part of the upcoming PostgreSQL 15 release (via PostgreSQL 15 Beta 1 Released!).

dongchris commented

Hi @jtcohen6, I'm interested in more functionality in the merge statement, particularly support for keep_oldest and keep_newest when an incoming row has the same merge keys as an existing row.

The current behavior is always_replace, where we keep the incoming row.

If we want to keep the existing row (never_replace), we can use merge_update_columns=[<merge_key>], so this works as well.

The last two, keep_newest and keep_oldest, are not currently supported; they would use a specified timestamp column to determine which row to keep.
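
As an illustration, keep_newest could be expressed by adding a predicate to the matched clause of the generated MERGE (all names here, including the updated_at timestamp column, are placeholders):

MERGE INTO account_balances AS DBT_INTERNAL_DEST
USING new_rows AS DBT_INTERNAL_SOURCE
    ON DBT_INTERNAL_DEST.account_id = DBT_INTERNAL_SOURCE.account_id
-- keep_newest: only replace the existing row when the incoming row is newer
WHEN MATCHED AND DBT_INTERNAL_SOURCE.updated_at > DBT_INTERNAL_DEST.updated_at
    THEN UPDATE SET
        balance = DBT_INTERNAL_SOURCE.balance,
        updated_at = DBT_INTERNAL_SOURCE.updated_at
WHEN NOT MATCHED THEN
    INSERT (account_id, balance, updated_at)
    VALUES (DBT_INTERNAL_SOURCE.account_id, DBT_INTERNAL_SOURCE.balance, DBT_INTERNAL_SOURCE.updated_at);

keep_oldest would simply flip the comparison (or omit the matched clause entirely).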

Let me know what you think!
