[CT-602] [Feature] Support for more complex MERGE update rules #5215
Comments
@cwelton Thanks for opening the issue! I do think the right answer here is option 2, creating your own
dbt holds a pretty strong belief that DDL + DML is boilerplate. It should be as "naive" as possible, and it should not be the place where business logic is defined and executed. It's next-to-impossible to "preview" your query results before actually updating the existing table. This is why I've opted for the join-based approach you mention, or a longer lookback window: I can actually run the SQL that returns the results, as a read-only query, and verify that they're correct before they are upserted/merged into the table. Even if that lookback/join is more expensive than performing the calculation within the
The separation between transformation/business logic and materialization logic is, IMO, one of
As a broader matter, we are not going to be able to support every possible configuration for every incremental processing situation with our built-in macros and configurations. That's okay, IMO. We should aim to:
In the first category, we do support
The good news on the second bullet: this is work we've scoped to pick up very soon, and which we hope to ship in the next minor release of
This should make it possible to get what you're after. I'm sure you've already written code in this vein:
models:
  - name: my_complex_incremental_model.sql
    config:
      materialized: incremental
      incremental_strategy: merge
      custom_merge_update_behavior:
        - column_name: balance
          # "DEST" is "OLD", "SOURCE" is "NEW"
          expression: "DBT_INTERNAL_DEST.balance + DBT_INTERNAL_SOURCE.balance_delta"
) }}
{% set custom_merge_update_behavior = config.get('custom_merge_update_behavior', []) %}
{# Map each configured column name to its custom update expression #}
{% set custom_expressions = {} %}
{% for rule in custom_merge_update_behavior %}
  {% do custom_expressions.update({rule['column_name']: rule['expression']}) %}
{% endfor %}
{% if unique_key %}
when matched then update set
  {% for column_name in update_columns -%}
    {% if column_name in custom_expressions %}
      {{ column_name }} = {{ custom_expressions[column_name] }}
    {% else %} {# standard behavior #}
      {{ column_name }} = DBT_INTERNAL_SOURCE.{{ column_name }}
    {% endif %}
    {%- if not loop.last %}, {%- endif %}
  {%- endfor %}
{% endif %}
Knowing that the macro has a solidified signature, and predictable inputs and outputs, should give you that much more confidence about the custom reimplementation over the long run.
Aside: I don't think Postgres actually supports the MERGE statement.
The doc you linked is the one I find when I google for it, but it's for
Related:
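For anyone who wants to check the per-column override logic outside of dbt, here is a minimal Python sketch of the same idea. It is not dbt's implementation: `render_update_set` and the `updated_at` column are hypothetical names used only for illustration, and `custom_rules` mirrors the proposed (not existing) `custom_merge_update_behavior` config.

```python
def render_update_set(update_columns, custom_rules):
    """Build the `when matched then update set` clause of a MERGE statement.

    custom_rules mirrors the hypothetical custom_merge_update_behavior config:
    a list of {"column_name": ..., "expression": ...} dicts.
    """
    # Map column name -> custom expression for quick lookup.
    custom = {rule["column_name"]: rule["expression"] for rule in custom_rules}
    assignments = []
    for column in update_columns:
        # Standard behavior when no rule exists: overwrite with the source value.
        expression = custom.get(column, f"DBT_INTERNAL_SOURCE.{column}")
        assignments.append(f"{column} = {expression}")
    return "when matched then update set\n    " + ",\n    ".join(assignments)


rules = [{
    "column_name": "balance",
    "expression": "DBT_INTERNAL_DEST.balance + DBT_INTERNAL_SOURCE.balance_delta",
}]
clause = render_update_set(["balance", "updated_at"], rules)
print(clause)
```

Columns without a rule keep the default assignment from DBT_INTERNAL_SOURCE, so a model only needs to list the columns that deviate from it.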
@jtcohen6: at least not yet :) MERGE is expected to be part of the upcoming PostgreSQL 15 release (via PostgreSQL 15 Beta 1 Released!).
@jtcohen6 hi, i'm interested in more functionality in the merge statement, particularly around support for The current behavior is If we want to keep the existing row ( The last two of Let me know what you think!
Describe the Feature
Current incremental model MERGE semantics are limited.
Using merge_update_columns allows for statements of this form:
But it does not allow for statements of this form:
The latter can be important for many types of MERGE statements.
Describe alternatives you've considered
Option 1: Add an extra join to the definition of the incremental model.
This works but is substantially less efficient than producing an appropriate MERGE statement.
Option 2: Create my own get_merge_sql macro.
This also works, and is efficient, but the functionality is broadly useful and would be valuable to have as native support in dbt.
Who will this benefit?
Incremental model builders who have needs for more complicated MERGE behavior.
An example would be maintaining an account balance table based on a double-entry ledger.
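To make the account balance case concrete, here is a small Python sketch of the semantics such a MERGE would need: matched accounts have the incoming delta added to the existing balance, and unmatched accounts are inserted. The function name, account names, and amounts are all made up for illustration; this models the intended behavior only and says nothing about how any warehouse executes it.

```python
def merge_balances(balances, ledger_rows):
    """Apply a batch of ledger deltas to an account balance table.

    balances: dict of account_id -> current balance (the MERGE target).
    ledger_rows: iterable of (account_id, balance_delta) pairs (the MERGE source).
    """
    # Aggregate the source per account first, so each key has a single delta.
    deltas = {}
    for account_id, delta in ledger_rows:
        deltas[account_id] = deltas.get(account_id, 0) + delta

    merged = dict(balances)  # leave the input untouched
    for account_id, delta in deltas.items():
        if account_id in merged:
            # when matched: balance = DEST.balance + SOURCE.balance_delta
            merged[account_id] += delta
        else:
            # when not matched: insert a new row
            merged[account_id] = delta
    return merged


balances = {"cash": 100, "revenue": -100}
ledger = [("cash", 50), ("revenue", -50), ("inventory", 25), ("cash", -25)]
result = merge_balances(balances, ledger)
print(result)
```

A plain overwrite from the source (what merge_update_columns gives you) would set cash to the batch delta instead of adding the delta to the existing balance, which is the gap this issue describes.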
Are you interested in contributing this feature?
I have some working code, but not sure if there are better ways of doing it.
Anything else?
Example syntax of MERGE:
postgres: https://www.postgresql.org/message-id/attachment/23520/sql-merge.html
bigquery: https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_statement
snowflake: https://docs.snowflake.com/en/sql-reference/sql/merge.html
All support column_name = expression, where expression can be an arbitrary expression over both the source query and the target table.