
MERGE INTO deduplication discussion with Wilson #85

Draft
kbendick wants to merge 1 commit into master from kb-test-merge-into-for-wilson

@kbendick
Owner

This is a demo for discussing a question about deduplication with Wilson no Canada on Slack.

This is the expected behavior for a MERGE INTO command, according to the Postgres documentation: https://www.postgresql.org/message-id/attachment/23520/sql-merge.html

MERGE INTO only deduplicates against rows that are already in the target table.

From the Postgres documentation:

First, the MERGE command performs a left outer join from source query to target table, producing zero or more merged rows. For each merged row, WHEN clauses are evaluated in the specified order until one of them is activated. The corresponding action is then applied and processing continues for the next row.

MERGE actions have the same effect as regular UPDATE, INSERT, or DELETE commands of the same names, though the syntax is slightly different.
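
The two phases quoted above (join once, then evaluate WHEN clauses per merged row) can be sketched in plain Python. This is a simplified model under stated assumptions, not how Postgres or Spark actually implement MERGE: tables are lists of dicts joined on a hypothetical `id` key, and `merge` plus its `(for_matched, condition, action)` clause tuples are illustrative names, not a real API.

```python
def merge(target, source, key, when_clauses):
    """Simplified model of MERGE: join once, then fire at most one WHEN clause per merged row.

    when_clauses is an ordered list of (for_matched, condition, action) tuples (hypothetical shape):
      for_matched: True for WHEN MATCHED, False for WHEN NOT MATCHED
      condition:   None, or f(source_row, target_row) -> bool (the optional AND ... predicate)
      action:      f(source_row, target_row) -> ("update", row) | ("delete", None)
                   | ("insert", row); any other kind does nothing
    target_row is None for NOT MATCHED rows.
    """
    # 1. Left outer join from source to target, computed once before any action runs.
    #    Each merged row is (source_row, index of the matching target row or None).
    merged = []
    for s in source:
        hits = [i for i, t in enumerate(target) if t[key] == s[key]]
        if hits:
            merged.extend((s, i) for i in hits)
        else:
            merged.append((s, None))

    updated = list(target)
    deleted = set()
    inserted = []
    # 2. For each merged row, test WHEN clauses in order until one activates.
    for s, i in merged:
        t = target[i] if i is not None else None
        is_matched = i is not None  # decided by the join; never re-checked
        for for_matched, condition, action in when_clauses:
            if for_matched != is_matched:
                continue
            if condition is not None and not condition(s, t):
                continue
            kind, row = action(s, t)
            if kind == "update":
                updated[i] = row
            elif kind == "delete":
                deleted.add(i)
            elif kind == "insert":
                inserted.append(row)
            break  # at most one WHEN clause fires per merged row
    return [r for j, r in enumerate(updated) if j not in deleted] + inserted


target = [{"id": 1, "v": "old"}]
source = [{"id": 1, "v": "new"}, {"id": 2, "v": "x"}]
clauses = [
    (True, None, lambda s, t: ("update", {**t, "v": s["v"]})),  # WHEN MATCHED THEN UPDATE
    (False, None, lambda s, t: ("insert", dict(s))),            # WHEN NOT MATCHED THEN INSERT
]
print(merge(target, source, "id", clauses))
# [{'id': 1, 'v': 'new'}, {'id': 2, 'v': 'x'}]
```

The key design point is that `merged` is fully built before any action runs, so an INSERT or UPDATE cannot change whether a later source row counts as MATCHED.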

But MERGE INTO is not designed to deduplicate records within the source, where a later source row duplicates an earlier one. It can only be used to deduplicate or update records that are already in the target.
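
To make this limitation concrete, here is a small Python model (an assumption-laden sketch, not Spark's or Postgres's actual code) of `WHEN NOT MATCHED THEN INSERT` where the source contains the hypothetical key `id = 2` twice and the target does not. Because MATCHED / NOT MATCHED is decided against the target as it was before the statement, the first insert does not turn the second source row into a match, so both copies land in the target. The helper names `merge_insert_only` and `dedupe_source` are hypothetical, and the "last row wins" rule in `dedupe_source` is just one assumed way to pick a survivor.

```python
def merge_insert_only(target, source, key):
    """Model of MERGE ... WHEN NOT MATCHED THEN INSERT (matched source rows are ignored)."""
    # Match status comes from the target as it was *before* the statement ran.
    existing_keys = {row[key] for row in target}
    not_matched = [s for s in source if s[key] not in existing_keys]
    # Rows inserted here are never re-checked against later source rows.
    return target + [dict(s) for s in not_matched]


def dedupe_source(source, key):
    """Hypothetical pre-step: keep only the last source row seen for each key."""
    latest = {}
    for row in source:
        latest[row[key]] = row
    return list(latest.values())


target = [{"id": 1, "v": "a"}]
source = [{"id": 2, "v": "b"}, {"id": 2, "v": "c"}]

print(merge_insert_only(target, source, "id"))
# [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'b'}, {'id': 2, 'v': 'c'}]
print(merge_insert_only(target, dedupe_source(source, "id"), "id"))
# [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'c'}]
```

In other words, duplicates inside the source have to be removed before the MERGE runs; which duplicate to keep (for example, the one with the latest timestamp) depends on the data.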

github-actions bot added the SPARK label on Jun 20, 2022
@kbendick
Owner Author

The Notes section of the Postgres documentation explains it best. Of course, we don't have any BEFORE STATEMENT triggers, etc.

Notes (copied from https://www.postgresql.org/message-id/attachment/23520/sql-merge.html)

What essentially happens is that the target table is left outer-joined to the tables mentioned in the source-query, and each output row of the join may then activate at most one when-clause. The row will be matched only once per statement, so the status of MATCHED or NOT MATCHED cannot change once testing of WHEN clauses has begun. MERGE will not invoke Rules.

The following steps take place during the execution of MERGE.

  1. Perform any BEFORE STATEMENT triggers for actions specified, whether or not they actually occur.
  2. Perform left outer join from source to target table. Then for each row:
    a. Evaluate whether each row is MATCHED or NOT MATCHED.
    b. Test each WHEN condition in the order specified until one activates. Identify the action and its event type.
    c. Perform any BEFORE ROW triggers that fire for the action's event type.
    d. Apply the action specified.
    e. Perform any AFTER ROW triggers that fire for the action's event type.
  3. Perform any AFTER STATEMENT triggers for actions specified, whether or not they actually occur.
