Doc changes for MERGE #1082

pinodeca · 2023-04-25T00:26:03Z

Why are we implementing it? (sales eng)

Postgres community released a new feature, in 15.0, that can conditionally insert, update, or delete rows of a table. MERGE performs actions that modify rows in the target table, using the data source. MERGE provides a single SQL statement that can conditionally INSERT, UPDATE or DELETE rows, a task that would otherwise require multiple procedural language statements, using UPSERT with ON CONFLICT clause etc. Sample merge command looks like below, where all modifications are done on the target table.

MERGE INTO target t
USING source s ON (t.customer_id = s.customer_id)
WHEN MATCHED AND t.order_center = 'value' THEN
 -- Remove the customer in target
DELETE
WHEN MATCHED THEN
-- Existing customer, update the order count and last_order_id in target
UPDATE SET    order_count = t.order_count + 1, last_order_id = s.order_id
WHEN NOT MATCHED THEN
-- Add the customer in target
INSERT (customer_id, last_order_id, order_center, order_count, last_order)
VALUES (customer_id, s.order_id, s.order_center, 123, s.order_time);

As you can see, this is a complex SQL statement, so we decided to support this in Citus in a phased manner.

What are the typical use cases?

Communication goals (e.g. detailed howto vs orientation)

Good locations for content in docs structure

How does this work? (devs)

MERGE on distributed tables is compiled, planned by the engine, and is routed to the worker nodes (where the data is present). The merge-modifications are done individually at the individual shard level, all this is done internally (and hidden) by the Citus engine.

Example sql

CREATE TABLE source
(
   order_id        INT,
   customer_id     INT,
   order_center    VARCHAR,
   order_time timestamp
);

CREATE TABLE target
(
   customer_id     INT,
   last_order_id   INT,
   order_center    VARCHAR,
   order_count     INT,
   last_order      timestamp
);

SELECT create_distributed_table('target', 'customer_id');
SELECT create_distributed_table('source', 'customer_id', colocate_with=>'target');

MERGE INTO target t
   USING source s
   ON (t.customer_id = s.customer_id)

   WHEN MATCHED AND t.order_center = 'XX' THEN
       DELETE

   WHEN MATCHED THEN
       UPDATE SET     -- Existing customer, update the order count and last_order_id
           order_count = t.order_count + 1,
           last_order_id = s.order_id

   WHEN NOT MATCHED THEN       -- New entry, record it.
       INSERT (customer_id, last_order_id, order_center, order_count, last_order)
           VALUES (customer_id, s.order_id, s.order_center, 123, s.order_time);

Corner cases, gotchas

Target	Source	Support	Comments
Local	Local	Yes
Local	Distributed	No	TBD
Local	Reference	No
Distributed	Local	No	TBD.
Distributed	Distributed	Yes	Only co-located tables.
Distributed	Reference	In progress
Reference	NA	No	Reference table as target is not allowed.

Trying a MERGE SQL on Citus reference table will result in
ERROR: MERGE command is not supported on reference tables yet

Trying a MERGE SQL on distributed tables that are not colocated in will result in
ERROR: For MERGE command, all the distributed tables must be colocated

Trying a MERGE SQL on distributed tables that are not joined on their respective distribution columns will result in
ERROR: MERGE command is only supported when all distributed tables are co-located and joined on their distribution columns

Updating the distribution column in the MERGE actions is not allowed and will result in
ERROR: updating the distribution column is not allowed in MERGE actions

Are there relevant blog posts or outside documentation about the concept/feature?

For more details, please look here: PostgreSQL: Documentation: 15: MERGE.

Link to relevant commits and regression tests if applicable

Phase-I: Support MERGE with local tables by tejeswarm · Pull Request #6365 · citusdata/citus (github.com)
Phase-II: Support MERGE on distributed tables with restrictions by tejeswarm · Pull Request #6610 · citusdata/citus (github.com)
Phase-III: This implements MERGE phase-III by tejeswarm · Pull Request #6696 · citusdata/citus (github.com)

The text was updated successfully, but these errors were encountered:

pinodeca · 2023-04-25T00:26:37Z

@tejeswarm @onderkalaci Can you please fill in each of the template sections in the description above?

jonels-msft mentioned this issue Apr 26, 2023

MERGE command support #1086

Merged

jonels-msft closed this as completed in #1086 Apr 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Doc changes for MERGE #1082

Doc changes for MERGE #1082

pinodeca commented Apr 25, 2023 •

edited by tejeswarm

Loading

pinodeca commented Apr 25, 2023

Doc changes for MERGE #1082

Doc changes for MERGE #1082

Comments

pinodeca commented Apr 25, 2023 • edited by tejeswarm Loading

Why are we implementing it? (sales eng)

What are the typical use cases?

Communication goals (e.g. detailed howto vs orientation)

Good locations for content in docs structure

How does this work? (devs)

Example sql

Corner cases, gotchas

Are there relevant blog posts or outside documentation about the concept/feature?

Link to relevant commits and regression tests if applicable

pinodeca commented Apr 25, 2023

pinodeca commented Apr 25, 2023 •

edited by tejeswarm

Loading