Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Doc changes for MERGE #1082

Closed
pinodeca opened this issue Apr 25, 2023 · 1 comment · Fixed by #1086
Closed

Doc changes for MERGE #1082

pinodeca opened this issue Apr 25, 2023 · 1 comment · Fixed by #1086

Comments

@pinodeca
Copy link

pinodeca commented Apr 25, 2023

Why are we implementing it? (sales eng)

Postgres community released a new feature, in 15.0, that can conditionally insert, update, or delete rows of a table. MERGE performs actions that modify rows in the target table, using the data source. MERGE provides a single SQL statement that can conditionally INSERT, UPDATE or DELETE rows, a task that would otherwise require multiple procedural language statements, using UPSERT with ON CONFLICT clause etc. Sample merge command looks like below, where all modifications are done on the target table.

MERGE INTO target t
USING source s ON (t.customer_id = s.customer_id)
WHEN MATCHED AND t.order_center = 'value' THEN
 -- Remove the customer in target
DELETE
WHEN MATCHED THEN
-- Existing customer, update the order count and last_order_id in target
UPDATE SET    order_count = t.order_count + 1, last_order_id = s.order_id
WHEN NOT MATCHED THEN
-- Add the customer in target
INSERT (customer_id, last_order_id, order_center, order_count, last_order)
VALUES (customer_id, s.order_id, s.order_center, 123, s.order_time);

As you can see, this is a complex SQL statement, so we decided to support this in Citus in a phased manner.

What are the typical use cases?

Communication goals (e.g. detailed howto vs orientation)

Good locations for content in docs structure

How does this work? (devs)

MERGE on distributed tables is compiled, planned by the engine, and is routed to the worker nodes (where the data is present). The merge-modifications are done individually at the individual shard level, all this is done internally (and hidden) by the Citus engine.

Example sql

CREATE TABLE source
(
   order_id        INT,
   customer_id     INT,
   order_center    VARCHAR,
   order_time timestamp
);

CREATE TABLE target
(
   customer_id     INT,
   last_order_id   INT,
   order_center    VARCHAR,
   order_count     INT,
   last_order      timestamp
);
SELECT create_distributed_table('target', 'customer_id');
SELECT create_distributed_table('source', 'customer_id', colocate_with=>'target');
MERGE INTO target t
   USING source s
   ON (t.customer_id = s.customer_id)

   WHEN MATCHED AND t.order_center = 'XX' THEN
       DELETE

   WHEN MATCHED THEN
       UPDATE SET     -- Existing customer, update the order count and last_order_id
           order_count = t.order_count + 1,
           last_order_id = s.order_id

   WHEN NOT MATCHED THEN       -- New entry, record it.
       INSERT (customer_id, last_order_id, order_center, order_count, last_order)
           VALUES (customer_id, s.order_id, s.order_center, 123, s.order_time);

Corner cases, gotchas

Target Source Support Comments
Local Local Yes  
Local Distributed No TBD
Local Reference No  
Distributed Local No TBD.
Distributed Distributed Yes Only co-located tables.
Distributed Reference In progress  
Reference NA No Reference table as target is not allowed.

Trying a MERGE SQL on Citus reference table will result in
ERROR: MERGE command is not supported on reference tables yet

Trying a MERGE SQL on distributed tables that are not colocated in will result in
ERROR: For MERGE command, all the distributed tables must be colocated

Trying a MERGE SQL on distributed tables that are not joined on their respective distribution columns will result in
ERROR: MERGE command is only supported when all distributed tables are co-located and joined on their distribution columns

Updating the distribution column in the MERGE actions is not allowed and will result in
ERROR: updating the distribution column is not allowed in MERGE actions

Are there relevant blog posts or outside documentation about the concept/feature?

For more details, please look here: PostgreSQL: Documentation: 15: MERGE.

Link to relevant commits and regression tests if applicable

Phase-I: Support MERGE with local tables by tejeswarm · Pull Request #6365 · citusdata/citus (github.com)
Phase-II: Support MERGE on distributed tables with restrictions by tejeswarm · Pull Request #6610 · citusdata/citus (github.com)
Phase-III: This implements MERGE phase-III by tejeswarm · Pull Request #6696 · citusdata/citus (github.com)

@pinodeca
Copy link
Author

@tejeswarm @onderkalaci Can you please fill in each of the template sections in the description above?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant