You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Postgres community released a new feature, in 15.0, that can conditionally insert, update, or delete rows of a table. MERGE performs actions that modify rows in the target table, using the data source. MERGE provides a single SQL statement that can conditionally INSERT, UPDATE or DELETE rows, a task that would otherwise require multiple procedural language statements, using UPSERT with ON CONFLICT clause etc. Sample merge command looks like below, where all modifications are done on the target table.
MERGE INTO target t
USING source s ON (t.customer_id = s.customer_id)
WHEN MATCHED AND t.order_center = 'value' THEN
-- Remove the customer in target
DELETE
WHEN MATCHED THEN
-- Existing customer, update the order count and last_order_id in target
UPDATE SET order_count = t.order_count + 1, last_order_id = s.order_id
WHEN NOT MATCHED THEN
-- Add the customer in target
INSERT (customer_id, last_order_id, order_center, order_count, last_order)
VALUES (customer_id, s.order_id, s.order_center, 123, s.order_time);
As you can see, this is a complex SQL statement, so we decided to support this in Citus in a phased manner.
What are the typical use cases?
Communication goals (e.g. detailed howto vs orientation)
Good locations for content in docs structure
How does this work? (devs)
MERGE on distributed tables is compiled, planned by the engine, and is routed to the worker nodes (where the data is present). The merge-modifications are done individually at the individual shard level, all this is done internally (and hidden) by the Citus engine.
MERGE INTO target t
USING source s
ON (t.customer_id = s.customer_id)
WHEN MATCHED AND t.order_center = 'XX' THEN
DELETE
WHEN MATCHED THEN
UPDATE SET -- Existing customer, update the order count and last_order_id
order_count = t.order_count + 1,
last_order_id = s.order_id
WHEN NOT MATCHED THEN -- New entry, record it.
INSERT (customer_id, last_order_id, order_center, order_count, last_order)
VALUES (customer_id, s.order_id, s.order_center, 123, s.order_time);
Corner cases, gotchas
Target
Source
Support
Comments
Local
Local
Yes
Local
Distributed
No
TBD
Local
Reference
No
Distributed
Local
No
TBD.
Distributed
Distributed
Yes
Only co-located tables.
Distributed
Reference
In progress
Reference
NA
No
Reference table as target is not allowed.
Trying a MERGE SQL on Citus reference table will result in ERROR: MERGE command is not supported on reference tables yet
Trying a MERGE SQL on distributed tables that are not colocated in will result in ERROR: For MERGE command, all the distributed tables must be colocated
Trying a MERGE SQL on distributed tables that are not joined on their respective distribution columns will result in ERROR: MERGE command is only supported when all distributed tables are co-located and joined on their distribution columns
Updating the distribution column in the MERGE actions is not allowed and will result in ERROR: updating the distribution column is not allowed in MERGE actions
Are there relevant blog posts or outside documentation about the concept/feature?
Why are we implementing it? (sales eng)
Postgres community released a new feature, in 15.0, that can conditionally insert, update, or delete rows of a table. MERGE performs actions that modify rows in the target table, using the data source. MERGE provides a single SQL statement that can conditionally INSERT, UPDATE or DELETE rows, a task that would otherwise require multiple procedural language statements, using
UPSERT
withON CONFLICT
clause etc. Sample merge command looks like below, where all modifications are done on the target table.As you can see, this is a complex SQL statement, so we decided to support this in Citus in a phased manner.
What are the typical use cases?
Communication goals (e.g. detailed howto vs orientation)
Good locations for content in docs structure
How does this work? (devs)
MERGE on distributed tables is compiled, planned by the engine, and is routed to the worker nodes (where the data is present). The merge-modifications are done individually at the individual shard level, all this is done internally (and hidden) by the Citus engine.
Example sql
Corner cases, gotchas
Trying a
MERGE SQL
on Citus reference table will result inERROR: MERGE command is not supported on reference tables yet
Trying a
MERGE SQL
on distributed tables that are not colocated in will result inERROR: For MERGE command, all the distributed tables must be colocated
Trying a
MERGE SQL
on distributed tables that are not joined on their respective distribution columns will result inERROR: MERGE command is only supported when all distributed tables are co-located and joined on their distribution columns
Updating the distribution column in the
MERGE actions
is not allowed and will result inERROR: updating the distribution column is not allowed in MERGE actions
Are there relevant blog posts or outside documentation about the concept/feature?
For more details, please look here: PostgreSQL: Documentation: 15: MERGE.
Link to relevant commits and regression tests if applicable
Phase-I: Support MERGE with local tables by tejeswarm · Pull Request #6365 · citusdata/citus (github.com)
Phase-II: Support MERGE on distributed tables with restrictions by tejeswarm · Pull Request #6610 · citusdata/citus (github.com)
Phase-III: This implements MERGE phase-III by tejeswarm · Pull Request #6696 · citusdata/citus (github.com)
The text was updated successfully, but these errors were encountered: