Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support MERGE with local tables #6365

Merged
merged 1 commit into from
Dec 19, 2022
Merged

Support MERGE with local tables #6365

merged 1 commit into from
Dec 19, 2022

Conversation

tejeswarm
Copy link
Contributor

@tejeswarm tejeswarm commented Sep 23, 2022

All the tables (target, source or any CTE present) in the SQL statement are local i.e., a merge-sql with a combination of Citus local and Non-Citus tables (regular Postgres tables) should work and give the same result as Postgres MERGE on regular tables. Catch and throw an exception (not-yet-supported) for all other scenarios during Citus-planning phase.

Copy link
Member

@onderkalaci onderkalaci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial observations:

  1. INSERTs could happen on a random shard, we cannot seem to pushdown INSERT as is. Can you see a way to restrict the INSERT to the shards that are used in the MERGE command? Like the things we do for triggers?
CREATE TABLE target (tid integer, balance integer)
  WITH (autovacuum_enabled=off);
SELECT create_distributed_table('target', 'tid');

CREATE TABLE source (sid integer, delta integer) -- no index
  WITH (autovacuum_enabled=off);
SELECT create_distributed_table('source', 'sid');


-- insert just 1 row to sourc so that we can test UNMATCHED on target
INSERT INTO source VALUES (1, 10);

-- so, now the INSERT with tid:2000 will happen on a random shard
MERGE INTO target t 
USING SOURCE AS s ON t.tid = s.sid 
WHEN NOT MATCHED THEN 
	INSERT VALUES (2000,2000);

SELECT * FROM target ;
┌──────┬─────────┐
│ tid  │ balance │
├──────┼─────────┤
│ 20002000 │
└──────┴─────────┘
(1 row)


-- but, we cannot find tid = 2000, because we are searching for a wrong shard
SELECT * FROM target WHERE tid = 2000;
┌─────┬─────────┐
│ tid │ balance │
├─────┼─────────┤
└─────┴─────────┘
(0 rows)

Time: 1.700 ms
  1. Less important, can be disallowed, we cannot allow UPDATE dist_keys.
TRUNCATE source, target;

-- insert some rows that do match
INSERT INTO source VALUES (1,1);
INSERT INTO target VALUES (1,1);

-- OOPS, we changed the dist key, we should disallow this
MERGE INTO target t
USING source AS s
ON t.tid = s.sid
WHEN MATCHED THEN
	UPDATE SET tid = 0;

SELECT * FROM target;
┌─────┬─────────┐
│ tid │ balance │
├─────┼─────────┤
│   01 │
└─────┴─────────┘
(1 row)


-- we cannot fiind
SELECT * FROM target WHERE tid = 0;
┌─────┬─────────┐
│ tid │ balance │
├─────┼─────────┤
└─────┴─────────┘
(0 rows)

There are probably more cases like (2) where we need to be more careful, like we cannot allow UPDATE .. SET nextval('sequence') as the sequences should be evaluated on the coordinator. But, as we resolve bigger issues as (1), we can gradually fine tune things like (2), if that makes sense to you as well.

src/backend/distributed/planner/distributed_planner.c Outdated Show resolved Hide resolved
src/backend/distributed/planner/distributed_planner.c Outdated Show resolved Hide resolved
@tejeswarm tejeswarm force-pushed the merge branch 3 times, most recently from c7f9d34 to 62e542d Compare November 18, 2022 21:25
@tejeswarm tejeswarm force-pushed the merge branch 7 times, most recently from a9dd313 to 0a4fbbc Compare November 21, 2022 05:12
@tejeswarm tejeswarm changed the title Support MERGE with two scenarios Support MERGE with local tables Nov 30, 2022
@tejeswarm
Copy link
Contributor Author

Some initial observations:

  1. INSERTs could happen on a random shard, we cannot seem to pushdown INSERT as is. Can you see a way to restrict the INSERT to the shards that are used in the MERGE command? Like the things we do for triggers?
CREATE TABLE target (tid integer, balance integer)
  WITH (autovacuum_enabled=off);
SELECT create_distributed_table('target', 'tid');

CREATE TABLE source (sid integer, delta integer) -- no index
  WITH (autovacuum_enabled=off);
SELECT create_distributed_table('source', 'sid');


-- insert just 1 row to sourc so that we can test UNMATCHED on target
INSERT INTO source VALUES (1, 10);

-- so, now the INSERT with tid:2000 will happen on a random shard
MERGE INTO target t 
USING SOURCE AS s ON t.tid = s.sid 
WHEN NOT MATCHED THEN 
	INSERT VALUES (2000,2000);

SELECT * FROM target ;
┌──────┬─────────┐
│ tid  │ balance │
├──────┼─────────┤
│ 20002000 │
└──────┴─────────┘
(1 row)


-- but, we cannot find tid = 2000, because we are searching for a wrong shard
SELECT * FROM target WHERE tid = 2000;
┌─────┬─────────┐
│ tid │ balance │
├─────┼─────────┤
└─────┴─────────┘
(0 rows)

Time: 1.700 ms
  1. Less important, can be disallowed, we cannot allow UPDATE dist_keys.
TRUNCATE source, target;

-- insert some rows that do match
INSERT INTO source VALUES (1,1);
INSERT INTO target VALUES (1,1);

-- OOPS, we changed the dist key, we should disallow this
MERGE INTO target t
USING source AS s
ON t.tid = s.sid
WHEN MATCHED THEN
	UPDATE SET tid = 0;

SELECT * FROM target;
┌─────┬─────────┐
│ tid │ balance │
├─────┼─────────┤
│   01 │
└─────┴─────────┘
(1 row)


-- we cannot fiind
SELECT * FROM target WHERE tid = 0;
┌─────┬─────────┐
│ tid │ balance │
├─────┼─────────┤
└─────┴─────────┘
(0 rows)

There are probably more cases like (2) where we need to be more careful, like we cannot allow UPDATE .. SET nextval('sequence') as the sequences should be evaluated on the coordinator. But, as we resolve bigger issues as (1), we can gradually fine tune things like (2), if that makes sense to you as well.

Very much valid and good scenarios, but with the new strategy of Phase I supporting local tables, this is not an issue. but I will track them and address in Phase II

@tejeswarm tejeswarm force-pushed the merge branch 7 times, most recently from 33d4b52 to 12bcff4 Compare December 13, 2022 01:56
@tejeswarm tejeswarm force-pushed the merge branch 2 times, most recently from 9254381 to bb725e0 Compare December 16, 2022 21:07
@tejeswarm tejeswarm force-pushed the merge branch 8 times, most recently from 5403474 to 963c1dd Compare December 19, 2022 03:49
All the tables (target, source or any CTE present) in the SQL statement are local i.e. a merge-sql with a combination of Citus local and
Non-Citus tables (regular Postgres tables) should work and give the same result as Postgres MERGE on regular tables. Catch and throw an
exception (not-yet-supported) for all other scenarios during Citus-planning phase.
@codecov
Copy link

codecov bot commented Dec 19, 2022

Codecov Report

Merging #6365 (58213b3) into main (5268d0a) will decrease coverage by 0.02%.
The diff coverage is 86.15%.

@@            Coverage Diff             @@
##             main    #6365      +/-   ##
==========================================
- Coverage   92.92%   92.89%   -0.03%     
==========================================
  Files         259      259              
  Lines       55528    55570      +42     
==========================================
+ Hits        51597    51623      +26     
- Misses       3931     3947      +16     

@tejeswarm tejeswarm merged commit 9a9989f into main Dec 19, 2022
@tejeswarm tejeswarm deleted the merge branch December 19, 2022 04:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants