Check shard replication mode before moving shards #6727

agedemenli · 2023-02-22T15:00:51Z

The shard rebalancer throws the replication mode error only after it has already moved some of the shards. This happens because one colocation group (or table) can be moved via logical replication while another cannot. Since the error is going to be thrown anyway, we should check for it and throw it earlier. This can be done by adding a check after the move list is created, but before the actual moves are started.

Steps to reproduce:

```sql
set citus.shard_count = 4; -- just to get the error sooner
select citus_remove_node('localhost',9702);

create table t1 (a int primary key);
select create_distributed_table('t1','a');
create table t2 (a bigint);
select create_distributed_table('t2','a');

select citus_add_node('localhost',9702);
select rebalance_table_shards();
NOTICE:  Moving shard 102008 from localhost:9701 to localhost:9702 ...
NOTICE:  Moving shard 102009 from localhost:9701 to localhost:9702 ...
NOTICE:  Moving shard 102012 from localhost:9701 to localhost:9702 ...
ERROR:  cannot use logical replication to transfer shards of the relation t2 since it doesn't have a REPLICA IDENTITY or PRIMARY KEY
```

DESCRIPTION: Check before logicalrep for rebalancer, error if needed

Check if we can use logical replication or not, in case of shard transfer mode = auto, before executing the shard moves. If we can't, error out.

Before this PR, we used to error out in the middle of shard moves:

```sql
set citus.shard_count = 4; -- just to get the error sooner
select citus_remove_node('localhost',9702);

create table t1 (a int primary key);
select create_distributed_table('t1','a');
create table t2 (a bigint);
select create_distributed_table('t2','a');

select citus_add_node('localhost',9702);
select rebalance_table_shards();
NOTICE:  Moving shard 102008 from localhost:9701 to localhost:9702 ...
NOTICE:  Moving shard 102009 from localhost:9701 to localhost:9702 ...
NOTICE:  Moving shard 102012 from localhost:9701 to localhost:9702 ...
ERROR:  cannot use logical replication to transfer shards of the relation t2 since it doesn't have a REPLICA IDENTITY or PRIMARY KEY
```

Now we check and error out in the beginning, without moving the shards.

fixes: #6727
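For readers who want the ordering spelled out, here is a rough Python sketch of "validate the whole move list first, then execute". It is not the Citus implementation (which is written in C). `Table`, `ShardMove`, `ReplicationModeError`, `check_moves` and `rebalance` are all hypothetical names made up for illustration, and the boolean `has_replica_identity` stands in for whatever catalog lookup the real check performs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Table:
    name: str
    # Assumption: True when the table has a PRIMARY KEY or REPLICA IDENTITY.
    has_replica_identity: bool


@dataclass(frozen=True)
class ShardMove:
    shard_id: int
    table: Table
    source: str
    target: str


class ReplicationModeError(Exception):
    """Raised when a planned move cannot use logical replication."""


def check_moves(moves: Iterable[ShardMove], transfer_mode: str = "auto") -> None:
    """Fail fast if any planned move would later hit the replication mode error."""
    if transfer_mode != "auto":
        # This sketch only models the 'auto' case described in the PR.
        return
    for move in moves:
        if not move.table.has_replica_identity:
            raise ReplicationModeError(
                "cannot use logical replication to transfer shards of the "
                f"relation {move.table.name} since it doesn't have a "
                "REPLICA IDENTITY or PRIMARY KEY"
            )


def rebalance(
    moves: List[ShardMove],
    execute_move: Callable[[ShardMove], None],
    transfer_mode: str = "auto",
) -> None:
    """Check every planned move up front, then run the moves one by one."""
    check_moves(moves, transfer_mode)  # nothing has been moved yet
    for move in moves:
        execute_move(move)
```

With a move list that mixes shards of `t1` (primary key) and `t2` (no primary key), `rebalance` in this sketch raises before `execute_move` is called even once, which is the behavior the PR describes.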

agedemenli added the rebalancer label Feb 22, 2023

agedemenli mentioned this issue Mar 9, 2023

Check before logicalrep for rebalancer, error if needed #6754

Merged

agedemenli closed this as completed in #6754 Mar 21, 2023
