Check before logicalrep for rebalancer, error if needed #6754
Can we also squeeze in a fix for https://github.com/citusdata/citus-enterprise/issues/394? I have seen customers hit that.
@onderkalaci I think that's different enough that it should have a separate fix. Main reason is that
Yeah, I tried anyway, since you are touching the relevant places :p
```c
if (transferMode == TRANSFER_MODE_AUTOMATIC)
{
	/*
	 * If the shard transfer mode is set to auto, we should check beforehand
	 * if we are able to use logical replication to transfer shards or not.
	 * We throw an error if any of the tables do not have a replica identity, which
	 * is required for logical replication to replicate UPDATE and DELETE commands.
	 */
	PlacementUpdateEvent *placementUpdate = NULL;
	foreach_ptr(placementUpdate, placementUpdateList)
	{
		Oid relationId = RelationIdForShard(placementUpdate->shardId);
		List *colocatedTableList = ColocatedTableList(relationId);
		VerifyTablesHaveReplicaIdentity(colocatedTableList);
	}
}
```
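A minimal, self-contained model of the check above, for readers without a Citus build. The structs `Table` and `ShardMove` and the function `FirstTableWithoutReplicaIdentity` are hypothetical stand-ins, not Citus APIs; colocation is modeled as a shared integer id, in the spirit of what `ColocatedTableList` returns. It illustrates why the whole colocation group is checked: moving a shard of one table also moves its colocated shards, so a colocated table without a replica identity blocks the move.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for Citus metadata; not the real structs. */
typedef struct Table
{
	const char *name;
	int colocationId;			/* tables with equal ids are colocated */
	bool hasReplicaIdentity;	/* primary key or REPLICA IDENTITY is set */
} Table;

typedef struct ShardMove
{
	long shardId;
	int tableIndex;				/* index into the tables array: owner of the shard */
} ShardMove;

/*
 * Returns NULL if every table colocated with a moved shard has a replica
 * identity, otherwise the name of the first offending table. Mirrors the
 * loop over placementUpdateList in the excerpt above.
 */
const char *
FirstTableWithoutReplicaIdentity(const Table *tables, size_t tableCount,
								 const ShardMove *moves, size_t moveCount)
{
	for (size_t m = 0; m < moveCount; m++)
	{
		int colocationId = tables[moves[m].tableIndex].colocationId;

		/* like ColocatedTableList(): every table in the same colocation group */
		for (size_t t = 0; t < tableCount; t++)
		{
			if (tables[t].colocationId == colocationId &&
				!tables[t].hasReplicaIdentity)
			{
				return tables[t].name;
			}
		}
	}
	return NULL;
}
```

With `orders` and `order_items` colocated and only `order_items` lacking a replica identity, a move of an `orders` shard reports `order_items`, while a move that touches only an unrelated group passes.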
This check should be done even before planning the moves, and before calling EnsureReferenceTablesExistOnAllNodesExtended. We already do this for the background rebalancer:
citus/src/backend/distributed/operations/shard_rebalancer.c, lines 1905 to 1917 in e3cf7ac:

```c
const char shardTransferMode = LookupShardTransferMode(shardReplicationModeOid);
List *colocatedTableList = NIL;
Oid relationId = InvalidOid;
foreach_oid(relationId, options->relationIdList)
{
	colocatedTableList = list_concat(colocatedTableList,
									 ColocatedTableList(relationId));
}
Oid colocatedTableId = InvalidOid;
foreach_oid(colocatedTableId, colocatedTableList)
{
	EnsureTableOwner(colocatedTableId);
}
```
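A sketch of the ordering suggested here, assuming the goal is to fail before any step that has side effects. `StepLog` and `RebalanceWithEarlyCheck` are hypothetical; the step name `ensure_reference_tables` merely stands in for `EnsureReferenceTablesExistOnAllNodesExtended`, which may copy reference tables to other nodes. When the check fails, none of the later steps run. As in the PR excerpt, the check only applies when the transfer mode is auto.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical log of executed steps, so the ordering is observable. */
typedef struct StepLog
{
	const char *steps[8];
	size_t count;
} StepLog;

static void
LogStep(StepLog *log, const char *step)
{
	if (log->count < 8)
	{
		log->steps[log->count++] = step;
	}
}

/*
 * Validate replica identity first; only then perform steps with side
 * effects. Returns false, with only the check logged, if validation fails.
 */
bool
RebalanceWithEarlyCheck(bool autoTransferMode, bool allTablesHaveReplicaIdentity,
						StepLog *log)
{
	LogStep(log, "check_replica_identity");
	if (autoTransferMode && !allTablesHaveReplicaIdentity)
	{
		return false;	/* error out before touching the cluster */
	}

	LogStep(log, "ensure_reference_tables");
	LogStep(log, "plan_moves");
	LogStep(log, "execute_moves");
	return true;
}
```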
I think what we actually want to check is the move list itself. Otherwise we would throw the logical replication error even for already balanced tables that have no replica identity or primary key, although we were not going to move their shards. If we move the check to before planning the moves, we might even throw that error instead of the "no moves available" notice. I think we should do the same for the background rebalancer: call VerifyTablesHaveReplicaIdentity for each element in placementUpdateList.
It does seem useful to warn users that their future rebalances will fail by throwing the error even when there are no moves; otherwise they will only find out when they actually need to rebalance. I don't feel strongly about this, though.
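The two positions in this thread differ only in which tables are in scope for the check. This hypothetical model shows the trade-off on a cluster where a table without a replica identity is already balanced. Each `Table` entry stands for a whole colocation group, and the returned strings are illustrative, not Citus's actual messages. The move-list policy reports that there is nothing to move; the check-all policy errors out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: one entry per colocation group. */
typedef struct Table
{
	const char *name;
	bool needsMoves;			/* would the planner produce a move for it? */
	bool hasReplicaIdentity;
} Table;

typedef enum CheckPolicy
{
	CHECK_MOVE_LIST,			/* only tables that are about to move */
	CHECK_ALL_TABLES			/* every table in the rebalance, up front */
} CheckPolicy;

/* Returns what the user would roughly see: an error, a notice, or success. */
const char *
RebalanceOutcome(const Table *tables, size_t count, CheckPolicy policy)
{
	size_t plannedMoves = 0;

	for (size_t i = 0; i < count; i++)
	{
		bool inScope = (policy == CHECK_ALL_TABLES) || tables[i].needsMoves;

		if (inScope && !tables[i].hasReplicaIdentity)
		{
			return "ERROR: table lacks replica identity";
		}
		if (tables[i].needsMoves)
		{
			plannedMoves++;
		}
	}

	return plannedMoves == 0 ? "NOTICE: no moves available" : "OK: moves executed";
}
```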
Codecov Report

```diff
@@           Coverage Diff            @@
##             main    #6754    +/-   ##
========================================
  Coverage   93.16%   93.16%
========================================
  Files         260      259     -1
  Lines       56103    55924   -179
========================================
- Hits        52267    52102   -165
+ Misses       3836     3822    -14
```
The branch was updated from 925a70b to 92779b9.
@JelteF @marcocitus shall we merge this?
DESCRIPTION: Check before logicalrep for rebalancer, error if needed
When the shard transfer mode is auto, check whether we can use logical replication before executing the shard moves, and error out if we can't. Before this PR, we errored out in the middle of the shard moves:
Now we check and error out at the beginning, before moving any shards.
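The behavior change described above, as a small model. `Move` and `ExecuteMoves` are hypothetical: `checkFirst = false` approximates the old behavior, where the missing replica identity was only discovered when the rebalancer reached the offending move, and `checkFirst = true` approximates the new one, where the whole move list is validated first. In the first case the moves before the failing one have already completed; in the second, nothing moves.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: one planned shard move. */
typedef struct Move
{
	long shardId;
	bool hasReplicaIdentity;	/* of the colocation group this shard belongs to */
} Move;

/*
 * Executes moves in order and returns how many completed.
 * *failed is set when the rebalance errors out.
 */
size_t
ExecuteMoves(const Move *moves, size_t count, bool checkFirst, bool *failed)
{
	*failed = false;

	if (checkFirst)
	{
		/* new behavior: validate the whole move list before moving anything */
		for (size_t i = 0; i < count; i++)
		{
			if (!moves[i].hasReplicaIdentity)
			{
				*failed = true;
				return 0;
			}
		}
	}

	size_t done = 0;
	for (size_t i = 0; i < count; i++)
	{
		if (!moves[i].hasReplicaIdentity)
		{
			*failed = true;
			return done;	/* earlier moves have already completed */
		}
		done++;
	}
	return done;
}
```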
fixes: #6727