ADD/DROP REGION stuck if node restarts (or is restored) with different --locality flag
#113324
@Xiang-Gu can you provide step-by-step instructions on how to reproduce this? Then we can determine exactly when the regression occurred, and assign this to a different team if we need to.
Definitely!
Sharing a similar example from @smcvey
If we have a multi-region (MR) cluster and one of its databases, db, has oldRegion set as its primary region, what happens if we stop the node(s) and restart them with a different --locality flag?
We should allow such behavior. The zone configuration of db still has its old settings, including constraints=[region=oldRegion]. We end up with an unfulfillable allocation constraint, and the allocator falls back to putting replicas wherever it can. However, if we then try to add one of the new regions to db, the ADD REGION gets stuck because validation logic complains about this unfulfillable constraint.
This is a regression compared to v21.x or whenever this worked.
One solution is to relax the validation so that it applies only to the regions proposed to be added, not to existing ones, since it is possible (and fine) for existing regions to be missing in such a scenario.
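To make the proposal concrete, here is a minimal sketch of what the relaxed check could look like. It is not CockroachDB's actual validation code: validateRegionChange, its parameters, and the error text are all hypothetical names chosen for illustration. The idea is to reject only regions being added that no node advertises, and to merely report existing database regions that have gone missing.

```go
package main

import (
	"fmt"
	"sort"
)

// validateRegionChange is a hypothetical helper, not CockroachDB's real
// validation logic. It rejects only regions being added that no node
// currently advertises. Existing database regions that are no longer
// advertised are returned as stale instead of causing an error.
func validateRegionChange(clusterRegions, dbRegions, toAdd []string) (stale []string, err error) {
	// Regions advertised by the nodes' current --locality flags.
	live := make(map[string]bool, len(clusterRegions))
	for _, r := range clusterRegions {
		live[r] = true
	}
	// Strict check: every region proposed to be added must exist.
	for _, r := range toAdd {
		if !live[r] {
			return nil, fmt.Errorf("region %q is not advertised by any node", r)
		}
	}
	// Relaxed check: missing existing regions are reported, not rejected.
	for _, r := range dbRegions {
		if !live[r] {
			stale = append(stale, r)
		}
	}
	sort.Strings(stale)
	return stale, nil
}

func main() {
	stale, err := validateRegionChange(
		[]string{"newRegion"}, // regions from the nodes' current localities
		[]string{"oldRegion"}, // regions already configured on db
		[]string{"newRegion"}, // regions requested by ADD REGION
	)
	fmt.Println(stale, err)
}
```

With this shape, adding newRegion to db would succeed even though oldRegion is no longer advertised, and the caller could decide separately how to surface or clean up the stale region.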
Also, we run into exactly the same scenario (and problems) when we restore into a cluster whose regions differ from those present when the backup was taken.
Discovered from this escalation, in which we included one workaround.
See slack thread https://cockroachlabs.slack.com/archives/C2C5FKPPB/p1698680009075359 for more details.
Jira issue: CRDB-32870
Epic CRDB-33032