kvserver: rebalancing between stores on the same node fails #60545
Labels
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Comments
lunevalex added the C-bug label Feb 12, 2021
lunevalex added a commit to lunevalex/cockroach that referenced this issue Feb 12, 2021

…plicas already exist on the same node

Fixes cockroachdb#60545

The allocator in some cases allows for a range to have a replica on multiple stores of the same node. If that happens, it should allow itself to fix the situation by removing one of the offending replicas. This was only half working due to an ordering problem in how the replicas appeared in the descriptor. It could remove the first replica, but not the second one.

Release note: None
craig bot pushed a commit that referenced this issue Feb 16, 2021

59865: sql: add schema_name,table_id to crdb_internal.ranges r=rafiss a=jordanlewis

... and crdb_internal.ranges_no_leases

Closes #59601.

This commit adds schema_name to crdb_internal.ranges and crdb_internal.ranges_no_leases to ensure that it's possible to disambiguate between ranges that are contained by a table with the same name in two different user-defined schemas.

In addition, it also adds the table_id column which allows unambiguous lookups of ranges for a given table id. This will also enable making a virtual index on the table_id column later, which should be a nice win for some introspection commands.

Release note (sql change): add the schema_name and table_id columns to the crdb_internal.ranges and crdb_internal.ranges_no_leases virtual tables.

60546: kvserver: improve handling for removal of a replica, when multiple replicas already exist on the same node r=lunevalex a=lunevalex

Fixes #60545

The allocator in some cases allows for a range to have a replica on multiple stores of the same node. If that happens, it should allow itself to fix the situation by removing one of the offending replicas. This was only half working due to an ordering problem in how the replicas appeared in the descriptor. It could remove the first replica, but not the second one.

Release note: None

60561: geo/wkt: simplify parser grammar and improve error messages r=otan a=andyyang890

This patch simplifies the yacc grammar for the WKT parser and also improves the error messages for mixed dimensionality problems.

Refs: #53091

Release note: None

Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
Co-authored-by: Alex Lunev <alexl@cockroachlabs.com>
Co-authored-by: Andy Yang <ayang@cockroachlabs.com>
lunevalex added a commit to lunevalex/cockroach that referenced this issue Feb 17, 2021

…plicas already exist on the same node

Fixes cockroachdb#60545

The allocator in some cases allows for a range to have a replica on multiple stores of the same node. If that happens, it should allow itself to fix the situation by removing one of the offending replicas. This was only half working due to an ordering problem in how the replicas appeared in the descriptor. It could remove the first replica, but not the second one.

Release note (bug fix): 20.2 introduced the ability to rebalance replicas between multiple stores on the same node. This change fixes a problem with that feature, where occasionally an intra-node rebalance would fail and a range would get stuck permanently under-replicated.
craig bot pushed a commit that referenced this issue Feb 17, 2021

60633: release-20.2: kvserver: improve handling for removal of a replica, when multiple replicas already exist on the same node r=aayushshah15 a=lunevalex

Backport 1/1 commits from #60546.

/cc @cockroachdb/release

---

Fixes #60545

The allocator in some cases allows for a range to have a replica on multiple stores of the same node. If that happens, it should allow itself to fix the situation by removing one of the offending replicas. This was only half working due to an ordering problem in how the replicas appeared in the descriptor. It could remove the first replica, but not the second one.

Release note: None

Co-authored-by: Alex Lunev <alexl@cockroachlabs.com>
Describe the problem
This problem was reported by @dankinder at https://forum.cockroachlabs.com/t/under-replicated-ranges-after-decommission/4239/3. A range has the descriptor (n1,s5):1, (n18,s51):2, (n7,s20):3, (n1,s2):4LEARNER, so node n1 holds two replicas of the range, on stores s5 and s2. The allocator attempts to remove the replica on (n1,s2), but the removal fails. This is a valid operation and should be allowed.
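To make the shape of the descriptor concrete, here is a minimal Go sketch that models it and finds the nodes holding more than one replica of the range. The `Replica` type, the `ReplicaType` constants, and `duplicatedNodes` are hypothetical names made up for illustration; they are not CockroachDB's actual types or functions.

```go
package main

// ReplicaType distinguishes voters from learners (illustrative only).
type ReplicaType int

const (
	Voter ReplicaType = iota
	Learner
)

// Replica is a simplified, hypothetical stand-in for one entry of a
// range descriptor, e.g. (n1,s2):4LEARNER.
type Replica struct {
	NodeID, StoreID, ReplicaID int
	Type                       ReplicaType
}

// duplicatedNodes groups the descriptor's replicas by node and returns
// only the nodes that hold more than one replica of the range.
func duplicatedNodes(desc []Replica) map[int][]Replica {
	byNode := make(map[int][]Replica)
	for _, r := range desc {
		byNode[r.NodeID] = append(byNode[r.NodeID], r)
	}
	for node, reps := range byNode {
		if len(reps) < 2 {
			delete(byNode, node) // deleting during range is safe in Go
		}
	}
	return byNode
}
```

For the descriptor in this report, the only node returned is n1, with the voter on s5 and the learner on s2. Removing either of the two would leave n1 with a single replica.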
To Reproduce
This has been reproduced in TestValidateReplicationChanges by reversing the order of the removal operations in Test Case 14.
Expected behavior
The removal of the replica on (n1, s2) should be allowed, as it returns the cluster to a healthy state.
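The expected behavior can be sketched as a removal check whose result does not depend on where a replica appears in the descriptor. The Go code below is an illustrative sketch only: `Replica`, `validateRemoval`, and `validateRemovalByNode` are hypothetical names, and the flawed variant shows just one plausible way an ordering dependence can arise, not the actual CockroachDB code.

```go
package main

import "fmt"

// Replica identifies a replica by its (node, store) pair (hypothetical type).
type Replica struct{ NodeID, StoreID int }

// validateRemoval accepts the removal if the exact (node, store) pair is
// present in the descriptor, regardless of its position.
func validateRemoval(desc []Replica, target Replica) error {
	for _, r := range desc {
		if r == target {
			return nil
		}
	}
	return fmt.Errorf("no replica on (n%d,s%d) to remove", target.NodeID, target.StoreID)
}

// validateRemovalByNode is a deliberately flawed variant: it indexes
// replicas by node only and keeps the first one seen, so a second replica
// on the same node can never be matched.
func validateRemovalByNode(desc []Replica, target Replica) error {
	byNode := make(map[int]Replica)
	for _, r := range desc {
		if _, seen := byNode[r.NodeID]; !seen {
			byNode[r.NodeID] = r
		}
	}
	if r, ok := byNode[target.NodeID]; !ok || r != target {
		return fmt.Errorf("no replica on (n%d,s%d) to remove", target.NodeID, target.StoreID)
	}
	return nil
}
```

With the descriptor from this report, `validateRemoval` accepts removing either (n1,s5) or (n1,s2) in any descriptor order, while the node-keyed variant rejects (n1,s2) whenever (n1,s5) appears first.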