
kvserver: unexpected replication change from 3 to 2 voters #64064

Closed
tbg opened this issue Apr 22, 2021 · 5 comments · Fixed by #64170
Labels
A-kv-replication (Relating to Raft, consensus, and coordination.)
branch-master (Failures and bugs on the master branch.)
C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.)
GA-blocker

Comments

@tbg
Member

tbg commented Apr 22, 2021

Describe the problem

When @aliher1911 and I looked into unhappy restore2TB runs in the context of #61396, we noticed unexpected replication changes. From a three-voter configuration, for some reason we're demoting a voter:

[n7,s7,r39158/4:x] change replicas (add ‹[]› remove ‹[(n5,s5):1VOTER_DEMOTING_LEARNER]›): existing descriptor r39158:‹/Table/54/1/52061454{-/0}› [(n5,s5):1, (n7,s7):4, (n10,s10):3, next=5, gen=744, sticky=1617706884.328040814,0]

At that point, in the full snippet, the range goes unavailable because one of the two remaining voters (s10) is waiting for a snapshot.

To Reproduce

Run the `restore2TB/nodes=10` roachtest.

This should reproduce on any SHA preceding #64060, such as d85d49d, when running restore2TB. It may not always happen, but we saw it frequently, at least in "unhappy" runs (as characterized by large pending snapshot counts).

Expected behavior

With 10 live nodes and atomic replication changes, there should never be a reason to move from a three-voter to a two-voter configuration. The only explanation I have is that n5 might have been considered dead for 5 minutes, which would possibly trigger this issue (?!), but this is essentially ruled out by the full snippet, which indicates that n5 was live a minute after the botched replication change (and it is thus unlikely to have been non-live for the preceding minutes).

@aliher1911 if you have full logs from any of these experiments, mind going through them to see if you have other examples of such replication changes, and if so, posting the complete log directory (Google Drive)?


cc @cockroachdb/kv

@tbg tbg added the C-bug and A-kv-replication labels Apr 22, 2021
@aliher1911
Contributor

I still have the full logs from that run, and I'll check if other runs show it too, just in case. Will upload later.

@tbg
Member Author

tbg commented Apr 23, 2021

@aayushshah15
Contributor

I could reproduce this scenario pretty readily with a `log.Fatal` inside `changeReplicasImpl` when it was about to execute such a change. @dt also noticed this today and, on his cluster, the range log for the unfortunate range looked like the following (notice the 4th item from the top):

[screenshot of the range log, 2021-04-22: the 4th entry shows a lone voter removal with reason `admin request`]

It seemed to indicate that the caller of `ChangeReplicas` (whoever it was) was explicitly asking it to remove just that voter. The `Reason: admin request` part is also important because it tells us that it cannot be the allocator somehow making an incorrect decision, as the allocator would indicate something like `range over-replicated` as the `Reason`.

This pointed fingers at the callers of `AdminRelocateRange` (or a bug inside `AdminRelocateRange` itself, but I ruled this out). Sure enough, it turns out it's the store rebalancer calling `AdminRelocateRange` with slices that contain duplicate targets:

rebalancing r44266 (0.09 qps) from (n9,s9):1,(n5,s5):2,(n4,s4):3 to ‹[n10,s10 n8,s8 n8,s8]› to better balance load

I see what the bug is; it was introduced in #61600. I'm misusing a slice (:sweat:). I'll get a fix out for it ASAP.
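To make the failure mode concrete, here is a minimal Go sketch of the slice-aliasing pitfall at play. This is a hypothetical reconstruction of the bug class, not the actual `StoreRebalancer` code: once an `append` grows a slice past its capacity, the runtime allocates a new backing array, and any other variable that aliased the old one silently keeps a stale view.

```go
package main

import "fmt"

func main() {
	// Two variables referring to one slice, as in the bug description.
	targets := make([]string, 1, 2)
	targets[0] = "n9,s9"
	alias := targets // shares the backing array, for now

	// While len stays within capacity, writes through one variable are
	// visible through the other: they share a single backing array.
	targets = append(targets, "n5,s5")
	targets[0] = "n10,s10"
	fmt.Println(alias[0]) // n10,s10 -- still aliased

	// This append exceeds cap, so Go silently allocates a new backing
	// array for `targets`; the two variables no longer alias each other.
	targets = append(targets, "n4,s4")
	targets[0] = "n8,s8"
	fmt.Println(alias[0]) // n10,s10 -- stale: the write landed elsewhere
}
```

A stale alias along these lines is plausibly how a target list with a repeated store, like the `[n10,s10 n8,s8 n8,s8]` above, could be assembled.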

Tobi, do you think we should mark this a release blocker until then?


Another unfortunate (but not "new") thing is that, because of this check, the store rebalancer seems to let itself relocate replicas with very low QPS (notice the QPS in my example^). This doesn't seem "bad" per se, because presumably the replicate queue would eventually do the same thing, but it feels to me like the store rebalancer should not be concerning itself with such a thing.

@aliher1911
Contributor

Logs from the run where we observed this behaviour: drive link
@tbg @aayushshah15

@tbg
Member Author

tbg commented Apr 23, 2021

Very nice, @aayushshah15! I'll mark this as a GA-blocker. It could well be a release blocker too - someone using a beta to test resilience might get miffed - but I think that is less than likely and if so should be easy to pin down if they contact us.

@tbg tbg added the branch-master, branch-release-21.1, and GA-blocker labels Apr 23, 2021
aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Apr 24, 2021
Prior to this commit, we had a bug inside one of the methods used by the
`StoreRebalancer` where we had two variables referring to a slice that
was subsequently being appended to.

The code was making the implicit assumption that both of these slices
would point to the same underlying array, which is not true since any of
the additions mentioned above could result in the underlying array for
one of the slices being resized.

This commit fixes this bug.

Resolves cockroachdb#64064

Release note (bug fix): A bug in previous 21.1 betas allowed the store
rebalancer to spuriously downreplicate a range during normal operation.
This bug is now fixed.
craig bot pushed a commit that referenced this issue Apr 27, 2021
64170: kvserver: prevent the StoreRebalancer from downreplicating a range r=aayushshah15 a=aayushshah15

This PR contains 2 commits:

**kvserver: add safeguard against spurious calls to AdminRelocateRange**

This commit adds checks inside of `AdminRelocateRange` to fail if the
lists of relocation targets passed in by the caller contain duplicates.
This is supposed to act as a safeguard against catastrophic outcomes
like a range getting spuriously downreplicated due to malformed input.

Release note: None
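
As a rough illustration of the safeguard this first commit describes, here is a minimal sketch of what a duplicate-target check might look like. The type and function names are illustrative stand-ins, not CockroachDB's actual API:

```go
package main

import "fmt"

// Illustrative stand-ins for the real replication target types.
type StoreID int

type ReplicationTarget struct {
	NodeID  int
	StoreID StoreID
}

// validateNoDuplicateTargets fails fast if a relocation target list
// names the same store more than once.
func validateNoDuplicateTargets(targets []ReplicationTarget) error {
	seen := make(map[StoreID]struct{}, len(targets))
	for _, t := range targets {
		if _, dup := seen[t.StoreID]; dup {
			return fmt.Errorf("duplicate relocation target %+v", t)
		}
		seen[t.StoreID] = struct{}{}
	}
	return nil
}

func main() {
	// The malformed input from the log line above: s8 appears twice.
	targets := []ReplicationTarget{{10, 10}, {8, 8}, {8, 8}}
	fmt.Println(validateNoDuplicateTargets(targets))
}
```

Failing fast here turns a catastrophic downreplication into a loud, easily attributable error at the call site.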

**kvserver: prevent the StoreRebalancer from downreplicating a range**

Prior to this commit, we had a bug inside one of the methods used by the
`StoreRebalancer` where we had two variables referring to a slice that
was subsequently being appended to.

The code was making the implicit assumption that both of these slices
would point to the same underlying array, which is not true since any of
the additions mentioned above could result in the underlying array for
one of the slices being resized.

This commit fixes this bug.

Resolves #64064

Release note (bug fix): A bug in previous 21.1 betas allowed the store
rebalancer to spuriously downreplicate a range during normal operation.
This bug is now fixed.


Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Apr 27, 2021
Prior to this commit, we had a bug inside one of the methods used by the
`StoreRebalancer` where we had two variables referring to a slice that
was subsequently being appended to.

The code was making the implicit assumption that both of these slices
would point to the same underlying array, which is not true since any of
the additions mentioned above could result in the underlying array for
one of the slices being resized.

This commit fixes this bug.

Resolves cockroachdb#64064

Release note (bug fix): A bug in previous 21.1 betas allowed the store
rebalancer to spuriously downreplicate a range during normal operation.
This bug is now fixed.
@craig craig bot closed this as completed in b00f923 Apr 27, 2021
aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue May 3, 2021
Previously, the `StoreRebalancer` would rebalance ranges off of a node if
it had more than the average number of replicas compared to the cluster
average. This was undesirable because the `StoreRebalancer` was not respecting
the {over,under}fullness thresholds that the replicateQueue does, which could
lead to redundant range relocations even in a well balanced cluster.

This commit is an opinionated fix to prevent this behavior and an attempt to
more clearly delineate the responsibilities of the replica rebalancing aspects
of the `StoreRebalancer` and the `replicateQueue`.

Noticed while looking into cockroachdb#64064.

Release note: None
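
For readers unfamiliar with the {over,under}fullness thresholds mentioned here, the following is a minimal sketch of mean-based thresholding. The constant and function names are assumptions for illustration, not the actual allocator code:

```go
package main

import "fmt"

// Tolerate a band around the cluster-mean range count before acting.
// The 5% figure is illustrative.
const rangeRebalanceThreshold = 0.05

func overfullRangeCount(mean float64) float64  { return mean * (1 + rangeRebalanceThreshold) }
func underfullRangeCount(mean float64) float64 { return mean * (1 - rangeRebalanceThreshold) }

// shouldRebalanceRanges reports whether a store's range count is far
// enough from the cluster mean to justify moving replicas at all.
func shouldRebalanceRanges(storeRanges, mean float64) bool {
	return storeRanges > overfullRangeCount(mean) ||
		storeRanges < underfullRangeCount(mean)
}

func main() {
	// A store slightly above the mean is left alone rather than
	// triggering redundant relocations in a well-balanced cluster.
	fmt.Println(shouldRebalanceRanges(102, 100)) // false: within the band
	fmt.Println(shouldRebalanceRanges(110, 100)) // true: overfull
}
```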
craig bot pushed a commit that referenced this issue May 4, 2021
64251: changefeedccl: support arrays in avro encoding r=HonoreDB a=HonoreDB

Arrays are one of a few remaining types we don't support
in avro changefeeds. This serializes them.
Sadly can't use @miretskiy's new optimization since we
need to load multiple maps with the same key into an array.

Release note (bug fix): Changefeeds on tables with array columns now support avro.

64559: kvserver: stop reconciling range count imbalance in the StoreRebalancer r=aayushshah15 a=aayushshah15

Previously, the `StoreRebalancer` would rebalance ranges off of a node if
it had more than the average number of replicas compared to the cluster
average. This was undesirable because the `StoreRebalancer` was not respecting
the {over,under}fullness thresholds that the replicateQueue does, which could
lead to redundant range relocations even in a well balanced cluster.

This commit is an opinionated fix to prevent this behavior and an attempt to
more clearly delineate the responsibilities of the replica rebalancing aspects
of the `StoreRebalancer` and the `replicateQueue`.

Noticed while looking into #64064.

Release note: None

64614: ui: update cluster-ui version r=maryliag a=maryliag

Update cluster-ui version to include changes made on cockroachdb/ui#294 and a minor fix to handle the case when no column is selected.

Release note (ui change): cluster-ui updated, now showing information about the database on the Statement Page and offering the ability to choose which columns to display.

64664: authors: add charlie to authors r=charlievieth a=charlievieth

Release note: none

64665: authors: add Wade Waldron to Authors r=jlinder a=WadeWaldron

Release note: None

64666: authors: add bryan@cockroachlabs.com to authors r=jlinder a=strangr525

release note: none

64668: authors: add theodore hyman to authors r=jlinder a=theodore-hyman

release note: None

Co-authored-by: Aaron Zinger <zinger@cockroachlabs.com>
Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
Co-authored-by: Marylia Gutierrez <marylia@cockroachlabs.com>
Co-authored-by: Charlie Vieth <charlie@cockroachlabs.com>
Co-authored-by: Wade Waldron <wade@cockroachlabs.com>
Co-authored-by: Bryan Kwon <bryan@cockroachlabs.com>
Co-authored-by: Theodore Hyman <theodore@cockroachlabs.com>