
storage: add (default-off) atomic replication changes #39936

Merged
merged 22 commits on Aug 26, 2019

Conversation

tbg
Member

@tbg tbg commented Aug 23, 2019

This PR contains a series of commits that first pave the way for and ultimately
allow carrying out atomic replication changes via Raft joint consensus.

Atomic replication changes are required to avoid entering unsafe configurations
during lateral data movement. See #12768 for details; this is a problem we want
to address in 19.2.

Before merging this we'll need to sort out an upstream change in Raft which
has made a bug in our code related to learner snapshots much more likely; the
offending upstream commit is patched out of the vendored etcd bump in this PR
at the time of writing.

A reverse-chronological listing of the individual commits follows. They should be
reviewed individually, though it may be helpful to look at the combined diff for
context. A modest amount of churn may exist between the commits, though a good
deal of effort went into avoiding it.

storage: allow atomic replication changes in ChangeReplicas

They default to OFF.

This needs many more tests, which will be added separately in the course of
switching the default to ON; they will focus on the interactions of joint
states with everything else in the system.

We'll also need another audit of consumers of the replica descriptors to
make sure nothing was missed in the first pass.

Release note: None

storage: fix replicaGCQueue addition on removal trigger

Once we enter joint changes, the replica to be removed will show up in
`crt.Removed()` when the joint state is entered, but it only becomes
eligible for actual removal when we leave the joint state later. The new
code triggers at the right time, namely when the replica is no longer in
the descriptor.
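The corrected condition can be sketched in a few lines of Go. This is illustrative only; the names and the `[]int` stand-in for the descriptor's store IDs are hypothetical, while the real code works with roachpb.RangeDescriptor and replica descriptors:

```go
package main

import "fmt"

// eligibleForGC sketches the fixed trigger: a replica becomes eligible
// for GC only once its store is absent from the range descriptor, not
// when it first shows up in crt.Removed() on entering the joint state.
func eligibleForGC(descStoreIDs []int, storeID int) bool {
	for _, s := range descStoreIDs {
		if s == storeID {
			return false // still a member, possibly as an outgoing voter
		}
	}
	return true
}

func main() {
	// In the joint state the outgoing store 3 is still in the descriptor;
	// only after leaving the joint state does removal become legal.
	fmt.Println(eligibleForGC([]int{1, 2, 3}, 3)) // false
	fmt.Println(eligibleForGC([]int{1, 2, 4}, 3)) // true
}
```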

Release note: None

storage: let execChangeReplicasTxn construct the descriptor

Prior to this commit, the method took both an old and a new desc *plus*
slices of added and removed replicas. This had grown organically, wasn't an
easily understood interface, led to repetitive and tricky code at the
callers, and most importantly isn't adequate any more in a world with
atomic replication changes, where execChangeReplicasTxn in constructing the
ChangeReplicasTrigger is essentially deciding whether a joint configuration
needs to be entered (which in turn determines what the descriptor needs to
look like in the first place). To start solving this, let
execChangeReplicasTxn create (and on success return) the new descriptor.
Callers instead pass in what they want to be done, which is accomplished
via an []internalReplicationChange slice.
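In Go terms, the new contract looks roughly like the sketch below. Only the internalReplicationChange name comes from this PR (and its real definition differs); the other types and the reduction of execChangeReplicasTxn to a joint-or-not decision are illustrative assumptions:

```go
package main

import "fmt"

// Illustrative stand-ins for the real types.
type internalChangeType int

const (
	internalChangeTypeAddLearner internalChangeType = iota
	internalChangeTypePromoteLearner
	internalChangeTypeRemove
)

type internalReplicationChange struct {
	storeID int
	typ     internalChangeType
}

// needsJoint sketches the decision execChangeReplicasTxn now owns while
// constructing the new descriptor: more than one change at a time means
// the range must go through a joint configuration.
func needsJoint(chgs []internalReplicationChange) bool {
	return len(chgs) > 1
}

func main() {
	chgs := []internalReplicationChange{
		{storeID: 2, typ: internalChangeTypePromoteLearner},
		{storeID: 3, typ: internalChangeTypeRemove},
	}
	fmt.Println(needsJoint(chgs)) // true: a lateral move enters a joint config
}
```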

Release note: None

roachpb: auto-assign ReplicaID during AddReplica

This is a cleanup leading up to a larger refactor of the contract around
`execChangeReplicasTxn`.

Release note: None

storage: emit ConfChangeV2 from ChangeReplicasTrigger where appropriate

This prepares the trigger -> raft translation code to properly handle
atomic replication changes.

This carries out a lot of validation to give us confidence that any unusual
transitions would be caught quickly.

This change also establishes more clearly which added and removed replicas
are to be passed into the trigger when transitioning into a joint
configuration. For example, when adding a voter, one technically replaces a
Learner with a VoterIncoming, and so the question is which type the replica
in the `added` slice should have. Picking the Learner would give the
trigger the most power to validate the input, but it's annoying to have
divergent descriptors floating around, so by convention we say that it is
always the updated version of the descriptor (i.e. for fully removed
replicas, just whatever it was before it disappeared). I spent more time on
this than I'm willing to admit, in particular on trying to remove the
redundancy here, but it made things more awkward than it was worth.

Release note: None

storage: push replication change unrolling into ChangeReplicas

There are various callers of ChangeReplicas, so it makes more sense to
unroll at that level. The code was updated to (in principle) do the right
thing when atomic replication changes are requested, except that they are
still unimplemented and a fatal error serves as a reminder of that. Of
course nothing issues them yet.
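A sketch of the unrolling, with a hypothetical change type standing in for the roachpb types, and an error standing in for the fatal in the real code:

```go
package main

import (
	"errors"
	"fmt"
)

// change is a hypothetical stand-in for a single replication change.
type change struct {
	add     bool
	storeID int
}

var errAtomicUnimplemented = errors.New("atomic replication changes not implemented")

// changeReplicas sketches the unrolling: a batch of changes is carried
// out one conf change at a time. Requesting an atomic change returns a
// loud error here, standing in for the fatal in the real code.
func changeReplicas(chgs []change, atomic bool, applyOne func(change) error) error {
	if atomic && len(chgs) > 1 {
		return errAtomicUnimplemented
	}
	for _, c := range chgs { // unrolled: one conf change per iteration
		if err := applyOne(c); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	var applied []int
	_ = changeReplicas([]change{{true, 2}, {false, 3}}, false, func(c change) error {
		applied = append(applied, c.storeID)
		return nil
	})
	fmt.Println(applied) // [2 3]
}
```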

Release note: None

storage: skip ApplyConfChange on rejected entry

When in a joint configuration, passing an empty conf change to
ApplyConfChange doesn't do the right thing any more: it tells Raft that
we're leaving the joint config. It's not a good idea to try to tell Raft
anything about a ConfChange that got rejected. Raft internally knows that
we handled it because it knows the applied index.
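As a behavioral sketch (not the real raft API; the real code decides whether to call RawNode.ApplyConfChange on a raftpb.ConfChange):

```go
package main

import "fmt"

// maybeApplyConfChange sketches the new behavior: on a rejected entry,
// skip the apply call entirely instead of passing an empty conf change.
func maybeApplyConfChange(rejected bool, apply func()) (applied bool) {
	if rejected {
		// Raft already knows the entry was handled via the applied index;
		// feeding it an empty conf change would now mean "leave the joint
		// configuration", which is not what we want here.
		return false
	}
	apply()
	return true
}

func main() {
	fmt.Println(maybeApplyConfChange(true, func() {}))  // false: skipped
	fmt.Println(maybeApplyConfChange(false, func() {})) // true: applied
}
```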

This also adds a case match for ConfChangeV2, which is necessary to route
atomic replication changes.

See https://github.com/etcd-io/etcd/pull/11046

Release note: None

storage: un-embed decodedConfChange

I ate a number of NPEs during development because nullable embedded fields
are tricky; they hide the pointer derefs that often need a nil check. We'll
embed the fields of decodedConfChange instead which works out better. This
commit also adds the unmarshaling code necessary for ConfChangeV2 needed
once we issue atomic replication changes.
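The hazard reproduces in a few lines; this is a generic Go illustration, not the actual decodedConfChange definition:

```go
package main

import "fmt"

type decodedConfChange struct {
	payload string
}

// Embedding a *pointer* hides the dereference: p.payload compiles, but
// panics whenever the embedded field is nil.
type embedsPointer struct {
	*decodedConfChange
}

// Embedding by value (or copying the fields in, as the commit does)
// keeps the zero value usable.
type embedsValue struct {
	decodedConfChange
}

func main() {
	var v embedsValue
	fmt.Println(v.payload == "") // true: zero value is fine

	var p embedsPointer
	defer func() { fmt.Println("recovered:", recover() != nil) }()
	_ = p.payload // hidden nil pointer dereference
}
```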

Release note: None

storage: add learners one by one

Doing more than one change at once is going to force us into an atomic
replication change. This isn't crazy, but seems unnecessary at this point,
so just add the learners one by one.

Release note: None

storage: add fatals where atomic conf changes are unsupported

These will be upgraded with proper handling when atomic replication changes
are actually introduced, but for now it's convenient to stub out some code
that will need to handle them and to make sure we won't forget to do so
later.

Release note: None

storage: add atomic replication changes cluster setting

This defaults to false, and won't have an effect unless the newly
introduced cluster version is also active.

Release note: None

roachpb: support zero-change ChangeReplicasTrigger

We will use a ChangeReplicasTrigger without additions and removals when
transitioning out of a joint configuration, so make sure it supports this
properly.

Release note: None

roachpb: return "desired" voters from ReplicaDescriptors.Voters

Previous commits introduced (yet unused) voter types to encode joint
consensus configurations which occur during atomic replication changes.

Access to the slice of replicas is unfortunately common, though at least
it's compartmentalized via the getters Voters() and Learners().

The main problem solved in this commit is figuring out what should be
returned from Voters(): is it all VoterX types, or only the voters in one
of the two majority configs that make up a joint quorum?

The useful answer is returning the set of voters corresponding to what the
config will be once the joint state is exited; this happens to be what most
callers care about. Incoming and full voters are really the same thing in
our code; we just need to distinguish them from outgoing voters to
correctly maintain the quorum sizes.
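The chosen semantics, sketched with simplified types (these are not the roachpb definitions):

```go
package main

import "fmt"

type replicaType int

const (
	voterFull replicaType = iota
	voterIncoming
	voterOutgoing
	learner
)

type replicaDescriptor struct {
	replicaID int
	typ       replicaType
}

// voters sketches ReplicaDescriptors.Voters(): return the voters of the
// configuration the range will be in once the joint state is exited,
// i.e. full and incoming voters; outgoing voters and learners are
// excluded.
func voters(descs []replicaDescriptor) []int {
	var ids []int
	for _, d := range descs {
		if d.typ == voterFull || d.typ == voterIncoming {
			ids = append(ids, d.replicaID)
		}
	}
	return ids
}

func main() {
	fmt.Println(voters([]replicaDescriptor{
		{1, voterFull}, {2, voterOutgoing}, {3, voterIncoming}, {4, learner},
	})) // [1 3]
}
```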

Of course there are some callers that do care about quorum sizes, and a
number of cleanups were made for them.

This commit also adds a ReplicaDescriptors.ConfState helper which is then
used in all of the places that were previously cobbling together a
ConfState manually.

Release note: None

roachpb: add ReplicaType_Voter{Incoming,Outgoing}

These are required for atomic replication changes to describe joint
configurations, i.e. configurations consisting of two sets of replicas,
both of which need to reach quorum to make replication decisions.
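In terms of quorums, using the {1,2,3}/{1,2,4} example from the metadata.proto comment (the helpers are hypothetical):

```go
package main

import "fmt"

// quorum reports whether the live set reaches a majority of members.
func quorum(members, live map[int]bool) bool {
	n := 0
	for id := range members {
		if live[id] {
			n++
		}
	}
	return n*2 > len(members)
}

// jointQuorum: during a joint configuration, both the outgoing and the
// incoming voter sets must independently reach a majority.
func jointQuorum(outgoing, incoming, live map[int]bool) bool {
	return quorum(outgoing, live) && quorum(incoming, live)
}

func set(ids ...int) map[int]bool {
	m := make(map[int]bool, len(ids))
	for _, id := range ids {
		m[id] = true
	}
	return m
}

func main() {
	// Replacing voter 3 with voter 4.
	outgoing, incoming := set(1, 2, 3), set(1, 2, 4)
	fmt.Println(jointQuorum(outgoing, incoming, set(1, 2))) // true: quorum of both
	fmt.Println(jointQuorum(outgoing, incoming, set(1, 3))) // false: not a quorum of {1,2,4}
	fmt.Println(jointQuorum(outgoing, incoming, set(2, 4))) // false: not a quorum of {1,2,3}
}
```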

An audit of existing consumers of this enum will follow.

Release note: None

roachpb: rename ReplicaType variants

The current naming is idiomatic for proto enums, but atypical for its usage
in Go code. There is no `(gogoproto.customname)` that can fix this, and
we're about to add more replica types that would require awkward names such
as `roachpb.ReplicaType_VOTER_OUTGOING`.

Switch to a Go-friendly naming scheme instead.

Release note: None

batcheval: generalize checkNotLearnerReplica

This now errors out whenever the replica is not a voter, which is more
robust as new replica types are introduced (which generally should not
automatically become eligible to receive leases).

Release note: None

roachpb: improve RangeDescriptor.Validate

Make sure there isn't more than one replica per store.
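A simplified sketch of the added check (the real Validate method operates on the descriptor's replica slice and checks more than store uniqueness):

```go
package main

import "fmt"

// validateStores rejects descriptors in which a store holds more than
// one replica of the range.
func validateStores(storeIDs []int) error {
	seen := make(map[int]struct{}, len(storeIDs))
	for _, s := range storeIDs {
		if _, ok := seen[s]; ok {
			return fmt.Errorf("store %d has more than one replica", s)
		}
		seen[s] = struct{}{}
	}
	return nil
}

func main() {
	fmt.Println(validateStores([]int{1, 2, 3})) // <nil>
	fmt.Println(validateStores([]int{1, 2, 2})) // store 2 has more than one replica
}
```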

Release note: None

roachpb: generalize ReplicaDescriptor.String()

The new code will generalize to new replica types.

Release note: None

[dnm] vendor: bump raft

This picks up upstream fixes related to atomic membership changes.

I had to smuggle in a small hack because we're picking up
https://github.com/etcd-io/etcd/pull/11037 which makes a race between the
snapshot queue and the proactive learner snapshot much more likely, and
this in turn makes tests quite flaky because it turns out that if the
learner snap loses, it can actually error out.

Release note: None

storage: avoid fatal error from splitPostApply

This is the next band-aid on top of #39658 and #39571. The descriptor
lookup I added sometimes fails because replicas can process a split trigger
in which they're not a member of the range:

> F190821 15:14:28.241623 312191 storage/store.go:2172
> [n2,s2,r21/3:/{Table/54-Max}] replica descriptor of local store not
> found in right hand side of split

I saw this randomly in `make test PKG=./pkg/ccl/partitionccl`.

Release note: None


@tbg
Member Author

tbg commented Aug 23, 2019

@nvanbenschoten sorry to drop that kind of review bomb on you, but hope the individual commits make it more manageable. If you spot anything that you'd like to have pulled out and reviewed separately, I'm happy to do so. At the same time, as you review, please point out commits that you're 100% ok with -- my plan for when this review hits substantial comments is to pull out and land everything that's uncontroversial and repeat that until nothing's left.

@awoods187
Contributor

Wow--looks like a lot of hard work went into this. Excited to test it out and see it in action.

@tbg tbg force-pushed the atomic/changes-port branch 2 times, most recently from 6b3f036 to 66fd3af on August 23, 2019 15:23
Member

@nvanbenschoten nvanbenschoten left a comment


I've made it up through d658d21 (r10) so far. I'm ignoring storage: avoid fatal error from splitPostApply (#39796) since it looks like it's being discussed elsewhere.

Commits that LGTM:

  • 7a07ce0: roachpb: generalize ReplicaDescriptor.String() (mod nits)
  • eb7d146: roachpb: improve RangeDescriptor.Validate (mod nits)
  • 31fbc37: batcheval: generalize checkNotLearnerReplica
  • 334cf0d: roachpb: rename ReplicaType variants (once we're ok on naming)
  • 2af6f9e: roachpb: add ReplicaType_Voter{Incoming,Outgoing}

I don't see any real blockers in the other commits, but they need one more round of reviews.

I'm continuing to review but wanted to let you pipeline some of the work.

Reviewed 1 of 1 files at r1, 3 of 3 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 3 of 3 files at r5, 18 of 18 files at r6, 16 of 16 files at r7, 7 of 7 files at r8, 4 of 4 files at r9, 2 of 2 files at r10.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)


Gopkg.toml, line 42 at r2 (raw file):

  name = "go.etcd.io/etcd"
  # branch = "master"
  branch = "crl-bump"

Is the need for this fixed by #39796? Are they related?


pkg/roachpb/data.go, line 1347 at r9 (raw file):

// configuration.
func (crt ChangeReplicasTrigger) LeaveJoint() bool {
	return len(crt.Added())+len(crt.Removed()) == 0

Is it worth making this implicit instead of adding a new field to ChangeReplicasTrigger?


pkg/roachpb/data.go, line 1369 at r9 (raw file):

		// TODO(tbg): could list the replicas that will actually leave the
		// voter set.
		fmt.Fprintf(&chgS, "LEAVE_JOINT")

Are you missing a space at the end of this?

EDIT: nope; the code has a comment: "// NB: not missing a space. might actually be useful."


pkg/roachpb/data.proto, line 163 at r9 (raw file):

// methods return the replicas being added and removed, respectively. If more than
// one change is specified (i.e. len(Added())+len(Removed()) exceeds one), this
// initiates an atomic replication change in which the "removed" replicas are

Same question about whether it's worth introducing the term "atomic replication change" in code when "joint replication change" is already all over.

Maybe you're saying that they aren't synonymous and that "atomic replication change" is a collection of individual replication changes that are made atomic through joint consensus?


pkg/roachpb/metadata.go, line 282 at r3 (raw file):

	}
	if typ := r.GetType(); typ != ReplicaType_VOTER {
		buf.WriteString(strings.ToUpper(typ.String()))

Do we need the strings.ToUpper? It looks like this should already be correctly cased.

EDIT: never mind, addressed later.


pkg/roachpb/metadata.go, line 232 at r4 (raw file):

		}
		seen[rep.ReplicaID] = struct{}{}
		if rep.ReplicaID >= r.NextReplicaID {

tiny nit: move this check up so the ReplicaID re-use check is next to the StoreID re-use check.


pkg/roachpb/metadata.proto, line 39 at r6 (raw file):

// ReplicaType identifies which raft activities a replica participates in.
enum ReplicaType {

Did you consider doing something with option (gogoproto.goproto_enum_prefix) = false; instead? This does seem to be going against the convention in our code.


pkg/roachpb/metadata.proto, line 53 at r7 (raw file):

// to replicas 1 and 2, while 3 is VoterOutgoing and 4 is VoterIncoming, then
// the two sets over which quorums need to be achieved are {1,2,3} and {1,2,4}.
// Thus, {1,2} is a quorum of both, {1,3} is a quorum of the first but not the

nit: spell out the last part for clarity "but not the second, {1,4} is a quorum of the second but not the first, and"


pkg/roachpb/metadata.proto, line 76 at r7 (raw file):

  // CockroachDB are a short-term transient state: a replica being added and on
  // its way to being a VOTER.
  Learner = 1;

We don't need a LearnerIncoming state to mirror Config.LearnersNext because there's no need to distinguish the two in Cockroach code? That's true even though raft.ConfState splits them out?

Never mind - answered later on.


pkg/roachpb/metadata_replicas.go, line 271 at r8 (raw file):

}

// InAtomicReplicationChange returns true if the descriptor is in the middle of

Do you think it's worth introducing "atomic replication change" as a term synonymous with "joint replication change"? Is one more descriptive than the other? Is it worth the confusion?


pkg/roachpb/metadata_replicas.go, line 321 at r8 (raw file):

// replication layer. This is more complicated than just counting the number
// of replicas due to the existence of joint quorums.
func (d ReplicaDescriptors) CanMakeProgress(liveFunc func(descriptor ReplicaDescriptor) bool) bool {

I think this deserves a test.


pkg/roachpb/metadata_replicas.go, line 325 at r8 (raw file):

	var c int
	if n := len(d.wrapped); len(voters) == n {
		// Fast path when there are only full voters, i.e. the common case.

It took me a little while to figure out why this fast-path wasn't the same as a check for !d.InAtomicReplicationChange(). Mind spelling out the two cases it can be hit in?

  1. non-joint quorum
  2. joint quorum with only replica addition

pkg/roachpb/metadata_replicas.go, line 345 at r8 (raw file):

	votes := make(map[uint64]bool, len(d.wrapped))
	for _, rDesc := range d.wrapped {
		votes[uint64(rDesc.ReplicaID)] = true

Should this be votes[uint64(rDesc.ReplicaID)] = liveFunc(rDesc)?


pkg/roachpb/metadata_replicas_test.go, line 160 at r8 (raw file):

		// the config exactly as described by the descriptor so we don't try.
		{
			[]ReplicaDescriptor{rd(l, 1), rd(v0, 2), rd(vi, 3)},

Given the description, should this be rd(v, 2)?


pkg/roachpb/metadata_replicas_test.go, line 166 at r8 (raw file):

		// set of voters n2, n4 (plus learner n1 before and after).
		{
			[]ReplicaDescriptor{rd(l, 1), rd(v0, 2), rd(vi, 3), rd(vo, 4)},

Given the description, should this also be rd(v, 2)? I must be missing something.


pkg/settings/cluster/cockroach_versions.go, line 525 at r10 (raw file):

	},
	{
		// VersionAtomicChangeReplicas is TODO(tbg)

Reminder to address this TODO.

Member

@nvanbenschoten nvanbenschoten left a comment


Commits that LGTM:

  • efda358: storage: add learners one by one
  • c9c39d3: storage: un-embed decodedConfChange
  • 1fbdcc1: storage: skip ApplyConfChange on rejected entry
  • 5ebb3e1: storage: push replication change unrolling into ChangeReplicas (mod note about unused args)
  • 27bd59d: storage: fix replicaGCQueue addition on removal trigger

Thanks for making this digestible.

Also, I think you're missing a change to handle raftpb.EntryConfChangeV2 in pkg/storage/debug_print.go.

Reviewed 3 of 3 files at r11, 1 of 1 files at r12, 2 of 2 files at r13, 1 of 1 files at r14, 2 of 2 files at r15, 5 of 5 files at r16, 3 of 3 files at r17, 1 of 1 files at r18, 1 of 1 files at r19, 5 of 5 files at r20.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)


pkg/roachpb/data.go, line 1440 at r16 (raw file):

		case ReplicaType_VoterFull, ReplicaType_Learner:
			// A learner or full voter can't be in the desc after.
			for _, rd := range replicas {

nit: consider pushing this into a checkNotExists helper for symmetry.


pkg/roachpb/data.go, line 1450 at r16 (raw file):

		sl = append(sl, raftpb.ConfChangeSingle{
			NodeID: uint64(rDesc.ReplicaID),
			Type:   raftpb.ConfChangeRemoveNode,

nit: keep these in the same order as above.


pkg/roachpb/data.go, line 1466 at r16 (raw file):

		}
	}
	wantLeaveJoint := len(added)+len(removed) == 0

Didn't we already define these somewhere (InAtomicReplicationChange, LeaveJoint)? Is this what you meant by having trouble avoiding all duplication?


pkg/roachpb/data.go, line 1532 at r16 (raw file):

		fmt.Fprintf(&chgS, "<malformed ChangeReplicasTrigger: %s>", err)
	} else {
		if cc.AsV2().LeaveJoint() {

Do we need ChangeReplicasTrigger.EnterJoint and ChangeReplicasTrigger.LeaveJoint anymore?


pkg/roachpb/data_test.go, line 1730 at r16 (raw file):

	}{
		// A replica of type VoterOutgoing being added makes no sense.
		{crt: mk(false, vo1, nil, vo1), err: "can't add replica in state VoterOutgoing"},

nit: these would be a little easier to read if you defined:

v1 := false
v2 := true

Or define a config struct instead of the mk method. I found myself jumping back and forth when reading these cases. This would be clearer as:

{
    v2: false,
    add: vo1,
    del: nil,
    repls: vo1,
    err: ...
}

pkg/roachpb/data_test.go, line 1732 at r16 (raw file):

		{crt: mk(false, vo1, nil, vo1), err: "can't add replica in state VoterOutgoing"},
		// But an incoming one can be added, and the result must be a joint change.
		{crt: mk(false, vi1, nil, sl(ReplicaType_VoterIncoming, 1)), exp: raftpb.ConfChangeV2{

s/sl(ReplicaType_VoterIncoming, 1)/vi1/?

And below?


pkg/roachpb/metadata.go, line 145 at r17 (raw file):

) (ReplicaDescriptor, bool) {
	for i := range r.InternalReplicas {
		if r.InternalReplicas[i].StoreID == storeID && r.InternalReplicas[i].NodeID == nodeID {

nit:

desc := &r.InternalReplicas[i]
if desc.NodeID == nodeID && desc.StoreID == storeID {
    desc.Type = &typ
    return *desc, true
}

pkg/storage/allocator.go, line 345 at r11 (raw file):

		return AllocatorRemoveLearner, removeLearnerReplicaPriority
	}
	if desc.Replicas().InAtomicReplicationChange() {

Does this need to go above the if learners := desc.Replicas().Learners(); len(learners) > 0 { case, given their relative priorities?


pkg/storage/client_atomic_membership_change_test.go, line 91 at r20 (raw file):

					return errors.Errorf("diff(want, have):\n%s", strings.Join(diff, "\n"))
				}
				// Check that conf state is up to date.	This can fail even though

Strange tab.


pkg/storage/replica.go, line 1332 at r15 (raw file):

	case *roachpb.AdminChangeReplicasRequest:
		chgs := tArgs.Changes()
		desc, err := r.ChangeReplicas(ctx, &tArgs.ExpDesc, SnapshotRequest_REBALANCE, storagepb.ReasonAdminRequest, "", chgs)

Should we return AdminChangeReplicasResponse, roachpb.Error from this function now, like we do with all of the other admin request handlers?


pkg/storage/replica_command.go, line 924 at r15 (raw file):

	}
	// Atomic replication change.
	return r.changeReplicasImpl(ctx, desc, SnapshotRequest_REBALANCE, storagepb.ReasonAdminRequest, "", chgs)

We're just ignoring the provided reason, priority, and details?


pkg/storage/replica_command.go, line 954 at r18 (raw file):

			return nil, errors.Errorf("need exactly one change, got %+v", chgs)
		}
		if chgs[0].ChangeType == roachpb.ADD_REPLICA {

I know this isn't being touched here, but it would be more immediately obvious what was going on if you did:

switch chgs[0].ChangeType {
case roachpb.ADD_REPLICA:
    ...
case roachpb.REMOVE_REPLICA:
    ...
default:
    ...
}

pkg/storage/replica_command.go, line 1252 at r18 (raw file):

const (
	internalChangeTypeAddVoterViaPreemptiveSnap = iota
	internalChangeTypeAddLearner                = iota

Remove the repeated iotas.


pkg/storage/replica_command.go, line 1257 at r18 (raw file):

)

type internalReplicationChange struct {

Leave a small comment about why this is different than all the other enums we have floating around.


pkg/storage/replica_command.go, line 1268 at r18 (raw file):

	reason storagepb.RangeLogEventReason,
	details string,
	chgs []internalReplicationChange,

This is a nice change.


pkg/storage/replica_command.go, line 582 at r20 (raw file):

			return errors.Errorf("ranges are not adjacent; %s != %s", origLeftDesc.EndKey, rightDesc.StartKey)
		}
		// For simplicity, don't handle learner replicas or joing states, expect

joint


pkg/storage/replica_command.go, line 987 at r20 (raw file):

}

func (r *Replica) maybeLeaveAtomicChangeReplicas(

Give this a comment.


pkg/storage/replica_command.go, line 1004 at r20 (raw file):

	}
	return execChangeReplicasTxn(
		ctx, r.store, desc, storagepb.ReasonUnknown /* unused */, "", nil, /* iChgs */

No reason, or detail?


pkg/storage/replica_command.go, line 1309 at r20 (raw file):

		for _, rDesc := range desc.Replicas().All() {
			switch rDesc.GetType() {
			case roachpb.ReplicaType_VoterIncoming, roachpb.ReplicaType_VoterOutgoing:

Don't we have a method for this?


pkg/storage/replica_command.go, line 1318 at r20 (raw file):

	var added, removed []roachpb.ReplicaDescriptor

	useJoint := len(chgs) > 1

nit: pull this entire loop into the if len(chgs) > 0 { condition and then make the if len(chgs) == 0 { and else case so that the structure of this function is more clear (i.e. it either does it change if any are provided or it transitions out of the joint state if no changes are provided).


pkg/storage/replica_proposal_buf.go, line 447 at r16 (raw file):

			}

			cc, err := crt.ConfChange(encodedCtx)

Nice!


pkg/storage/replica_proposal_buf.go, line 453 at r16 (raw file):

			}

			if err := raftGroup.ProposeConfChange(

Mind simplifying this to:

if err := raftGroup.ProposeConfChange(cc); err == raft.ErrProposalDropped {
    // Silently ignore dropped proposals (they were always silently
    // ignored prior to the introduction of ErrProposalDropped).
    // TODO(bdarnell): Handle ErrProposalDropped better.
    // https://github.com/cockroachdb/cockroach/issues/21849
} else if err != nil {
    firstErr = err
}

Member Author

@tbg tbg left a comment


Also, I think you're missing a change to handle raftpb.EntryConfChangeV2 in pkg/storage/debug_print.go.

Done.

Thanks for the quick review. I addressed all of the "small" comments. It's not worth coming back to this review yet (if you had gotten all the way through), I'll ping when it's ready.
I'll avoid rebasing this or squashing anything to make this less messy. Catching up should be, hopefully, a breeze.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)


Gopkg.toml, line 42 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is the need for this fixed by #39796? Are they related?

No, they're independent. I need to open an issue about this one. Basically the upstream commit that lets the leader reach out to newly added followers right away instead of waiting a heartbeat makes a race between raft snapshots and learner snapshots much more likely, and tests get flaky as a result. Unfortunately we don't have a good minimal fix in our codebase yet, so I'm just going to avoid pulling in the upstream commit for now (will bump etcd separately before this PR merges)

edit: we settled (out of band) on basically the above, but making sure that we only use "atomic replication change" and "joint configuration" (i.e. no "joint replication change")

Haven't made edits yet.


pkg/roachpb/data.go, line 1347 at r9 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is it worth making this implicit instead of adding a new field to ChangeReplicasTrigger?

Yeah, this got lost in the churn. They're unused, removed.


pkg/roachpb/data.go, line 1466 at r16 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Didn't we already define these somewhere (InAtomicReplicationChange, LeaveJoint)? Is this what you meant by having trouble avoiding all duplication?

There's a little bit of duplication because the code that constructs the descriptor also needs to know whether it wants to go through a joint state. The code here needs to reverse engineer a bit of that, but it's also trying to make sure the caller didn't mess anything up. Note that the trigger is essentially validated before we put it into raft by having it construct the ConfState once.


pkg/roachpb/data.proto, line 163 at r9 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Same question about whether it's worth introducing the term "atomic replication change" in code when "joint replication change" is already all over.

Maybe you're saying that they aren't synonymous and that "atomic replication change" is a collection of individual replication changes that are made atomic through joint consensus?

Yeah, I don't think they're synonymous though I may not have been consistent and should go back and fix things up (will do once we agree)
In my view the "atomic replication change" is what takes you from your "before" descriptor to the desired "final" descriptor. Internally that change is not "atomic"; it takes you via the joint configuration, which I thus have to name, and if I have to name it, it's probably best to stick to the name that it has in Raft (plus the name is pretty good at describing what's going on). LMK what you think.


pkg/roachpb/data_test.go, line 1730 at r16 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: these would be I little easier to read if you defined:

v1 := false
v2 := true

Or define a config struct instead of the mk method. I found myself jumping back and forth when reading these cases. This would be clearer as:

{
    v2: false,
    add: vo1,
    del: nil,
    repls: vo1,
    err: ...
}

Used the struct, much better.


pkg/roachpb/metadata.proto, line 39 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Did you consider doing something with option (gogoproto.goproto_enum_prefix) = false; instead? This does seem to be going against the convention in our code.

Oh, I missed that option. Sounds reasonable, I will defer to the end though because that's not going to be a change that will want to freely rebase where it needs to go.


pkg/storage/allocator.go, line 345 at r11 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does this need to go above the if learners := desc.Replicas().Learners(); len(learners) > 0 { case, given their relative priorities?

Fixed and reminder to add test added to #12768 (comment)


pkg/storage/replica.go, line 1332 at r15 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should we return AdminChangeReplicasResponse, roachpb.Error from this function now, like we do with all of the other admin request handlers?

I wouldn't change this just for the sake of consistency, there isn't a single pErr->err conversion in ChangeReplicas and all of the usages except this one are pretty happy that they're getting a vanilla error. If anything I'd try to go the other way but there are good reasons for the others to return *roachpb.Error. All in all a yak for another day.


pkg/storage/replica_command.go, line 924 at r15 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We're just ignoring the provided reason, priority, and details?

Oh, not intentional. Fixed.


pkg/storage/replica_command.go, line 1252 at r18 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Remove the repeated iotas.

Heh, funny this even worked.


pkg/storage/replica_proposal_buf.go, line 453 at r16 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Mind simplifying this to:

if err := raftGroup.ProposeConfChange(cc); err == raft.ErrProposalDropped {
    // Silently ignore dropped proposals (they were always silently
    // ignored prior to the introduction of ErrProposalDropped).
    // TODO(bdarnell): Handle ErrProposalDropped better.
    // https://github.com/cockroachdb/cockroach/issues/21849
} else if err != nil {
    firstErr = err
}

I wrote this but unclear to me whether this is better since it diverges from the firstErr; continue pattern above. I think you have a point but I'll leave as is, I bet the linters will complain about the empty branch anyway.

Member Author

@tbg tbg left a comment


Now it's ready for a look. I think I addressed all of the comments. Still haven't rebased or squashed to keep things simple.

Dismissed @nvanbenschoten from a discussion.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/roachpb/metadata_replicas.go, line 325 at r8 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

It took me a little to figure out why this fast-path wasn't the same as a check for !d.InAtomicReplicationChange(). Mind spelling out the two cases it can be hit in.

  1. non-joint quorum
  2. joint quorum with only replica addition

Done.


pkg/roachpb/metadata_replicas.go, line 345 at r8 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should this be votes[uint64(rDesc.ReplicaID)] = liveFunc(rDesc)?

🤦‍♂️ this is now caught in the newly added tests.


pkg/roachpb/metadata_replicas_test.go, line 160 at r8 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Given the description, should this be rd(v, 2)?

v0 is just nil, which is treated like v. There's a test case just above your comment for that but I'm just going to use v here and also rename v0 to vn, maybe you're mistaking it for vo and I can't blame you.

Member

@nvanbenschoten nvanbenschoten left a comment

:lgtm:

Reviewed 27 of 27 files at r21.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)


pkg/storage/debug_print.go, line 168 at r21 (raw file):

		return "", err
	}
	if ent.Type == raftpb.EntryNormal {

nit: might as well use a switch-case with a default arm that throws an error to catch this kind of stuff in the future.
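The suggested shape can be sketched as follows; the types and function here are illustrative stand-ins for the raftpb entry types, not the actual debug_print.go code:

```go
package main

import "fmt"

// EntryType mirrors the raftpb entry-type enum, for illustration only.
type EntryType int

const (
	EntryNormal EntryType = iota
	EntryConfChange
	EntryConfChangeV2
)

// describeEntry shows the nit: a switch with a default arm that errors
// surfaces any entry type added later, instead of silently falling through.
func describeEntry(t EntryType) (string, error) {
	switch t {
	case EntryNormal:
		return "normal", nil
	case EntryConfChange, EntryConfChangeV2:
		return "conf change", nil
	default:
		return "", fmt.Errorf("unknown entry type %d", t)
	}
}

func main() {
	s, err := describeEntry(EntryNormal)
	fmt.Println(s, err) // normal <nil>
	_, err = describeEntry(EntryType(99))
	fmt.Println(err) // unknown entry type 99
}
```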


pkg/storage/replica_command.go, line 1003 at r21 (raw file):

	for _, rDesc := range desc.Replicas().All() {
		switch rDesc.GetType() {
		case roachpb.VOTER_OUTGOING, roachpb.VOTER_INCOMING:

Shouldn't desc.InAtomicReplicationChange be able to tell you this?


pkg/storage/replica_proposal_buf.go, line 453 at r16 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I wrote this, but it's unclear to me whether this is better, since it diverges from the `firstErr; continue` pattern above. I think you have a point, but I'll leave it as is; I bet the linters will complain about the empty branch anyway.

We have the same style with the empty branch below. It's also fine to use the `firstErr = err; continue` pattern here; it's just not needed.

@tbg tbg force-pushed the atomic/changes-port branch 2 times, most recently from f59a8e4 to 3bb8d6b on August 26, 2019 at 20:09
Member Author

@tbg tbg left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)


pkg/storage/debug_print.go, line 168 at r21 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: might as well use a switch-case with a default arm that throws an error to catch this kind of stuff in the future.

It does that today; the cleanup project here is to actually write proper tests for this. It does bother me a bit to walk away from this but I'm still going to do it.

Member

@nvanbenschoten nvanbenschoten left a comment

Reviewed 15 of 15 files at r22.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)

@tbg
Member Author

tbg commented Aug 26, 2019

bors r=nvanbenschoten

tbg added 10 commits August 26, 2019 23:23
This picks up improvements related to joint consensus. We're pinning a
SHA that is close to master but not quite there, to avoid tickling a
bug in CRDB:

cockroachdb#40207

Release note: None
The new code will generalize to new replica types.

Release note: None
Make sure there isn't more than one replica per store.

Release note: None
This now errors out whenever the replica is not a voter, which is more
robust as new replica types are introduced (which generally should not
automatically become eligible to receive leases).

Release note: None
The current naming is idiomatic for proto enums, but atypical for its
usage in Go code. There is no `(gogoproto.customname)` that can fix
this, and we're about to add more replica types that would require
awkward names such as `roachpb.ReplicaType_VOTER_OUTGOING`.

Switch to a Go-friendly naming scheme instead.

Release note: None
These are required for atomic replication changes to describe joint
configurations, i.e. configurations consisting of two sets of replicas
which both need to reach quorum to make replication decisions.

An audit of existing consumers of this enum will follow.

Release note: None
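The joint-quorum rule described above can be illustrated with a toy sketch (replica IDs and liveness are made up; this is not the CRDB implementation):

```go
package main

import "fmt"

// majorityLive reports whether a strict majority of voters is live.
func majorityLive(voters []int, live map[int]bool) bool {
	n := 0
	for _, v := range voters {
		if live[v] {
			n++
		}
	}
	return n > len(voters)/2
}

// jointCanMakeProgress encodes the joint-consensus rule: a decision
// commits only if it reaches a majority in BOTH the outgoing and the
// incoming voter set.
func jointCanMakeProgress(outgoing, incoming []int, live map[int]bool) bool {
	return majorityLive(outgoing, live) && majorityLive(incoming, live)
}

func main() {
	live := map[int]bool{1: true, 2: true, 3: false, 4: true}
	// Replacing replica 3 with replica 4: outgoing {1,2,3}, incoming {1,2,4}.
	fmt.Println(jointCanMakeProgress([]int{1, 2, 3}, []int{1, 2, 4}, live)) // true
}
```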
Previous commits introduced (yet unused) voter types to encode joint
consensus configurations which occur during atomic replication changes.

Access to the slice of replicas is unfortunately common, though at least
it's compartmentalized via the getters Voters() and Learners().

The main problem solved in this commit is figuring out what should be
returned from Voters(): is it all VoterX types, or only voters in one
of the two majority configs part of a joint quorum?

The useful answer is returning the set of voters corresponding to what
the config will be once the joint state is exited; this happens to be
what most callers care about. Incoming and full voters are really the
same thing in our code; we just need to distinguish them from outgoing
voters to correctly maintain the quorum sizes.

Of course there are some callers that do care about quorum sizes, and
a number of cleanups were made for them.

This commit also adds a ReplicaDescriptors.ConfState helper which is
then used in all of the places that were previously cobbling together a
ConfState manually.

Release note: None
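A minimal sketch of the Voters() contract described above, with illustrative types rather than the real roachpb ones:

```go
package main

import "fmt"

type ReplicaType int

const (
	VoterFull ReplicaType = iota
	VoterIncoming
	VoterOutgoing
	Learner
)

type ReplicaDescriptor struct {
	ReplicaID int
	Type      ReplicaType
}

// voters returns the voters the config will have once the joint state
// is exited: full and incoming voters, but not outgoing ones. This is
// the "desired voters" answer the commit message settles on.
func voters(replicas []ReplicaDescriptor) []ReplicaDescriptor {
	var out []ReplicaDescriptor
	for _, r := range replicas {
		if r.Type == VoterFull || r.Type == VoterIncoming {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rs := []ReplicaDescriptor{
		{1, VoterFull}, {2, VoterOutgoing}, {3, VoterIncoming}, {4, Learner},
	}
	for _, v := range voters(rs) {
		fmt.Println(v.ReplicaID) // 1, then 3: the post-joint voter set
	}
}
```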
We will use a ChangeReplicasTrigger without additions and removals when
transitioning out of a joint configuration, so make sure it supports
this properly.

Release note: None
This defaults to false, and won't have an effect unless the newly
introduced cluster version is also active.

Release note: None
These will be upgraded with proper handling when atomic replication
changes are actually introduced, but for now it's convenient to stub
out some code that will need to handle them and to make sure we won't
forget to do so later.

Release note: None
tbg added 12 commits August 26, 2019 23:23
Doing more than one change at once is going to force us into an atomic
replication change. This isn't crazy, but seems unnecessary at this
point, so just add the learners one by one.

Release note: None
I ate a number of NPEs during development because nullable embedded
fields are tricky; they hide the pointer derefs that often need a nil
check. We'll embed the fields of decodedConfChange instead, which works
out better.
This commit also adds the unmarshaling code necessary for ConfChangeV2
needed once we issue atomic replication changes.

Release note: None
When in a joint configuration, passing an empty conf change to
ApplyConfChange doesn't do the right thing any more: it tells
Raft that we're leaving the joint config. It's not a good idea
to try to tell Raft anything about a ConfChange that got rejected.
Raft internally knows that we handled it because it knows the applied
index.

This also adds a case match for ConfChangeV2 which is necessary to
route atomic replication changes (ConfChangeV2).

See etcd-io/etcd#11046

Release note: None
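The rule can be sketched like this (the types here are invented for illustration; the real code deals with raftpb.ConfChange/ConfChangeV2 and etcd/raft's applied-index tracking):

```go
package main

import "fmt"

type confChangeKind int

const (
	ccV1 confChangeKind = iota // ConfChange
	ccV2                       // ConfChangeV2
)

// maybeApplyConfChange encodes the commit's rule: if the entry was
// rejected, skip ApplyConfChange entirely rather than passing a zeroed
// change, which in a joint config Raft would read as "leave the joint
// state". Raft tracks the applied index, so nothing needs to be said.
func maybeApplyConfChange(rejected bool, kind confChangeKind, apply func(confChangeKind)) {
	if rejected {
		return // nothing to tell Raft about a rejected entry
	}
	switch kind {
	case ccV1, ccV2: // route both ConfChange and ConfChangeV2
		apply(kind)
	default:
		panic(fmt.Sprintf("unexpected conf change kind %d", kind))
	}
}

func main() {
	applied := 0
	apply := func(confChangeKind) { applied++ }
	maybeApplyConfChange(true, ccV2, apply)  // rejected: skipped
	maybeApplyConfChange(false, ccV2, apply) // accepted: applied
	fmt.Println(applied) // 1
}
```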
There are various callers to ChangeReplicas, so it makes more sense to
unroll at that level. The code was updated to - in principle - do the
right thing when atomic replication changes are requested, except that
they are still unimplemented and a fatal error will serve as a reminder
of that. Of course nothing issues them yet.

Release note: None
This prepares the trigger -> raft translation code to properly handle
atomic replication changes.

This carries out a lot of validation to give us confidence that any
unusual transitions would be caught quickly.

This change also establishes more clearly which added and removed
replicas are to be passed into the trigger when transitioning into a
joint configuration. For example, when adding a voter, one technically
replaces a Learner with a VoterIncoming and so the question is which
type the replica in the `added` slice should have.  Picking the Learner
would give the trigger the most power to validate the input, but it's
annoying to have divergent descriptors floating around, so by convention
we say that it is always the updated version of the descriptor (i.e. for
fully removed replicas, just whatever it was before it disappeared).
I spent more time on this than I'm willing to admit, in particular
looking at removing the redundancy here, but it made things more awkward
than it was worth.

Release note: None
This is a cleanup leading up to a larger refactor of the contract around
`execChangeReplicasTxn`.

Release note: None
Prior to this commit, the method took both an old and a new desc *plus*
slices of added and removed replicas. This had grown organically, wasn't
an easily understood interface, led to repetitive and tricky code at the
callers, and most importantly isn't adequate any more in a world with
atomic replication changes, where execChangeReplicasTxn in constructing
the ChangeReplicasTrigger is essentially deciding whether a joint
configuration needs to be entered (which in turn determines what the
descriptor needs to look like in the first place).
To start solving this, let execChangeReplicasTxn create (and on success
return) the new descriptor. Callers instead pass in what they want to
be done, which is accomplished via an []internalReplicationChange slice.

Release note: None
Once we enter joint changes, the replica to be removed will show up in
`crt.Removed()` when the joint state is entered, but it only becomes
eligible for actual removal when we leave the joint state later.
The new code triggers at the right time, namely when the replica is
no longer in the descriptor.

Release note: None
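The timing rule can be sketched as follows, with integer replica IDs standing in for real descriptors (names are illustrative):

```go
package main

import "fmt"

// shouldQueueForGC captures the fix described above: a replica becomes
// eligible for GC only once it is absent from the range descriptor,
// not merely listed as removed when the joint state is entered.
func shouldQueueForGC(descReplicaIDs []int, storeReplicaID int) bool {
	for _, id := range descReplicaIDs {
		if id == storeReplicaID {
			return false // still in the descriptor (e.g. as an outgoing voter)
		}
	}
	return true
}

func main() {
	// Entering the joint state: replica 3 is outgoing but still present.
	fmt.Println(shouldQueueForGC([]int{1, 2, 3}, 3)) // false
	// Leaving the joint state: replica 3 is gone from the descriptor.
	fmt.Println(shouldQueueForGC([]int{1, 2, 4}, 3)) // true
}
```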
They default to OFF.

This needs a lot more tests which will be added separately in the course
of switching the default to ON and will focus on the interactions of
joint states with everything else in the system.

We'll also need another audit of consumers of the replica descriptors to
make sure nothing was missed in the first pass.

Release note: None
@craig
Contributor

craig bot commented Aug 26, 2019

Canceled

@tbg
Member Author

tbg commented Aug 26, 2019

bors r=nvanbenschoten

Had messed up Gopkg.lock

craig bot pushed a commit that referenced this pull request Aug 26, 2019
39936: storage: add (default-off) atomic replication changes r=nvanbenschoten a=tbg

This PR contains a series of commits that first pave the way and ultimately
allow carrying out atomic replication changes via Raft joint consensus.

Atomic replication changes are required to avoid entering unsafe configurations
during lateral data movement. See #12768 for details; this is a problem we want
to address in 19.2.

Before merging this we'll need to sort out an upstream change in Raft which
has made a bug in our code related to learner snapshots much more likely; the
offending upstream commit is patched out of the vendored etcd bump in this PR
at the time of writing.

An antichronological listing of the individual commits follows. They should be
reviewed individually, though it may be helpful to look at the overall diff for
additional context. A modest amount of churn may exist between the commits, though
a good deal of effort went into avoiding this.

    storage: allow atomic replication changes in ChangeReplicas

    They default to OFF.

    This needs a lot more tests which will be added separately in the course of
    switching the default to ON and will focus on the interactions of joint
    states with everything else in the system.

    We'll also need another audit of consumers of the replica descriptors to
    make sure nothing was missed in the first pass.

    Release note: None

    storage: fix replicaGCQueue addition on removal trigger

    Once we enter joint changes, the replica to be removed will show up in
    `crt.Removed()` when the joint state is entered, but it only becomes
    eligible for actual removal when we leave the joint state later. The new
    code triggers at the right time, namely when the replica is no longer in
    the descriptor.

    Release note: None

    storage: let execChangeReplicasTxn construct the descriptor

    Prior to this commit, the method took both an old and a new desc *plus*
    slices of added and removed replicas. This had grown organically, wasn't an
    easily understood interface, led to repetitive and tricky code at the
    callers, and most importantly isn't adequate any more in a world with
    atomic replication changes, where execChangeReplicasTxn in constructing the
    ChangeReplicasTrigger is essentially deciding whether a joint configuration
    needs to be entered (which in turn determines what the descriptor needs to
    look like in the first place). To start solving this, let
    execChangeReplicasTxn create (and on success return) the new descriptor.
    Callers instead pass in what they want to be done, which is accomplished
    via an []internalReplicationChange slice.

    Release note: None

    roachpb: auto-assign ReplicaID during AddReplica

    This is a cleanup leading up to a larger refactor of the contract around
    `execChangeReplicasTxn`.

    Release note: None

    storage: emit ConfChangeV2 from ChangeReplicasTrigger where appropriate

    This prepares the trigger -> raft translation code to properly handle
    atomic replication changes.

    This carries out a lot of validation to give us confidence that any unusual
    transitions would be caught quickly.

    This change also establishes more clearly which added and removed replicas
    are to be passed into the trigger when transitioning into a joint
    configuration. For example, when adding a voter, one technically replaces a
    Learner with a VoterIncoming and so the question is which type the replica
    in the `added` slice should have.  Picking the Learner would give the
    trigger the most power to validate the input, but it's annoying to have
    divergent descriptors floating around, so by convention we say that it is
    always the updated version of the descriptor (i.e. for fully removed
    replicas, just whatever it was before it disappeared). I spent more time on
    this than I'm willing to admit, in particular looking removing the
    redundancy here, but it made things more awkward than was worth it.

    Release note: None

    storage: push replication change unrolling into ChangeReplicas

    There are various callers to ChangeReplicas, so it makes more sense to
    unroll at that level. The code was updated to - in principle - do the right
    thing when atomic replication changes are requested, except that they are
    still unimplemented and a fatal error will serve as a reminder of that. Of
    course nothing issues them yet.

    Release note: None

    storage: skip ApplyConfChange on rejected entry

    When in a joint configuration, passing an empty conf change to
    ApplyConfChange doesn't do the right thing any more: it tells Raft that
    we're leaving the joint config. It's not a good idea to try to tell Raft
    anything about a ConfChange that got rejected. Raft internally knows that
    we handled it because it knows the applied index.

    This also adds a case match for ConfChangeV2 which is necessary to route
    atomic replication changes (ConfChangeV2).

    See etcd-io/etcd#11046

    Release note: None

    storage: un-embed decodedConfChange

    I ate a number of NPEs during development because nullable embedded fields
    are tricky; they hide the pointer derefs that often need a nil check. We'll
    embed the fields of decodedConfChange instead, which works out better. This
    commit also adds the unmarshaling code necessary for ConfChangeV2 needed
    once we issue atomic replication changes.

    Release note: None

    storage: add learners one by one

    Doing more than one change at once is going to force us into an atomic
    replication change. This isn't crazy, but seems unnecessary at this point,
    so just add the learners one by one.

    Release note: None

    storage: add fatals where atomic conf changes are unsupported

    These will be upgraded with proper handling when atomic replication changes
    are actually introduced, but for now it's convenient to stub out some code
    that will need to handle them and to make sure we won't forget to do so
    later.

    Release note: None

    storage: add atomic replication changes cluster setting

    This defaults to false, and won't have an effect unless the newly
    introduced cluster version is also active.

    Release note: None

    roachpb: support zero-change ChangeReplicasTrigger

    We will use a ChangeReplicasTrigger without additions and removals when
    transitioning out of a joint configuration, so make sure it supports this
    properly.

    Release note: None

    roachpb: return "desired" voters from ReplicaDescriptors.Voters

    Previous commits introduced (yet unused) voter types to encode joint
    consensus configurations which occur during atomic replication changes.

    Access to the slice of replicas is unfortunately common, though at least
    it's compartmentalized via the getters Voters() and Learners().

    The main problem solved in this commit is figuring out what should be
    returned from Voters(): is it all VoterX types, or only voters in one of
    the two majority configs part of a joint quorum?

    The useful answer is returning the set of voters corresponding to what the
    config will be once the joint state is exited; this happens to be what most
    callers care about. Incoming and full voters are really the same thing in
    our code; we just need to distinguish them from outgoing voters to
    correctly maintain the quorum sizes.

    Of course there are some callers that do care about quorum sizes, and a
    number of cleanups were made for them.

    This commit also adds a ReplicaDescriptors.ConfState helper which is then
    used in all of the places that were previously cobbling together a
    ConfState manually.

    Release note: None

    roachpb: add ReplicaType_Voter{Incoming,Outgoing}

    These are required for atomic replication changes to describe joint
    configurations, i.e. configurations consisting of two sets of replicas which
    both need to reach quorum to make replication decisions.

    An audit of existing consumers of this enum will follow.

    Release note: None

    roachpb: rename ReplicaType variants

    The current naming is idiomatic for proto enums, but atypical for its usage
    in Go code. There is no `(gogoproto.customname)` that can fix this, and
    we're about to add more replica types that would require awkward names such
    as `roachpb.ReplicaType_VOTER_OUTGOING`.

    Switch to a Go-friendly naming scheme instead.

    Release note: None

    batcheval: generalize checkNotLearnerReplica

    This now errors out whenever the replica is not a voter, which is more
    robust as new replica types are introduced (which generally should not
    automatically become eligible to receive leases).

    Release note: None

    roachpb: improve RangeDescriptor.Validate

    Make sure there isn't more than one replica per store.

    Release note: None

    roachpb: generalize ReplicaDescriptor.String()

    The new code will generalize to new replica types.

    Release note: None

    [dnm] vendor: bump raft

    This picks up upstream fixes related to atomic membership changes.

    I had to smuggle in a small hack because we're picking up
    etcd-io/etcd#11037 which makes a race between the
    snapshot queue and the proactive learner snapshot much more likely, and
    this in turn makes tests quite flaky because it turns out that if the
    learner snap loses, it can actually error out.

    Release note: None

    storage: avoid fatal error from splitPostApply

    This is the next band-aid on top of #39658 and #39571. The descriptor
    lookup I added sometimes fails because replicas can process a split trigger
    in which they're not a member of the range:

    > F190821 15:14:28.241623 312191 storage/store.go:2172
    > [n2,s2,r21/3:/{Table/54-Max}] replica descriptor of local store not
    > found in right hand side of split

    I saw this randomly in `make test PKG=./pkg/ccl/partitionccl`.

    Release note: None

40221: cli: Add default locality settings for multi node demo clusters r=jordanlewis a=rohany

Addresses part of #39938.

Release note (cli change): Default cluster locality topologies for
multi-node cockroach demo clusters.

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
@craig
Contributor

craig bot commented Aug 26, 2019

Build succeeded

Contributor

@andreimatei andreimatei left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten and @tbg)


pkg/storage/replica_metrics.go, line 172 at r8 (raw file):

	if rangeCounter {
		unavailable = !desc.Replicas().CanMakeProgress(func(rDesc roachpb.ReplicaDescriptor) bool {
			_, live := livenessMap[rDesc.NodeID]

Is this right here? The map has nodes in it that are not live.
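The distinction being raised can be demonstrated with a toy liveness map (types illustrative; the real map holds liveness entries keyed by NodeID):

```go
package main

import "fmt"

// countLive illustrates the bug: the liveness map has an entry for
// every known node, live or not, so checking map membership alone
// (`_, ok := m[id]`) overcounts; checking the value does not.
func countLive(livenessMap map[int]bool, nodes []int, buggy bool) int {
	n := 0
	for _, id := range nodes {
		if buggy {
			if _, ok := livenessMap[id]; ok { // presence only: the bug
				n++
			}
		} else if livenessMap[id] { // value check: correct
			n++
		}
	}
	return n
}

func main() {
	live := map[int]bool{1: true, 2: false} // n2 is known but NOT live
	nodes := []int{1, 2}
	fmt.Println(countLive(live, nodes, true))  // 2: buggy check counts n2
	fmt.Println(countLive(live, nodes, false)) // 1: correct
}
```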

@tbg
Member Author

tbg commented Feb 20, 2020 via email

@tbg tbg deleted the atomic/changes-port branch February 20, 2020 15:54
tbg added a commit to tbg/cockroach that referenced this pull request Mar 2, 2020
Found by @andreimatei in cockroachdb#39936 (review)

Release note (bug fix): a bug in the range metrics collection could fail
to correctly identify a range that had lost quorum, causing it to not be
reported via the "unavailable ranges" metric. This is now fixed.
tbg added a commit to tbg/cockroach that referenced this pull request Mar 2, 2020
Found by @andreimatei in cockroachdb#39936 (review)

Release note (bug fix): a bug in the range metrics collection could fail
to correctly identify a range that had lost quorum, causing it to not be
reported via the "unavailable ranges" metric. This is now fixed.
craig bot pushed a commit that referenced this pull request Mar 2, 2020
45253: storage: fix bug in calcRangeCounter r=andreimatei a=tbg

Found by @andreimatei in #39936 (review)

Release note (bug fix): a bug in the range metrics collection could fail
to correctly identify a range that had lost quorum, causing it to not be
reported via the "unavailable ranges" metric. This is now fixed.

45534: coldata: change Batch interface to operate with int length r=yuzefovich a=yuzefovich

This commit changes the Batch interface to operate with `int` length
because it unifies the way we store batches: now we will have a single
implementation, `MemBatch` (with an exception for `coldata.ZeroBatch`). This
allows us to remove `bufferedBatch`.

Release note: None

45554: kv: add a comment r=andreimatei a=andreimatei

Explain the surprising fact that we're telling the inflight-writes
btree to not reuse its nodes.

Release note: None

45580: storage: improve "range unavailable" log message r=nvanbenschoten a=tbg

I've had to look at this message on users clusters recently and I found
that it could be more actionable:

- adjusted the wording to steer enterprise users towards enterprise
  support
- print the range descriptor
- print which replicas we think are live
- log this message as an error instead of a warning (this message
  is always a serious problem that demands immediate attention).

Release note (general change): Improved a debug message that is printed
when a range is unavailable (i.e. unable to accept writes).

45585: roachtest: fix copy/bank roachtest to deal with smaller ranges r=ajwerner a=ajwerner

In #45451 we made the expected number of ranges reflect the current
zone config. The problem with this change is that it used a factor of
2x the minimum range size to estimate the upper bound on the number of ranges.

There was no principled reason to make this assumption. We should instead
just assert that the ranges are larger than the minimum range size on average.

Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>