
kvserver: don't always refuse to take leases when draining #55624

Merged

Conversation

andreimatei
Contributor

@andreimatei andreimatei commented Oct 15, 2020

Before this patch, a draining node would not take any new leases once
it's draining. This is a problem in case all replicas of a range are
draining at the same time (e.g. when draining a single-node cluster, or
when draining multiple nodes at the same time perhaps by mistake) -
nobody wants the lease. Particularly because the liveness range is
expiration-based (and thus permanently in need of new leases to be
granted), this quickly results in nodes failing their liveness.
It also becomes more of a problem with #55148, where we start refusing
to take the lease on replicas that are not the leader - so if the leader
is draining, we deadlock.

This patch makes an exception for leaders, which now no longer refuse
the lease even when they're draining. The reasoning is that it's too
easy to deadlock otherwise.

Release note: None
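
For illustration, here is a minimal sketch of the decision this patch changes. The `replica` type, `shouldRejectLeaseRequest`, and the two boolean fields are hypothetical stand-ins for the real kvserver plumbing, not the actual code:

```go
package main

import "fmt"

// replica is a hypothetical stand-in for kvserver.Replica, carrying only the
// two pieces of state this decision needs.
type replica struct {
	storeDraining bool // the store is draining (node is shutting down)
	isRaftLeader  bool // this replica is the current Raft leader
}

// shouldRejectLeaseRequest sketches the post-patch policy: a draining store
// still refuses new leases, unless this replica is the Raft leader. Before
// the patch, the draining check alone was enough to reject.
func (r *replica) shouldRejectLeaseRequest() bool {
	return r.storeDraining && !r.isRaftLeader
}

func main() {
	// A draining leader now accepts the lease; a draining follower still refuses.
	leader := replica{storeDraining: true, isRaftLeader: true}
	follower := replica{storeDraining: true, isRaftLeader: false}
	fmt.Println(leader.shouldRejectLeaseRequest())   // false: leader takes the lease
	fmt.Println(follower.shouldRejectLeaseRequest()) // true: follower refuses
}
```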

@cockroach-teamcity
Member

This change is Reviewable

@andreimatei
Contributor Author

Do y'all like this?

@knz
Contributor

knz commented Oct 16, 2020

Regarding the main contribution of this patch:

This patch makes an exception for leaders, which now no longer refuse
the lease even when they're draining.

I am ok with this change, as long as we ensure that the drain process in charge of moving leases+leadership away continues to succeed at doing so (which I believe is the case here).

However the rest of the motivation in the commit message is suspicious:

because the liveness range is expiration-based (and thus permanently in need of new leases to be granted), this quickly results in nodes failing their liveness.

I don't understand what you write here, what is the "failure" you talk of?

We have a liveness-related bug which I address in #55460: the drain process is iterative but mistakenly fails to re-wake up the stores upon every lookup of the liveness record, which can cause a deadlock. I introduced this bug but it's easy enough to fix without changing the lease transfer policy.

We have code that tries to transfer the leadership away from draining nodes [...] but that code is pretty best-effort

How so is it best effort? If it currently fails to do so, that is a bug that needs to be corrected.

Draining multiple nodes at once might be questionable, but I think it's an acceptable practice.

I somewhat agree with the fact it's acceptable but I would still not recommend it.

My expectation as an operator is that any drain is going to preserve cluster availability. So my expectation is also that a drain does not complete as long as there's no quorum on the range.

So my expectation is that if my replication factor is N, then when I start draining floor(N/2)+1 nodes concurrently then some or all the drains will never complete. This is by design.

We could argue that we need a separate shutdown API to bring down an entire cluster gracefully, with the understanding that it breaks availability. This API does not exist currently and the drain API is not meant to support it (nor would I want it to)

@andreimatei
Contributor Author

because the liveness range is expiration-based (and thus permanently in need of new leases to be granted), this quickly results in nodes failing their liveness.

I don't understand what you write here, what is the "failure" you talk of?

What I'm trying to say is that, if all replicas of the liveness range are draining, then all nodes in the cluster become unable to heartbeat their records. This is just a special case of the general problem - which is that we can get into these situations where, although there's quorum, ranges get stuck until one of the draining nodes restarts.

We have a liveness-related bug which I address in #55460: the drain process is iterative but mistakenly fails to re-wake up the stores upon every lookup of the liveness record, which can cause a deadlock. I introduced this bug but it's easy enough to fix without changing the lease transfer policy.

You're referring to the discussion here. As I'm saying over there, it seems to me that that patch is just papering over the problem, and the better thing to do is this patch. That patch improves the situation for the liveness range, but doesn't really do anything for other ranges, because the period where nodes temporarily stop draining is very short.

We have code that tries to transfer the leadership away from draining nodes [...] but that code is pretty best-effort

How so is it best effort? If it currently fails to do so, that is a bug that needs to be corrected.

Well, I meant that it's best-effort because it is governed by various timeouts, but the reality is that the draining code doesn't actually deal with leadership explicitly at all (although it certainly pretends to) - see #55619.

So my expectation is that if my replication factor is N, then when I start draining floor(N/2)+1 nodes concurrently then some or all the drains will never complete. This is by design.

Ideally, all the drains would not complete at once, but some should complete. And as the first completes, the respective node would get restarted, and then a second drain can complete, and so on. So, ideally, I should be able to start draining all the nodes at once, and then restart them one by one as individual drains complete, and eventually all the nodes should get restarted. There are some other pieces missing for this story, but we should strive towards it. This patch helps as we've discussed above - it lets the basic functions of a cluster work even when all nodes (or, more specifically, all replicas of a range) are draining at the same time.

Contributor

@bdarnell bdarnell left a comment

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @nvanbenschoten, and @tbg)

@knz
Contributor

knz commented Oct 19, 2020

because the liveness range is expiration-based (and thus permanently in need of new leases to be granted), this quickly results in nodes failing their liveness.

I don't understand what you write here, what is the "failure" you talk of?

What I'm trying to say is that, if all replicas of the liveness range are draining, then all nodes in the cluster become unable to heartbeat their records. This is just a special case of the general problem - which is that we can get into these situations where, although there's quorum, ranges get stuck until one of the draining nodes restarts.

My personal preference would be for draining replicas to continue heartbeating. I don't see why it's useful that they stop.

You're referring to the discussion here. As I'm saying over there, it seems to me that what that patch is trying to do there is just papering over the problem, where the better thing to do is this patch. That patch improves the situation for the liveness range, but doesn't really do anything for other ranges because the period where nodes temporarily stop draining is very short.

Yes this is what I was referring to but I don't have any opinion on this that's better than yours (And what I also replied on that thread)

Well I meant that it's best effort because it governed by various timeouts, but actually the reality is that the draining code actually doesn't deal with leadership explicitly at all (although it certainly pretends to) - see #55619.

Ok yes I understand this now. But how is this PR helping with this?

So my expectation is that if my replication factor is N, then when I start draining floor(N/2)+1 nodes concurrently then some or all the drains will never complete. This is by design.

Ideally, all the drains would not complete at once, but some should complete. And as the first completes, the respective node would get restarted, and then a second drain can complete, and so on. So, ideally, I should be able to start draining all the nodes at once, and then restart them one by one as individual drains complete, and eventually all the nodes should get restarted. There are some other pieces missing for this story, but we should strive towards it.

That "ideally" does not work for me. We'd get the behavior you describe here if we were to do things the way you envision. But that's a tail wagging its dog! What argument do you have that this ideal behavior is also what our operators want? It looks, from a usability perspective, to be super hard to teach and understand, and it has extremely obscure failure scenarios.

This patch helps as we've discussed above - it lets the basic functions of a cluster work even when all nodes (or more specifically, all replicas of a range) are draining at the same time.

I can see how this PR moves you towards the "ideal" world you would like to see. But I do not buy into the idea that's desirable for CockroachDB and its users (yet).

I can see why we want to merge this PR (and I think I'm OK with the change, as I wrote first thing above) but I'd like a more compelling explanation in the commit message.

Contributor Author

@andreimatei andreimatei left a comment

My personal preference would be for draining replicas to continue heartbeating. I don't see why it's useful that they stop.

I think there's some misunderstanding. Who said anything about the draining nodes stopping their heartbeats? Quite the contrary - the point of this PR is to let them succeed in heartbeating (by letting the liveness range get a lease).

Well I meant that it's best effort because it governed by various timeouts, but actually the reality is that the draining code actually doesn't deal with leadership explicitly at all (although it certainly pretends to) - see #55619.

Ok yes I understand this now. But how is this PR helping with this?

What is "this"? If "this" is the fact that the draining code doesn't deal with leadership for lease-less ranges, then this patch helps because #55148 doesn't deadlock any more (with this patch, it becomes OK to refuse to always refuse to take a lease if you're not the leader).
It "this" is #55460, then the conjecture is that, with this PR, you no longer need the s.node.SetDraining(false /* drain */) in #55460. If I understand correctly, the reason why you had to add that call was because, without it, the s.nodeLiveness.SetDraining(ctx, true /* drain */) because the liveness range can't get a lease any more in various scenarios. With this patch, it'll get the needed lease.

Ideally, all the drains would not complete at once, but some should complete. And as the first completes, the node respective node would get restarted, and then a second drain can complete, and so on. So, ideally, I should be able to start draining all the nodes at once, and then restart them one by one as individual drains complete, and eventually all the nodes should get restarted. There's some other pieces missing for this story, but we should strive towards it.

That "ideally" does not work for me. We'd get the behavior you describe here if we were to do things the way you envision. But that's a tail wagging its dog! What argument do you have that this ideal behavior is also what our operators want? It looks, from a usability perspective, to be super hard to teach and understand, and it has extremely obscure failure scenarios.

Perhaps I haven't made my point very well; I don't want any tails wagging any dogs. I was trying to argue that this patch is a good idea even separately from #55148. Perhaps I should have talked only about why #55148 needs this, and not about anything else.
Anyway, if we're in agreement, I'll think about how to change the commit message. When it comes to draining multiple nodes at once, I'm not particularly interested in telling people to start doing that. What I am interested in is CRDB having reasonable behavior in as many scenarios as possible. If an operator drains multiple nodes at once (perhaps by mistake), they'd surely prefer the cluster to work rather than not work. Without this PR, the cluster most definitely does not work. With this PR, at least the KV layer works (the SQL layer might not, I'm not sure). I see this as a pretty clear win.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @nvanbenschoten, and @tbg)

Member

@nvanbenschoten nvanbenschoten left a comment

:lgtm:

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @knz and @tbg)

@tbg tbg self-requested a review October 20, 2020 10:47
Member

@tbg tbg left a comment

I'm mildly conflicted about this change. It's desirable to guard against the complete failure of KV should (say) all nodes of the liveness range start draining immediately.
What's the behavior with this patch when we drain, say, all nodes? Lease transfers will still respect draining, so essentially no more leases are transferred. Raft leadership by and large will thus stay where it is, and the drains grind to a halt. That seems like an improvement.

The cost is that during a regular drain, it's more difficult to ascertain that all leases are actually gone. In effect, we need to transfer leases away and also wait for all raft leaderships to have moved away. And even then, it's possible that a raft leadership comes back last minute, a lease pops up where one wasn't held before, and we kill the process with a lease on the node, resulting in an up to 10s outage on the affected ranges. Regular draining is the much more common operation and we have to have a way to make it bulletproof.
We have looked at leases mostly so far. What if we prevented draining nodes from campaigning (or gave them a higher threshold for doing so, say 2x the configured one)? That seems like it would do the trick. Though maybe it's not necessary. Under normal operations (one single node getting drained at all times), do we ever expect random raft elections (or inbound leadership transfers)? I can only imagine that those could arise from ranges that become active again after a period of inactivity, but those should be quiesced and thus preserve leadership.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @knz and @tbg)

Contributor Author

@andreimatei andreimatei left a comment

Under normal operations (one single node getting drained at all times), do we ever expect random raft elections (or inbound leadership transfers)?

You mean incoming lease transfers coming to a draining node? I'm not sure... There's lease rebalancing based on load, manual SCATTERs, and I think there's got to be some rebalancing caused by nodes being added to the cluster. I'm not sure which ones look at draining and which don't; maybe they all do. Of course, information on who's draining comes from gossip, so it's not the most reliable.

And even then, it's possible that a raft leadership comes back last minute, a lease pops up where one wasn't held before, and we kill the process with a lease on the node, resulting in an up to 10s outage on the affected ranges.

What I think we should do about this is split the lease draining process into two phases. Phase 1 is unbounded in duration; during it, we continuously try to shed leases and leadership, and prefer not to take leases - but still take them if we have the leadership. The 2nd phase would run when the node is about to terminate - so it will have a bounded duration of a second or so. During this phase, we'd indeed refuse to take a lease (because we're quitting anyway), and do another pass at transferring leases and leadership away from replicas that have them.
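
A rough sketch of what those two phases could look like, assuming a hypothetical store interface (TransferLeasesAndLeadershipAway, RemainingLeases, and SetRefuseNewLeases are illustrative names, not the real drain API):

```go
package drain

import (
	"context"
	"time"
)

// store is a hypothetical stand-in for the per-node store; the methods mirror
// the capabilities assumed above, not the real kvserver API.
type store interface {
	TransferLeasesAndLeadershipAway(ctx context.Context)
	RemainingLeases() int
	SetRefuseNewLeases(refuse bool)
}

// drainPhase1 runs for as long as the operator lets it: keep shedding leases
// and leadership until none remain or the context is canceled. Preferring not
// to take new leases (but still taking them as Raft leader) would live in the
// lease code, not in this loop.
func drainPhase1(ctx context.Context, s store) {
	for ctx.Err() == nil && s.RemainingLeases() > 0 {
		s.TransferLeasesAndLeadershipAway(ctx)
		time.Sleep(time.Second)
	}
}

// drainPhase2 runs just before the process exits, bounded to about a second:
// refuse all new leases outright and do one last transfer pass.
func drainPhase2(ctx context.Context, s store) {
	s.SetRefuseNewLeases(true)
	ctx, cancel := context.WithTimeout(ctx, time.Second)
	defer cancel()
	s.TransferLeasesAndLeadershipAway(ctx)
}
```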

What if we prevented draining nodes from campaigning (or gave them a higher threshold for doing so, say 2x the configured one)?

Right, I think this is the right line of thought here. We should discourage draining nodes from campaigning for new leadership, and we should be transferring the existing leadership away.
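
As an illustration of the "higher threshold" idea, a draining store could inflate the election timeout when building its Raft config. This sketch uses etcd/raft's ElectionTick; the base value and the draining flag are assumptions for illustration, not CockroachDB's actual settings:

```go
package drainraft

import "go.etcd.io/etcd/raft/v3"

// newRaftConfig sketches how a store could discourage draining replicas from
// campaigning: the election timeout (in ticks) is doubled while draining, so
// a draining replica waits twice as long before starting an election, while
// heartbeats and everything else stay unchanged.
func newRaftConfig(id uint64, storage raft.Storage, draining bool) *raft.Config {
	const baseElectionTicks = 10 // illustrative base value, not CockroachDB's
	electionTicks := baseElectionTicks
	if draining {
		electionTicks *= 2 // the "2x the configured one" idea from the review
	}
	return &raft.Config{
		ID:              id,
		ElectionTick:    electionTicks,
		HeartbeatTick:   1,
		Storage:         storage,
		MaxSizePerMsg:   1 << 20,
		MaxInflightMsgs: 128,
	}
}
```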


Just to make sure this hasn't gotten lost in the chatter, let's remember how we got here. The code as it stands is incompatible with #55148 (kvserver: don't allow raft forwarding of lease requests) - that PR makes everybody but the leader refuse to take the lease. The draining code, as is, refuses to take the lease on a draining node. This means that if the leader is draining, nobody takes the lease. So, we need to do something.
Doing what this patch does is, I think, very attractive because, in conjunction with #55148, it means that the lease acquisition code will not need to look at either the draining status or the leadership status above Raft: my plan is to merge this, and then change the requestLeaseLocked() code to simply remove the

	if r.store.IsDraining() && (r.raftBasicStatusRLocked().RaftState != raft.StateLeader) {

block, because it won't be necessary any more - either the replica is the leader, in which case we want the lease regardless of draining status, or it isn't the leader, in which case we don't want it regardless of draining status. I'm gonna fix the failing tests here and merge this, and then we can resume talking about how to improve leadership management while draining.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @knz and @tbg)

Member

@tbg tbg left a comment

:lgtm:

Is a test failing because of your change? That's good, because I was about to ask you to write a test.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @knz and @tbg)

@tbg tbg self-requested a review October 21, 2020 11:54
Member

@tbg tbg left a comment

I broadly agree with your suggested future steps, though I hope we can make do without an explicit second stage. Incoming leases should really not be expected any more once we've waited out the draining liveness record (or at least I think making all subsystems aware of the draining status is the first choice of strategy, rather than jumping to an extra stage). So we get something like this:

  • mark as draining
  • wait until cluster must be aware of draining (implies no new incoming leases, no incoming raft leadership transfers)
  • lower raft campaigning aggressiveness (tbd what exactly makes sense here, if anything) (implies no more randomly acquired raft leaderships any more, except where they're truly necessary)
  • for { transfer all leases away; if noMoreLeases && noMoreRaftLeaders { return } else { sleep(time.Second) } }
  • final wait to improve distsender cache updates in rest of cluster

This is pretty close to what the draining code looks like at and before #55460.
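
The retry loop in the fourth bullet could be transcribed roughly as follows; drainState, ShedLeases, ActiveLeaseCount, and RaftLeaderCount are hypothetical helpers standing in for the real drain code:

```go
package drainloop

import (
	"context"
	"time"
)

// drainState is a hypothetical view of the node's remaining responsibilities.
type drainState interface {
	ShedLeases(ctx context.Context) // try to transfer all leases away
	ActiveLeaseCount() int          // leases still held by this node
	RaftLeaderCount() int           // ranges where this node is still Raft leader
}

// waitForLeasesAndLeadership keeps transferring until this node holds neither
// leases nor Raft leaderships, polling once per second, as in the bullet list
// above. The caller bounds the overall duration via ctx.
func waitForLeasesAndLeadership(ctx context.Context, s drainState) error {
	for {
		s.ShedLeases(ctx)
		if s.ActiveLeaseCount() == 0 && s.RaftLeaderCount() == 0 {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Second):
		}
	}
}
```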

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @knz and @tbg)

@tbg tbg self-requested a review October 21, 2020 12:50
Contributor Author

@andreimatei andreimatei left a comment

I've been working on a test here but it's taking me a while. Coming soon.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @knz and @tbg)

@andreimatei andreimatei force-pushed the drain.leader-always-accepts-lease branch from d04d741 to b59c5d9 Compare October 22, 2020 23:43
@andreimatei andreimatei requested a review from a team as a code owner October 22, 2020 23:43
@andreimatei andreimatei force-pushed the drain.leader-always-accepts-lease branch from b59c5d9 to 1631b00 Compare October 23, 2020 01:46
Contributor Author

@andreimatei andreimatei left a comment

Added a test, PTAL

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @bdarnell, @knz, @nvanbenschoten, and @tbg)

Contributor

@knz knz left a comment

:lgtm: I like the test, although I think you can simplify it since we already have a testing thing to pause heartbeats.

I think you can also update the PR description at top using your commit messages, which are nicely explanatory.

Reviewed 4 of 4 files at r2, 3 of 3 files at r3, 3 of 3 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andreimatei and @tbg)


pkg/kv/kvserver/replica_test.go, line 1452 at r4 (raw file):

	// rejectLivenessHeartbeats is a gRPC filter which returns errors on all
	// attempts to heartbeat n1's liveness when suspendN1Heartbeats is set.

There is already a feature to do this: (*NodeLiveness) PauseHeartbeatLoopForTest().

@tbg
Member

tbg commented Oct 23, 2020

I won't be able to look until Tuesday unfortunately. Can someone else sign off on the test? The other stuff was LGTM.

Member

@nvanbenschoten nvanbenschoten left a comment

:lgtm: the only concerns I have past what knz said are aesthetic.

Reviewed 4 of 4 files at r2, 3 of 3 files at r3, 3 of 3 files at r4.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @knz, and @tbg)


pkg/kv/kvclient/kvcoord/dist_sender.go, line 263 at r2 (raw file):

	clusterID *base.ClusterIDContainer

	// disableFirstRangeUpdates disables updates of the first range via

Aren't all of these fields testing knobs as well? We should either pull everything into the testingKnobs struct or not introduce it.


pkg/kv/kvclient/kvcoord/testing_knobs.go, line 20 at r2 (raw file):

)

// RequestFilterType is type of functions used as RequestFilter.

"the type"?


pkg/kv/kvclient/kvcoord/testing_knobs.go, line 29 at r2 (raw file):

	// any request. If either of the retvals is non-nil, then that's used as the
	// RPC's result and the call is not performed any more.
	RequestFilter RequestFilterType

nit: keep the formatting consistent here, otherwise the code quality deteriorates. In this case, we have a new line between each field.


pkg/kv/kvserver/node_liveness.go, line 262 at r3 (raw file):

}

// NodeLivenessTestingKnobs allows tests to override some node liveness

Shouldn't this live in pkg/kv/kverserver/testing_knobs.go alongside StoreTestingKnobs?


pkg/kv/kvserver/replica_range_lease.go, line 844 at r4 (raw file):

}

func (r *Replica) currentLeaseStatus(ctx context.Context) kvserverpb.LeaseStatus {

It seems like this should live right below leaseStatus and contain a comment referencing the relationship between this method and that one.


pkg/kv/kvserver/replica_test.go, line 1433 at r4 (raw file):

// Test that draining nodes only take the lease if they're the leader.
func TestReplicaDrainLease(t *testing.T) {

You stressed this test for a while, right?


pkg/kv/kvserver/replica_test.go, line 1452 at r4 (raw file):

Previously, knz (kena) wrote…

There is already a feature to do this: (*NodeLiveness) PauseHeartbeatLoopForTest().

+1

@lunevalex
Collaborator


pkg/kv/kvserver/replica_test.go, line 1493 at r4 (raw file):

	defer tc.Stopper().Stop(ctx)
	rngKey := tc.ScratchRange(t)
	_, err := tc.AddReplicas(rngKey, tc.Target(1))

nit: you can use tc.AddReplicasOrFatal()

Collaborator

@lunevalex lunevalex left a comment

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @knz, and @tbg)


pkg/kv/kvserver/replica_test.go, line 1498 at r4 (raw file):

	s1 := tc.Server(0)
	s2 := tc.Server(1)
	store1, err := s1.GetStores().(*Stores).GetStore(s1.GetFirstStoreID())

nit: there is a convenience method tc.GetFirstStoreFromServer(t, 0)


pkg/kv/kvserver/replica_test.go, line 1503 at r4 (raw file):

	require.NoError(t, err)

	rd, err := tc.LookupRange(rngKey)

nit: there is a LookupRangeOrFatal as well

@andreimatei andreimatei force-pushed the drain.leader-always-accepts-lease branch from 1631b00 to cde0e66 Compare October 26, 2020 19:42
This test either has completely rotted, or has always been confused. It
tries to verify that different mechanisms trigger raft reproposals, but
it doesn't seem to actually depend on reproposals. The test says that an
increment depends on the reproposal of a previous lease request, but
there's no such lease request. Also, technically reproposals of lease
requests stopped being a thing a while ago. It also talks about Raft
leadership changing, but that's not explicit in the test.

The test fails with the next commit that changes how draining replicas
handle lease requests. It's unclear to me what's salvageable from the
test.

Release note: None
Before this patch, a draining node would not take any new leases once
it's draining. This is a problem in case all replicas of a range are
draining at the same time (e.g. when draining a single-node cluster, or
when draining multiple nodes at the same time perhaps by mistake) -
nobody wants the lease. Particularly because the liveness range is
expiration-based (and thus permanently in need of new leases to be
granted), this quickly results in nodes failing their liveness.
It also becomes more of a problem with cockroachdb#55148, where we start refusing
to take the lease on replicas that are not the leader - so if the leader
is draining, we deadlock.

This patch makes an exception for leaders, which now no longer refuse
the lease even when they're draining. The reasoning is that it's too
easy to deadlock otherwise.

Release note: None
@andreimatei andreimatei force-pushed the drain.leader-always-accepts-lease branch from cde0e66 to acc1ad1 Compare October 26, 2020 19:44
Contributor Author

@andreimatei andreimatei left a comment

TFTR
I've prepended a commit deleting TestRefreshPendingCommands - the test we were discussing on email. I don't think there's anything to salvage there.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @knz, @lunevalex, @nvanbenschoten, and @tbg)


pkg/kv/kvclient/kvcoord/dist_sender.go, line 263 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Aren't all of these fields testing knobs as well? We should either pull everything into the testingKnobs struct or not introduce it.

removed the new struct


pkg/kv/kvclient/kvcoord/testing_knobs.go, line 20 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"the type"?

removed


pkg/kv/kvclient/kvcoord/testing_knobs.go, line 29 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: keep the formatting consistent here, otherwise the code quality deteriorates. In this case, we have a new line between each field.

removed


pkg/kv/kvserver/node_liveness.go, line 262 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Shouldn't this live in pkg/kv/kverserver/testing_knobs.go alongside StoreTestingKnobs?

done


pkg/kv/kvserver/replica_range_lease.go, line 844 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

It seems like this should live right below leaseStatus and contain a comment referencing the relationship between this method and that one.

done


pkg/kv/kvserver/replica_test.go, line 1433 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

You stressed this test for a while, right?

yes


pkg/kv/kvserver/replica_test.go, line 1452 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

+1

ok, switched to PauseAllHeartbeatsForTest.
I was aware of these methods but I don't like them very much - if I can, I'd rather not interfere with the nodes' operation. But if y'all like it, I don't care too much.


pkg/kv/kvserver/replica_test.go, line 1493 at r4 (raw file):

Previously, lunevalex wrote…

nit: you can use tc.AddReplicasOrFatal()

done


pkg/kv/kvserver/replica_test.go, line 1498 at r4 (raw file):

Previously, lunevalex wrote…

nit: there is a convenience method tc.GetFirstStoreFromServer(t, 0)

that's a method on TestCluster, but here we're using TestClusterInterface. We can't put the method on the interface with the *kvserver.Store return type because the serverutils package can barely depend on anything. So we'd have to introduce a version with a generic return type, which I don't think is worth it.


pkg/kv/kvserver/replica_test.go, line 1503 at r4 (raw file):

Previously, lunevalex wrote…

nit: there is a LookupRangeOrFatal as well

done

Contributor Author

@andreimatei andreimatei left a comment

I think you can also update the PR description at top using your commit messages, which are nicely explanatory.

done

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @knz, @lunevalex, @nvanbenschoten, and @tbg)

Contributor Author

@andreimatei andreimatei left a comment

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @knz, @lunevalex, @nvanbenschoten, and @tbg)

@craig
Contributor

craig bot commented Oct 27, 2020

Build succeeded:
