Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

localtestcluster: re-order setting of gossip descriptor #54224

Merged
merged 1 commit into from
Sep 11, 2020

Conversation

irfansharif
Copy link
Contributor

The heartbeat loop depends on gossip to retrieve the node ID. When
stressing a few tests that make use of LocalTestCluster, I was seeing
empty liveness records for empty node IDs being heartbeated. By
re-ordering things as such we bring it closer to the Server
initialization ordering.

Release note: None

The heartbeat loop depends on gossip to retrieve the node ID. When
stressing a few tests that make use of LocalTestCluster, I was seeing
empty liveness records for empty node IDs being heartbeated. By
re-ordering things as such we bring it closer to the Server
initialization ordering.

Release note: None
@irfansharif irfansharif requested review from knz and tbg September 10, 2020 22:16
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)

@irfansharif
Copy link
Contributor Author

Thanks for the review!

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 11, 2020

Build succeeded:

@craig craig bot merged commit 62a796d into cockroachdb:master Sep 11, 2020
craig bot pushed a commit that referenced this pull request Sep 11, 2020
54216: kvserver: address migration concern with node liveness r=irfansharif a=irfansharif

In #53842 we introduced a change to always persist a liveness record on
start up. As part of that change, we refactored how the liveness
heartbeat codepath dealt with missing liveness records: it knew to fetch
it from KV given we were now maintaining the invariant that it would
always be present. Except that wasn't necessarily true, as demonstrated
by the following scenario:

```
// - v20.1 node gets added to v20.1 cluster, and is quickly removed
//   before being able to persist its liveness record.
// - The cluster is upgraded to v20.2.
// - The node from earlier is rolled into v20.2, and re-added to the
//   cluster.
// - It's never able to successfully heartbeat (it didn't join
//   through the join rpc, bootstrap, or gossip). Welp.
```
Though admittedly unlikely, we should handle it all the same instead of
simply erroring out. We'll just fall back to creating the liveness
record in-place as we did in v20.1 code. We can remove this fallback in
21.1 code.

---

First commit is from #54224.

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
@irfansharif irfansharif deleted the 200910.ltc-gossip-fix branch September 11, 2020 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants