storage: allocate NodeID via one-off RPC to initialized node #32574

Closed
tbg opened this issue Nov 23, 2018 · 6 comments

tbg commented Nov 23, 2018

Our handling of the NodeID is currently quite involved (and leaks into many places) because we pretty much require a working node to be set up before it can use its own KV store to request a NodeID.

There are two levels of improvement here. The first: once Gossip is connected and the Node is uninitialized, we issue a one-off AllocateNodeID RPC that simply asks a remote node of our choosing (taken from Gossip) to allocate a NodeID for us. This shifts the complexity of using the KV store to a node that has it in perfect working condition. (Similar improvements apply to StoreID allocation, which we may be able to hoist out of the depths of (*Node).Start, where it is buried right now.)
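
For concreteness, the caller side of such a one-off RPC could look roughly like the sketch below. Everything specific to the new RPC is hypothetical (the AllocateNodeID service, its request/response types, pickInitializedPeer); only gossip.Gossip, roachpb.NodeID and the plain grpc dial calls are existing APIs, and imports are elided as in the other snippets in this issue.

// Hypothetical sketch only: ask a remote, fully initialized node (picked via
// Gossip) to allocate our NodeID over a short-lived gRPC connection, instead
// of doing KV work through our own half-initialized Node.
func requestNodeIDFromPeer(
  ctx context.Context, g *gossip.Gossip, creds credentials.TransportCredentials,
) (roachpb.NodeID, error) {
  addr, err := pickInitializedPeer(g) // hypothetical: any peer address learned via Gossip
  if err != nil {
    return 0, err
  }
  // One-off connection with the right certs; no *rpc.Context needed since we
  // don't have a NodeID yet.
  conn, err := grpc.DialContext(ctx, addr, grpc.WithTransportCredentials(creds))
  if err != nil {
    return 0, err
  }
  defer conn.Close()
  // NewAllocateNodeIDClient is a hypothetical generated stub for the new service.
  resp, err := NewAllocateNodeIDClient(conn).AllocateNodeID(ctx, &AllocateNodeIDRequest{})
  if err != nil {
    return 0, err
  }
  return roachpb.NodeID(resp.NodeID), nil
}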

This removes uninitialized NodeIDs from everywhere except Gossip. Removing them from Gossip is more involved, depending on what guarantees we want to offer. The basic idea is to read the Gossip bootstrap list directly to send AllocateNodeID, and not to start Gossip before that has succeeded. However, consider the case in which a node isn't able to reach out to other nodes (perhaps its join flags are stale) but other nodes are able to reach out to it (a poor setup, but it could happen): we'd get stuck unless there's at least a minimal subset of Gossip running that adds the peer's address to the list of addresses we pick from for AllocateNodeID. This doesn't seem terrible to implement, but it's a little more involved.

Related to #30553.

tbg added the A-kv-server Relating to the KV-level RPC server label Nov 23, 2018
tbg added this to the 2.2 milestone Nov 23, 2018
tbg added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Nov 27, 2018
tbg commented Aug 12, 2019

Join-time NodeID/ClusterID RPC

Switch to a dedicated mechanism for joining an existing cluster that does not require a running Gossip network and Node.

Problem

Nodes joining a cluster for the first time currently perform an awkward dance: to allocate a NodeID, they set up a more or less fully functional node so that they can run a KV Increment request on the key that houses the NodeID counter. Once the NodeID is assigned, the already-running server needs to patch it into all of the components that want to know it. The result is this code:

// NodeIDContainer is used to share a single roachpb.NodeID instance between
// multiple layers. It allows setting and getting the value. Once a value is
// set, the value cannot change.
type NodeIDContainer struct {
  _ util.NoCopy

  // nodeID is atomically updated under the mutex; it can be read atomically
  // without the mutex.
  nodeID int32
}

which complicates matters in all components in which the NodeID plays a role (that's many of them).
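
To illustrate the cost, a consumer of the container typically ends up with something like the following (a sketch only, assuming the container's usual Get accessor; someComponent and its method are made up for illustration):

// Sketch of the late-binding pattern the container forces on consumers: any
// component wired up before the NodeID is allocated must tolerate a zero
// value and re-check on every use.
type someComponent struct {
  nodeID *base.NodeIDContainer // late-bound; may still be unset
}

func (c *someComponent) annotate(msg string) string {
  if id := c.nodeID.Get(); id != 0 {
    return fmt.Sprintf("n%d: %s", id, msg)
  }
  // NodeID not assigned yet; every consumer needs a fallback like this.
  return fmt.Sprintf("n?: %s", msg)
}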

A similar problem occurs for the ClusterID, and there's a corresponding container:

// ClusterIDContainer is used to share a single Cluster ID instance between
// multiple layers. It allows setting and getting the value. Once a value is
// set, the value cannot change.
//
// The cluster ID is determined on startup as follows:
// - If there are existing stores, their cluster ID is used.
// - If the node is bootstrapping, a new UUID is generated.
// - Otherwise, it is determined via gossip with other nodes.
type ClusterIDContainer struct {
  syncutil.Mutex

  clusterID uuid.UUID
}

The ClusterID is primarily distributed via Gossip, so nodes asking to join a cluster must connect to Gossip before they're really ready to do so.

Start-up sequence today

We shouldn't have to set up an incomplete server and deal with updating it in-place. Instead, a node needing to bootstrap via joining should, at a high level, ask an existing node for the ClusterID and a freshly assigned NodeID before really entering its start sequence.

The sequence to join a cluster today takes roughly the following form

First, the node must find that no NodeID/ClusterID has been persisted yet

cockroach/pkg/server/server.go

Lines 1308 to 1315 in 6cb71bf

} else {
  // We have no existing stores. We start an initServer and then wait for
  // one of the following:
  //
  // - gossip connects (i.e. we're joining an existing cluster, perhaps
  //   freshly bootstrapped but this node doesn't have to know)
  // - we auto-bootstrap (if no join flags were given)
  // - a client bootstraps a cluster via node.

and that the gossip resolver slice is not empty

cockroach/pkg/server/server.go

Lines 1347 to 1353 in 6cb71bf

if len(s.cfg.GossipBootstrapResolvers) == 0 {
  // If the _unfiltered_ list of hosts from the --join flag is
  // empty, then this node can bootstrap a new cluster. We disallow
  // this if this node is being started with itself specified as a
  // --join host, because that's too likely to be operator error.
  if _, err := s.initServer.Bootstrap(ctx, &serverpb.BootstrapRequest{}); err != nil {
    return errors.Wrap(err, "while bootstrapping")

(otherwise, it either already has NodeID/ClusterID or will bootstrap a new cluster).

Now we accept incoming requests on the init server (but in this case there aren't any, since we're looking at joining an existing cluster). In today's code, Gossip is already started at this point (even though it would like to know a NodeID; tough luck). Two things can happen: either we get an explicit request to bootstrap a new cluster, in which case that is done; or Gossip manages to connect, which means that some other node was bootstrapped, and it tells us the cluster ID and we continue.

Next, the node is started. This is awkward: we're starting a node without a NodeID because that gives us a KV stack that we can then use to allocate a NodeID:

} else {
  n.initialBoot = true
  // Wait until Gossip is connected before trying to allocate a NodeID.
  // This isn't strictly necessary but avoids trying to use the KV store
  // before it can possibly work.
  if err := n.connectGossip(ctx); err != nil {
    return err
  }
  // If no NodeID has been assigned yet, allocate a new node ID.
  ctxWithSpan, span := n.AnnotateCtxWithSpan(ctx, "alloc-node-id")
  newID, err := allocateNodeID(ctxWithSpan, n.storeCfg.DB)
  if err != nil {
    return err
  }
  log.Infof(ctxWithSpan, "new node allocated ID %d", newID)
  span.Finish()
  nodeID = newID
}

This is the step that will need more work. We want this code to go away.

Start-up sequence (proposed)

If we are waiting for init/join, roughly here

https://github.com/cockroachdb/cockroach/blob/6cb71bf9c4fb64711d20c269d43586173056bbc9/pkg/server/server.go#L1388-L1387

we want to

  1. grab the --join flags
  2. hit a newly added JoinCluster (name TBD) endpoint for the first join target using a one-shot grpc connection (but using the right certs, etc.)
  3. if there's no response (i.e. a timeout or "I'm also trying to join" error), try the next one, wrapping around if necessary
  4. once there is a response, it contains our NodeID. Great! Now we can persist that (note below that we also get a StoreID back for convenience) and actually boot. We can bootstrap the first store before (*Node).start, so the bootstrapped stores slice here (bootstrappedEngines, emptyEngines, cv, err := inspectEngines(...)) is nonempty, and everything falls into place. A sketch of this join loop follows below.
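
A minimal sketch of that loop, under the assumptions above (JoinNodeResponse is the message proposed below; tryJoinOne stands in for the one-shot connection and is not an existing function; imports elided as elsewhere):

// Sketch of the proposed join loop: walk the --join targets, ask each one to
// admit us, and wrap around until somebody answers with our IDs.
func joinExistingCluster(ctx context.Context, joinTargets []string) (*JoinNodeResponse, error) {
  for {
    for _, addr := range joinTargets {
      // One-shot gRPC call with the right certs and a short timeout (hypothetical helper).
      resp, err := tryJoinOne(ctx, addr)
      if err != nil {
        // Timeout or "I'm also trying to join": move on to the next target.
        log.Infof(ctx, "could not join via %s: %v", addr, err)
        continue
      }
      // Success: the response carries NodeID, StoreID and ClusterID. Persist
      // them by bootstrapping the first store, then run (*Node).start as usual.
      return resp, nil
    }
    // All targets failed; wrap around after a short pause.
    select {
    case <-time.After(time.Second):
    case <-ctx.Done():
      return nil, ctx.Err()
    }
  }
}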

The message definitions for this could look like this:

message JoinNodeRequest {
  // NB: a node that joins has the lowest version active that it supports,
  // i.e. its MinSupportedVersion. We could just name it that, too?
  // Recipient will refuse this request if it's not compatible with active_version.
  roachpb.Version active_version = 1;
}
message JoinNodeResponse {
  int32 node_id = 1 [(gogoproto.customname) = "NodeID"];
  // A first StoreID allocated for the new node. The Node can in principle allocate
  // its own StoreIDs once it has a NodeID, but it doesn't have a good way to persist
  // a NodeID until it also has an initialized Store. Handing out a one-off StoreID here
  // is a trivial way to avoid additional complexity related to that.
  int32 store_id = 2 [(gogoproto.customname) = "StoreID"];
  bytes cluster_id = 3 [(gogoproto.customname) = "ClusterID"];
}
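
On the receiving side, the handler can be a thin wrapper around the existing ID counters. A rough sketch follows (checkVersionCompatible is made up, and the field names on Node and the helper signatures are from memory, so treat them as assumptions rather than as the actual implementation):

// Sketch of the serving side: the recipient has a fully working KV client, so
// allocating the IDs boils down to two increments on the existing counter keys.
func (n *Node) JoinNode(ctx context.Context, req *JoinNodeRequest) (*JoinNodeResponse, error) {
  // Refuse joiners whose active version we can't speak (see the request comment).
  if err := n.checkVersionCompatible(req.ActiveVersion); err != nil {
    return nil, err
  }
  nodeID, err := client.IncrementValRetryable(ctx, n.storeCfg.DB, keys.NodeIDGenerator, 1)
  if err != nil {
    return nil, err
  }
  storeID, err := client.IncrementValRetryable(ctx, n.storeCfg.DB, keys.StoreIDGenerator, 1)
  if err != nil {
    return nil, err
  }
  return &JoinNodeResponse{
    NodeID:    int32(nodeID),
    StoreID:   int32(storeID),
    ClusterID: n.clusterID.Get().GetBytes(),
  }, nil
}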

We still haven't removed the early Gossip server. I believe we can start it only after the ClusterID/NodeID are known from a successful JoinNode request, simply because it isn't required for bootstrapping any more. And we should be able to bypass *rpc.Context (and its intrinsic need to know the cluster version, NodeID, ClusterID, etc.) by using one-off grpc connections during bootstrap.

Potential issue: (more) loss of bidirectionality

Consider the following setup, where an edge from i to j means that node i has node j in its join flags.

2
↕
1 ← 3
↑
4

Assume all nodes are initially uninitialized (i.e. waiting for bootstrap). The cluster will only come together successfully if the node that receives the BootstrapRequest is reachable (transitively) from all other nodes.

For example:

  • bootstrap 1: works, all nodes will reach out to 1
  • bootstrap 2: works, 1 will reach out to 2, and 3 and 4 can reach out to 1 after that
  • bootstrap 3/4: nope, the other nodes will never reach out to them

In today's code, bootstrapping 3 would actually "push" the clusterID to 1 via Gossip, then 1 would initialize, and then everyone else. This is enough of a limitation to do something about, for example by adding a ShouldJoinRequest which, when received, simply triggers a JoinRequest back to the sender (thus replacing the push mechanism).
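
A sketch of that workaround (all names here are hypothetical; the channel is just one way to feed the sender's address into the join loop sketched earlier):

// Hypothetical sketch: an uninitialized node that receives ShouldJoin from an
// already-bootstrapped peer feeds that peer's address into its own join loop,
// replacing today's Gossip push of the cluster ID.
func (s *initServer) ShouldJoin(ctx context.Context, req *ShouldJoinRequest) (*ShouldJoinResponse, error) {
  if s.isBootstrapped() { // hypothetical accessor
    return &ShouldJoinResponse{}, nil // nothing to do, we already have our IDs
  }
  // Make the sender's address a join candidate even if it never appeared in
  // our own --join flags; the join loop will pick it up on its next pass.
  select {
  case s.extraJoinTargets <- req.SenderAddr:
  default: // loop is busy or the address is already queued; drop it
  }
  return &ShouldJoinResponse{}, nil
}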

tbg added a commit to tbg/cockroach that referenced this issue Apr 1, 2020
## Motivation

Today, starting up a server is complicated. This is especially true
when bootstrap is necessary. By this, we mean that either

- no NodeID is known. This implies that none of the engines were
  initialized yet
- the NodeID is known (i.e. at least one store is initialized), but a
  new engine was added.

When the process first starts, and a NodeID is known, a ClusterID is
also known (since they get persisted together, on every store). For the
same reason, a persisted cluster version is known. In this case, we can
start a fully initialized cluster, so this is the easy case.

It is more difficult when no NodeID is known, in which case the server
is just starting up for the first time, with its engines all blank. It
needs to somehow allocate NodeIDs (and StoreIDs) which are then written
to the engines. It also needs to, at least initially, use the lowest
possible cluster version it can tolerate (after all, the cluster may
actually run at that lowest version). Right now, there is a delicate
dance: we thread late-bound ClusterID and NodeID containers all over the
place, spin up a mostly dysfunctional Node, use its KV client to
allocate NodeID and StoreIDs once this is possible - we need Gossip
to have connected first - and update the containers. It is complex,
error prone, and ossifies any code it touches.

Cluster versions deserve an extra shout-out for complexity. Even if
a cluster version was persisted, the node may have been decommissioned
and the cluster upgraded. Much work went into our RPC layer to prevent
connections between incompatible nodes, but there is no boot-time check
that results in a swift and descriptive error - there will be a fatal
error, originating from the RPC subsystem. One aim of our work is to
simplify this by checking the version via an RPC before setting too
many gears in motion.

## Context

This marks the beginning of a series of PRs aiming at improving the
startup code. Ultimately, the goal is to have the Node and all Stores
bootstrapped as early as possible, but in particular before starting
the KV or SQL server subsystems.

Furthermore, we want to achieve this without relying on Gossip, to
prepare for a split between the SQL and KV servers in the context of
multitenancy support (SQL server won't be able to rely on Gossip, but
will still need to announce itself to the KV servers).

## This PR

This PR is an initial simplifying refactor that can help achieve these
goals. The init server (which hosts the Bootstrap RPC) is given more
responsibilities: it is now directly in charge of determining which, if
any, engines are bootstrapped, and explicitly listens to Gossip as well
as the Bootstrap RPC. It returns the cluster ID to the main server
start-up goroutine when it is available. As a result, the main startup
code has simplified, and a thread to be pulled on further has appeared
and is called out in TODOs.

Down the road (i.e. in later PRs), the init server will bootstrap a
NodeID and StoreIDs even when joining an existing cluster. It will
initially mimic/front-load the strategy taken by Node today, i.e. use a
`kv.DB`, but ultimately will bypass Gossip completely and use a simple
RPC call to ask the existing cluster to assign these IDs as needed. This
RPC will also establish the active cluster version, which is required
for SQL multi-tenancy, and generally follows the ideas in cockroachdb#32574.

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Apr 4, 2020
craig bot pushed a commit that referenced this issue Apr 7, 2020
46843: server: simplify bootstrap via fatter init server r=andreimatei a=tbg

tbg added a commit to tbg/cockroach that referenced this issue Apr 7, 2020
@lunevalex (Collaborator)

@irfansharif @tbg will this fix also help resolve this issue #39415?

@irfansharif (Contributor)

I don't know that #39415 is applicable here; that looks to be something to investigate separately (and nice to have as part of #6782).

@lunevalex (Collaborator)

@irfansharif I believe it may be, because we are re-working how StoreIDs are allocated, and based on the findings in #39497 that seems to be the crux of this issue.

tbg commented Jul 14, 2020

This would help in some situations but not in others.

Consider a three-node cluster that you shut down completely. Then you try to restart n1 with an extra store. The Connect RPC can't allocate the extra StoreID since there is no KV layer available. We need to proceed with startup (leaving the extra store alone for now) and initialize the store ID later, when the other two nodes are also up.

I wrote a proposal on the other thread on how this could be done:

#39415 (comment)

irfansharif added a commit to irfansharif/cockroach that referenced this issue Aug 11, 2020
This mostly follows the ideas in cockroachdb#32574, and serves as a crucial
building block for cockroachdb#48843. Specifically this PR introduces a new Join
RPC that new nodes can use, addressing already initialized nodes, to
learn about the cluster ID and its node id. Previously joining nodes
were responsible for allocating their own IDs and used to discover the
cluster ID.

By moving towards a more understandable flow of how nodes join the
cluster, we can build a few useful primitives on top of this:
 - we can prevent mismatched version nodes from joining the cluster
 - we can prevent decommissioned nodes from joining the cluster
 - we can add the liveness record for a given node as soon as it joins,
   which would simplify our liveness record handling code that is
   perennially concerned with missing liveness records

The tiny bit of complexity in this PR comes from how we're able to
migrate into this behavior from the old. To that end we retain the
earlier gossip-based cluster ID discovery+node ID allocation for self
behavior. Nodes with this patch will attempt to use this join RPC, if
implemented on the addressed node, and fall back to using the previous
behavior if not. It wasn't possible to use cluster versions for this
migration because this happens very early in the node start-up process,
and the version gating this change will not be active until much later
in the crdb process lifetime.

---

There are some leftover TODOs that I'm looking to address in this PR.
They should be tiny, and be easy to retro-fit into what we have so far.
Specifically I'm going to plumb the client address into the RPC so the
server is able to generate backlinks (and solve the bidirectionality
problem). I'm also going to try and add the liveness record for a
joining node as part of the join rpc. Right now the tests verifying
connectivity/bootstrap/join flags pass out of the box, but I'm going to
try adding more randomized testing here to test full connectivity once I
address these TODOs.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Aug 20, 2020
irfansharif added a commit to irfansharif/cockroach that referenced this issue Aug 24, 2020
This mostly follows the ideas in cockroachdb#32574, and serves as a crucial
building block for cockroachdb#48843. Specifically this PR introduces a new Join
RPC that new nodes can use, addressing already initialized nodes, to
learn about the cluster ID and its node ID. Previously joining nodes
were responsible for allocating their own IDs and used to discover the
cluster ID.

By moving towards a more understandable flow of how nodes join the
cluster, we can build a few useful primitives on top of this:
 - we can prevent mismatched version nodes from joining the cluster
   (this commit)
 - we can allocate the first store ID for a given node, which is a nice
   code simplification (this commit)
 - we can prevent decommissioned nodes from joining the cluster
   (future PR)
 - we can eliminate another usage of gossip where we previously used it
   to disseminate the cluster ID. In the 21.1 cycle we can defer gossip
   start until much later in the server start lifecycle (future PR)
 - we can add the liveness record for a given node as soon as it joins,
   which would simplify our liveness record handling code that is
   perennially concerned with missing liveness records (future PR)

The tiny bit of complexity in this PR comes from how we're able to
migrate into this behavior from the old. To that end we retain the
earlier gossip-based cluster ID discovery+node ID allocation for self
behavior. Nodes with this patch will attempt to use this join RPC, if
implemented on the addressed node, and fall back to using the previous
behavior if not.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Aug 25, 2020
irfansharif added a commit to irfansharif/cockroach that referenced this issue Aug 28, 2020
craig bot pushed a commit that referenced this issue Aug 28, 2020
52526: server: introduce join rpc for node id allocation r=irfansharif a=irfansharif

tbg commented Sep 1, 2020

@irfansharif this is done, right?
