storage: allocate NodeID via one-off RPC to initialized node #32574

Closed
tbg opened this issue Nov 23, 2018 · 6 comments

tbg commented Nov 23, 2018

Our handling of the NodeID is currently quite involved (and leaks into many places) because we pretty much require a working node to be set up before it can use its own KV store to request a NodeID.

There are two levels of improvement here. The first: once Gossip is connected and the Node is uninitialized, we issue a one-off AllocateNodeID RPC that simply asks a remote node of our choosing (taken from Gossip) to allocate a NodeID for us. This shifts the complexity of using the KV store to a node that has it in perfect working condition. (Similar improvements apply to StoreID allocation, which we may be able to hoist out of the depths of (*Node).Start, where it is buried right now.)
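
For concreteness, the caller side of such a one-off RPC could look roughly like the sketch below. Everything specific to the new RPC is hypothetical (the AllocateNodeID service, its request/response types, pickInitializedPeer); only gossip.Gossip, roachpb.NodeID and the plain grpc dial calls are existing APIs, and imports are elided as in the other snippets in this issue.

// Hypothetical sketch only: ask a remote, fully initialized node (picked via
// Gossip) to allocate our NodeID over a short-lived gRPC connection, instead
// of doing KV work through our own half-initialized Node.
func requestNodeIDFromPeer(
  ctx context.Context, g *gossip.Gossip, creds credentials.TransportCredentials,
) (roachpb.NodeID, error) {
  addr, err := pickInitializedPeer(g) // hypothetical: any peer address learned via Gossip
  if err != nil {
    return 0, err
  }
  // One-off connection with the right certs; no *rpc.Context needed since we
  // don't have a NodeID yet.
  conn, err := grpc.DialContext(ctx, addr, grpc.WithTransportCredentials(creds))
  if err != nil {
    return 0, err
  }
  defer conn.Close()
  // NewAllocateNodeIDClient is a hypothetical generated stub for the new service.
  resp, err := NewAllocateNodeIDClient(conn).AllocateNodeID(ctx, &AllocateNodeIDRequest{})
  if err != nil {
    return 0, err
  }
  return roachpb.NodeID(resp.NodeID), nil
}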

This removes uninitialized NodeIDs from everywhere except Gossip. Removing them from Gossip is more involved, depending on what guarantees we want to offer. The basic idea is to read the Gossip bootstrap list directly to send AllocateNodeID, and not to start Gossip before that has succeeded. However, consider the case in which a node isn't able to reach out to other nodes (perhaps its join flags are stale) but other nodes are able to reach out to it (a poor setup, but it could happen): we'd get stuck unless there's at least a minimal subset of Gossip running that adds the peer's address to the list of addresses we pick from for AllocateNodeID. This doesn't seem terrible to implement, but it's a little more involved.

Related to #30553.

tbg added the A-kv-server Relating to the KV-level RPC server label Nov 23, 2018
tbg added this to the 2.2 milestone Nov 23, 2018
tbg added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Nov 27, 2018
tbg commented Aug 12, 2019

Join-time NodeID/ClusterID RPC

Switch to a dedicated mechanism for joining an existing cluster that does not require a running Gossip network and Node.

Problem

Nodes joining a cluster for the first time currently perform an awkward dance: to allocate a NodeID, they set up a more or less fully functional node so that they can run a KV Increment request on the key that houses the NodeID counter. Once the NodeID is assigned, the already-running server needs to patch it into all of the components that want to know it. The result is this code:

// NodeIDContainer is used to share a single roachpb.NodeID instance between
// multiple layers. It allows setting and getting the value. Once a value is
// set, the value cannot change.
type NodeIDContainer struct {
  _ util.NoCopy

  // nodeID is atomically updated under the mutex; it can be read atomically
  // without the mutex.
  nodeID int32
}

which complicates matters in all components in which the NodeID plays a role (that's many of them).
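
To illustrate the cost, a consumer of the container typically ends up with something like the following (a sketch only, assuming the container's usual Get accessor; someComponent and its method are made up for illustration):

// Sketch of the late-binding pattern the container forces on consumers: any
// component wired up before the NodeID is allocated must tolerate a zero
// value and re-check on every use.
type someComponent struct {
  nodeID *base.NodeIDContainer // late-bound; may still be unset
}

func (c *someComponent) annotate(msg string) string {
  if id := c.nodeID.Get(); id != 0 {
    return fmt.Sprintf("n%d: %s", id, msg)
  }
  // NodeID not assigned yet; every consumer needs a fallback like this.
  return fmt.Sprintf("n?: %s", msg)
}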

A similar problem occurs for the ClusterID, and there's a corresponding container:

// ClusterIDContainer is used to share a single Cluster ID instance between
// multiple layers. It allows setting and getting the value. Once a value is
// set, the value cannot change.
//
// The cluster ID is determined on startup as follows:
// - If there are existing stores, their cluster ID is used.
// - If the node is bootstrapping, a new UUID is generated.
// - Otherwise, it is determined via gossip with other nodes.
type ClusterIDContainer struct {
  syncutil.Mutex

  clusterID uuid.UUID
}

The ClusterID is primarily distributed via Gossip, so nodes asking to join a cluster must connect to Gossip before they're really ready to do so.

Start-up sequence today

We shouldn't have to set up an incomplete server and deal with updating it in-place. Instead, a node needing to bootstrap via joining should, at a high level, ask an existing node for the ClusterID and a freshly assigned NodeID before really entering its start sequence.

The sequence to join a cluster today takes roughly the following form

First, the node must find that no NodeID/ClusterID has been persisted yet

cockroach/pkg/server/server.go

Lines 1308 to 1315 in 6cb71bf

} else {
  // We have no existing stores. We start an initServer and then wait for
  // one of the following:
  //
  // - gossip connects (i.e. we're joining an existing cluster, perhaps
  //   freshly bootstrapped but this node doesn't have to know)
  // - we auto-bootstrap (if no join flags were given)
  // - a client bootstraps a cluster via node.

and that the gossip resolver slice is not empty

cockroach/pkg/server/server.go

Lines 1347 to 1353 in 6cb71bf

if len(s.cfg.GossipBootstrapResolvers) == 0 {
  // If the _unfiltered_ list of hosts from the --join flag is
  // empty, then this node can bootstrap a new cluster. We disallow
  // this if this node is being started with itself specified as a
  // --join host, because that's too likely to be operator error.
  if _, err := s.initServer.Bootstrap(ctx, &serverpb.BootstrapRequest{}); err != nil {
    return errors.Wrap(err, "while bootstrapping")

(otherwise, it either already has NodeID/ClusterID or will bootstrap a new cluster).

Now we accept incoming requests on the init server (but in this case there aren't any, since we're looking at joining an existing cluster). In today's code, Gossip is already started at this point (even though it would like to know a NodeID; tough luck). Two things can happen: either we get an explicit request to bootstrap a new cluster, in which case that is done; or Gossip manages to connect, which means that some other node was bootstrapped, and it tells us the cluster ID and we continue.

Next, the node is started. This is awkward: we're starting a node without a NodeID because that gives us a KV stack that we can then use to allocate a NodeID:

} else {
  n.initialBoot = true
  // Wait until Gossip is connected before trying to allocate a NodeID.
  // This isn't strictly necessary but avoids trying to use the KV store
  // before it can possibly work.
  if err := n.connectGossip(ctx); err != nil {
    return err
  }
  // If no NodeID has been assigned yet, allocate a new node ID.
  ctxWithSpan, span := n.AnnotateCtxWithSpan(ctx, "alloc-node-id")
  newID, err := allocateNodeID(ctxWithSpan, n.storeCfg.DB)
  if err != nil {
    return err
  }
  log.Infof(ctxWithSpan, "new node allocated ID %d", newID)
  span.Finish()
  nodeID = newID
}

This is the step that will need more work. We want this code to go away.

Start-up sequence (proposed)

If we are waiting for init/join, roughly here

https://github.com/cockroachdb/cockroach/blob/6cb71bf9c4fb64711d20c269d43586173056bbc9/pkg/server/server.go#L1388-L1387

we want to

  1. grab the --join flags
  2. hit a newly added JoinCluster (name TBD) endpoint for the first join target using a one-shot grpc connection (but using the right certs, etc.)
  3. if there's no response (i.e. a timeout or "I'm also trying to join" error), try the next one, wrapping around if necessary
  4. once there is a response, it contains our NodeID. Great! Now we can persist that (note below that we also get a StoreID back for convenience) and actually boot. We can bootstrap the first store before (*Node).start, so the bootstrapped stores slice here (bootstrappedEngines, emptyEngines, cv, err := inspectEngines(...)) is nonempty, and everything falls into place. A sketch of this join loop follows below.
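
A minimal sketch of that loop, under the assumptions above (JoinNodeResponse is the message proposed below; tryJoinOne stands in for the one-shot connection and is not an existing function; imports elided as elsewhere):

// Sketch of the proposed join loop: walk the --join targets, ask each one to
// admit us, and wrap around until somebody answers with our IDs.
func joinExistingCluster(ctx context.Context, joinTargets []string) (*JoinNodeResponse, error) {
  for {
    for _, addr := range joinTargets {
      // One-shot gRPC call with the right certs and a short timeout (hypothetical helper).
      resp, err := tryJoinOne(ctx, addr)
      if err != nil {
        // Timeout or "I'm also trying to join": move on to the next target.
        log.Infof(ctx, "could not join via %s: %v", addr, err)
        continue
      }
      // Success: the response carries NodeID, StoreID and ClusterID. Persist
      // them by bootstrapping the first store, then run (*Node).start as usual.
      return resp, nil
    }
    // All targets failed; wrap around after a short pause.
    select {
    case <-time.After(time.Second):
    case <-ctx.Done():
      return nil, ctx.Err()
    }
  }
}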

The message definitions for this could look like this:

message JoinNodeRequest {
  // NB: a node that joins has the lowest version active that it supports,
  // i.e. its MinSupportedVersion. We could just name it that, too?
  // Recipient will refuse this request if it's not compatible with active_version.
  roachpb.Version active_version = 1;
}
message JoinNodeResponse {
  int32 node_id = 1 [(gogoproto.customname) = "NodeID"];
  // A first StoreID allocated for the new node. The Node can in principle allocate
  // its own StoreIDs once it has a NodeID, but it doesn't have a good way to persist
  // a NodeID until it also has an initialized Store. Handing out a one-off StoreID here
  // is a trivial way to avoid additional complexity related to that.
  int32 store_id = 2 [(gogoproto.customname) = "StoreID"];
  bytes cluster_id = 3 [(gogoproto.customname) = "ClusterID"];
}
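
On the receiving side, the handler can be a thin wrapper around the existing ID counters. A rough sketch follows (checkVersionCompatible is made up, and the field names on Node and the helper signatures are from memory, so treat them as assumptions rather than as the actual implementation):

// Sketch of the serving side: the recipient has a fully working KV client, so
// allocating the IDs boils down to two increments on the existing counter keys.
func (n *Node) JoinNode(ctx context.Context, req *JoinNodeRequest) (*JoinNodeResponse, error) {
  // Refuse joiners whose active version we can't speak (see the request comment).
  if err := n.checkVersionCompatible(req.ActiveVersion); err != nil {
    return nil, err
  }
  nodeID, err := client.IncrementValRetryable(ctx, n.storeCfg.DB, keys.NodeIDGenerator, 1)
  if err != nil {
    return nil, err
  }
  storeID, err := client.IncrementValRetryable(ctx, n.storeCfg.DB, keys.StoreIDGenerator, 1)
  if err != nil {
    return nil, err
  }
  return &JoinNodeResponse{
    NodeID:    int32(nodeID),
    StoreID:   int32(storeID),
    ClusterID: n.clusterID.Get().GetBytes(),
  }, nil
}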

We still haven't removed the early Gossip server. I believe we can start it only after the ClusterID/NodeID are known from a successful JoinNode request, simply because it isn't required for bootstrapping any more. And we should be able to bypass *rpc.Context (and its intrinsic need to know the cluster version, NodeID, ClusterID, etc.) by using one-off grpc connections during bootstrap.

Potential issue: (more) loss of bidirectionality

Consider the following setup, where an edge from i to j means that node i has node j in its join flags.

2
↕
1 ← 3
↑
4

Assume all nodes are initially uninitialized (i.e. waiting for bootstrap). The cluster will only come together successfully if the node that receives the BootstrapRequest is reachable (transitively) from all other nodes.

For example:

  • bootstrap 1: works, all nodes will reach out to 1
  • bootstrap 2: works, 1 will reach out to 2, and 3 and 4 can reach out to 1 after that
  • bootstrap 3/4: nope, the other nodes will never reach out to them

In today's code, bootstrapping 3 would actually "push" the clusterID to 1 via Gossip, then 1 would initialize, and then everyone else. This is enough of a limitation to do something about, for example by adding a ShouldJoinRequest which, when received, simply triggers a JoinRequest back to the sender (thus replacing the push mechanism).
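
A sketch of that workaround (all names here are hypothetical; the channel is just one way to feed the sender's address into the join loop sketched earlier):

// Hypothetical sketch: an uninitialized node that receives ShouldJoin from an
// already-bootstrapped peer feeds that peer's address into its own join loop,
// replacing today's Gossip push of the cluster ID.
func (s *initServer) ShouldJoin(ctx context.Context, req *ShouldJoinRequest) (*ShouldJoinResponse, error) {
  if s.isBootstrapped() { // hypothetical accessor
    return &ShouldJoinResponse{}, nil // nothing to do, we already have our IDs
  }
  // Make the sender's address a join candidate even if it never appeared in
  // our own --join flags; the join loop will pick it up on its next pass.
  select {
  case s.extraJoinTargets <- req.SenderAddr:
  default: // loop is busy or the address is already queued; drop it
  }
  return &ShouldJoinResponse{}, nil
}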

tbg added a commit to tbg/cockroach that referenced this issue Apr 1, 2020
## Motivation

Today, starting up a server is complicated. This is especially true
when bootstrap is necessary. By this, we mean that either

- no NodeID is known. This implies that none of the engines were
  initialized yet
- the NodeID is known (i.e. at least one store is initialized), but a
  new engine was added.

When the process first starts, and a NodeID is known, a ClusterID is
also known (since they get persisted together, on every store). For the
same reason, a persisted cluster version is known. In this case, we can
start a fully initialized cluster, so this is the easy case.

It is more difficult when no NodeID is known, in which case the server
is just starting up for the first time, with its engines all blank. It
needs to somehow allocate NodeIDs (and StoreIDs) which are then written
to the engines. It also needs to, at least initially, use the lowest
possible cluster version it can tolerate (after all, the cluster may
actually run at that lowest version). Right now, there is a delicate
dance: we thread late-bound ClusterID and NodeID containers all over the
place, spin up a mostly dysfunctional Node, use its KV client to
allocate NodeID and StoreIDs once this is possible - we need Gossip
to have connected first - and update the containers. It is complex,
error prone, and ossifies any code it touches.

Cluster versions deserve an extra shout-out for complexity. Even if
a cluster version was persisted, the node may have been decommissioned
and the cluster upgraded. Much work went into our RPC layer to prevent
connections between incompatible nodes, but there is no boot-time check
that results in a swift and descriptive error - there will be a fatal
error, originating from the RPC subsystem. One aim of our work is to
simplify this by checking the version via an RPC before setting too
many gears in motion.

## Context

This marks the beginning of a series of PRs aiming at improving the
startup code. Ultimately, the goal is to have the Node and all Stores
bootstrapped as early as possible, but in particular before starting
the KV or SQL server subsystems.

Furthermore, we want to achieve this without relying on Gossip, to
prepare for a split between the SQL and KV servers in the context of
multitenancy support (SQL server won't be able to rely on Gossip, but
will still need to announce itself to the KV servers).

## This PR

This PR is an initial simplifying refactor that can help achieve these
goals. The init server (which hosts the Bootstrap RPC) is given more
responsibilities: it is now directly in charge of determining which, if
any, engines are bootstrapped, and explicitly listens to Gossip as well
as the Bootstrap RPC. It returns the cluster ID to the main server
start-up goroutine when it is available. As a result, the main startup
code has simplified, and a thread to be pulled on further has appeared
and is called out in TODOs.

Down the road (i.e. in later PRs), the init server will bootstrap a
NodeID and StoreIDs even when joining an existing cluster. It will
initially mimic/front-load the strategy taken by Node today, i.e. use a
`kv.DB`, but ultimately will bypass Gossip completely and use a simple
RPC call to ask the existing cluster to assign these IDs as needed. This
RPC will also establish the active cluster version, which is required
for SQL multi-tenancy, and generally follows the ideas in cockroachdb#32574.

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Apr 4, 2020
craig bot pushed a commit that referenced this issue Apr 7, 2020
46843: server: simplify bootstrap via fatter init server r=andreimatei a=tbg

tbg added a commit to tbg/cockroach that referenced this issue Apr 7, 2020
@lunevalex (Collaborator)

@irfansharif @tbg will this fix also help resolve this issue #39415?

@irfansharif (Contributor)

I don't know that #39415 is applicable here; that looks to be something to investigate separately (and nice to have as part of #6782).

@lunevalex (Collaborator)

@irfansharif I believe it may be, because we are re-working how StoreIDs are allocated, and based on the findings in #39497 that seems to be the crux of this issue.

tbg commented Jul 14, 2020

This would help in some situations but not in others.

Consider a three-node cluster that you shut down completely. Then you try to restart n1 with an extra store. The Connect RPC can't allocate the extra StoreID since there is no KV layer available. We need to proceed with startup (leaving the extra store alone for now) and initialize the store ID later, when the other two nodes are also up.

I wrote a proposal on the other thread on how this could be done:

#39415 (comment)

irfansharif added a commit to irfansharif/cockroach that referenced this issue Aug 11, 2020
This mostly follows the ideas in cockroachdb#32574, and serves as a crucial
building block for cockroachdb#48843. Specifically this PR introduces a new Join
RPC that new nodes can use, addressing already initialized nodes, to
learn about the cluster ID and its node id. Previously joining nodes
were responsible for allocating their own IDs and used to discover the
cluster ID.

By moving towards a more understandable flow of how nodes join the
cluster, we can build a few useful primitives on top of this:
 - we can prevent mismatched version nodes from joining the cluster
 - we can prevent decommissioned nodes from joining the cluster
 - we can add the liveness record for a given node as soon as it joins,
   which would simplify our liveness record handling code that is
   perennially concerned with missing liveness records

The tiny bit of complexity in this PR comes from how we're able to
migrate into this behavior from the old. To that end we retain the
earlier gossip-based cluster ID discovery+node ID allocation for self
behavior. Nodes with this patch will attempt to use this join RPC, if
implemented on the addressed node, and fall back to using the previous
behavior if not. It wasn't possible to use cluster versions for this
migration because this happens very early in the node start-up process,
and the version gating this change will not be active until much later
in the crdb process lifetime.

---

There are some leftover TODOs that I'm looking to address in this PR.
They should be tiny, and be easy to retro-fit into what we have so far.
Specifically I'm going to plumb the client address into the RPC so the
server is able to generate backlinks (and solve the bidirectionality
problem). I'm also going to try and add the liveness record for a
joining node as part of the join rpc. Right now the tests verifying
connectivity/bootstrap/join flags pass out of the box, but I'm going to
try adding more randomized testing here to test full connectivity once I
address these TODOs.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Aug 20, 2020
irfansharif added a commit to irfansharif/cockroach that referenced this issue Aug 24, 2020
This mostly follows the ideas in cockroachdb#32574, and serves as a crucial
building block for cockroachdb#48843. Specifically this PR introduces a new Join
RPC that new nodes can use, addressing already initialized nodes, to
learn about the cluster ID and its node ID. Previously joining nodes
were responsible for allocating their own IDs and used to discover the
cluster ID.

By moving towards a more understandable flow of how nodes join the
cluster, we can build a few useful primitives on top of this:
 - we can prevent mismatched version nodes from joining the cluster
   (this commit)
 - we can allocate the first store ID for a given node, which is a nice
   code simplification (this commit)
 - we can prevent decommissioned nodes from joining the cluster
   (future PR)
 - we can eliminate another usage of gossip where we previously used it
   to disseminate the cluster ID. In the 21.1 cycle we can defer gossip
   start until much later in the server start lifecycle (future PR)
 - we can add the liveness record for a given node as soon as it joins,
   which would simplify our liveness record handling code that is
   perennially concerned with missing liveness records (future PR)

The tiny bit of complexity in this PR comes from how we're able to
migrate into this behavior from the old. To that end we retain the
earlier gossip-based cluster ID discovery+node ID allocation for self
behavior. Nodes with this patch will attempt to use this join RPC, if
implemented on the addressed node, and fall back to using the previous
behavior if not.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Aug 25, 2020
irfansharif added a commit to irfansharif/cockroach that referenced this issue Aug 28, 2020
craig bot pushed a commit that referenced this issue Aug 28, 2020
52526: server: introduce join rpc for node id allocation r=irfansharif a=irfansharif

tbg commented Sep 1, 2020

@irfansharif this is done, right?
