Added multiple policies for renaming nodes #5008
Conversation
@mkeeler @banks what do you think of this proposal? (If you remember our conversations in #3983 and #4399, it is a hot topic.) I think it is a good alternative to the current status quo, and it would let us progressively remove the legacy, hard-to-debug node registration behaviour. It would also help people using Consul in cloud environments as described in #4741, while letting people like us, with bare-metal nodes and fixed node IDs, have clearer policies around node renaming. Kind regards
Scheduling this for attention soon - this issue/proposal is on our radar and we will hopefully have cycles to consider it soon.
* Added better checks and error messages to get proper validation of values
* Use constants to avoid duplicating strings everywhere
upup
I am curious to know when this fix will be merged into master. We are upgrading Consul from 1.0.6 to 1.3.1 and encountering the issue mentioned in #4741. If this fix gets merged soon, we can run 1.3.1 in our environment rather than upgrading to 1.2.2.
Thanks @pierresouchay!
This is another awesome contribution. I think overall having different policies is a good solution here. I have a few specific comments in the code about changes.
The biggest question though is whether it is safe to have this in config (spoiler alert: it's not really).
This is the first time that some state that is not governed by Raft is able to affect the decisions made inside the state machine! That sounds really dangerous!
For example, if one of your servers accidentally came up with a different policy from the others, it might reject a node registration that the other servers accepted, and so the state would diverge and become corrupt!
In this one case the scope for divergence is somewhat limited and I think recoverable by changing config and re-registering nodes, but I'm extremely reluctant to go down that path. Thinking through all the transitive effects this could have on any later operation that might observe different state on different replicas is a minefield.
The good news is there are other alternatives. The way we handle this for things like Connect CA bootstrap and Autopilot config is that the config values just bootstrap the cluster, and then the source of truth for the current policy lives in the Raft state.
The bad news is that doing it that way needs quite a bit of extra work - a new state store table, all new Raft operations to manage and snapshot it, API endpoints to query and update it once it's bootstrapped, etc.
Another (simpler and maybe better) option is that you keep it in config on servers and you add the effective policy as an argument to the raft operation for registering a node. This way the leader tells the cluster as part of the replicated operation what policy it is being submitted under so it is guaranteed that all replicas will do the same thing again. That saves a lot of extra work managing config in raft (no API needed to update) and makes it easier to switch config on the server - the current leader decides the policy. That way having different policies on different servers is still kinda bad because you might change behaviour when leadership changes but at least it doesn't threaten the integrity of the FSM.
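Very roughly, something like this sketch (placeholder names only - `EffectivePolicy` and the other identifiers here are illustrative, not actual fields in this PR):

```go
// Sketch, not real Consul code: the leader stamps its configured policy onto
// the replicated registration request, so every FSM replica applies the same
// rule regardless of its local config.
type NodeRenamingPolicy string

type RegisterRequest struct {
	Node string
	ID   string
	// EffectivePolicy is set by the current leader from its own config
	// before the request is committed to Raft; followers never consult
	// their local config when applying it.
	EffectivePolicy NodeRenamingPolicy
}

func submitRegistration(req *RegisterRequest, leaderPolicy NodeRenamingPolicy,
	raftApply func(*RegisterRequest) error) error {
	req.EffectivePolicy = leaderPolicy // the leader's config decides the policy
	return raftApply(req)              // replicas only see the replicated value
}
```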
What do you think?
}

// New is used to construct a new FSM with a blank state.
-func New(gc *state.TombstoneGC, logOutput io.Writer) (*FSM, error) {
+func New(gc *state.TombstoneGC, logOutput io.Writer, nodeRenamingPolicy types.NodeRenamingPolicy) (*FSM, error) {
	stateNew, err := state.NewStateStore(gc)
Passing this in right through the Consul config, server, FSM, and state store constructor seems like the wrong approach for something that only affects one operation on the state store. Is there a reason we can't just make an option on `EnsureNode` that flips this and can be set in the Register RPC directly from the agent config?
Actually... forget the code sprawl, this indicates a much bigger issue - making FSM state depend on outside input that is not guaranteed to be the same on all replicas. See my review comment.
@@ -354,59 +354,124 @@ func (s *Store) EnsureNode(idx uint64, node *structs.Node) error {
	return nil
What if this was changed to `EnsureNode(idx uint64, node *structs.Node, renamingPolicy types.NodeRenamingPolicy)`? Off the top of my head, I think there is only one call site - in the Register RPC - that would need to change then, and it can look the policy up directly from agent config.
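A rough sketch of the suggested shape (a fragment only, not the code in this PR):

```go
// The renaming policy becomes an explicit argument instead of being wired
// through the FSM and state store constructors; the caller passes whatever
// policy it looked up from the agent/server config.
func (s *Store) EnsureNode(idx uint64, node *structs.Node, renamingPolicy types.NodeRenamingPolicy) error {
	// ... existing registration logic, branching on renamingPolicy ...
	return nil
}
```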
func (s *Store) ensureNodeTxn(tx *memdb.Txn, idx uint64, node *structs.Node) error {

func (s *Store) findExistingNodeRenameDeadNodes(tx *memdb.Txn, idx uint64, node *structs.Node) (*structs.Node, error) {
	if node.ID == "" {
		return nil, fmt.Errorf("Empty Node IDs is not supported for : %s", node.Node)
Suggested change:
-	return nil, fmt.Errorf("Empty Node IDs is not supported for : %s", node.Node)
+	return nil, fmt.Errorf("Empty Node ID is not supported for dead rename policy: %s", node.Node)
		return nil, fmt.Errorf("Cannot get status of node %s due to: %s", dupNode.Node, err)
	}
	if dupNodeCheck == nil {
		return nil, fmt.Errorf("Cannot RenameNode since check %s not found for node %s", string(structs.SerfCheckID), dupNode.Node)
Suggested change:
-	return nil, fmt.Errorf("Cannot RenameNode since check %s not found for node %s", string(structs.SerfCheckID), dupNode.Node)
+	return nil, fmt.Errorf("Cannot rename node since check %s not found for node %s", string(structs.SerfCheckID), dupNode.Node)
	if dupNodeCheck == nil {
		return nil, fmt.Errorf("Cannot RenameNode since check %s not found for node %s", string(structs.SerfCheckID), dupNode.Node)
	}
	existingDupNodeSerf := dupNodeCheck.(*structs.HealthCheck)
The possibility of this panicking makes me itchy - can we do the `_, ok :=` check and just return an error if the type is wrong? Paranoid and defensive, sure, but panics are bad!
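Something along these lines (a sketch reusing the names from the diff above):

```go
// Defensive type assertion: return an error instead of panicking if the
// stored check is not a *structs.HealthCheck.
existingDupNodeSerf, ok := dupNodeCheck.(*structs.HealthCheck)
if !ok {
	return nil, fmt.Errorf("check %s on node %s has unexpected type %T",
		string(structs.SerfCheckID), dupNode.Node, dupNodeCheck)
}
```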
	}
	existingDupNodeSerf := dupNodeCheck.(*structs.HealthCheck)
	if existingDupNodeSerf.Status != api.HealthCritical {
		// This is ok, we allow to take the identity of that node
This comment seems in the wrong place - it's saying "OK we can take identity" but it's in the failing branch...
	existingDupNodeSerf := dupNodeCheck.(*structs.HealthCheck)
	if existingDupNodeSerf.Status != api.HealthCritical {
		// This is ok, we allow to take the identity of that node
		return dupNode, fmt.Errorf("Cannot rename since node %s because check %s is '%s'", dupNode.Node, string(structs.SerfCheckID), existingDupNodeSerf.Status)
Suggested change:
-	return dupNode, fmt.Errorf("Cannot rename since node %s because check %s is '%s'", dupNode.Node, string(structs.SerfCheckID), existingDupNodeSerf.Status)
+	return dupNode, fmt.Errorf("Cannot rename node because existing node named %s is still alive", dupNode.Node)
func (s *Store) findExistingNodeStrict(tx *memdb.Txn, idx uint64, node *structs.Node) (*structs.Node, error) {
	if node.ID == "" {
		return nil, fmt.Errorf("Empty Node IDs is not supported for : %s", node.Node)
Suggested change:
-	return nil, fmt.Errorf("Empty Node IDs is not supported for : %s", node.Node)
+	return nil, fmt.Errorf("Empty Node ID is not supported with strict rename policy for: %s", node.Node)
@banks I thought about this a while ago and had an idea: what about applying this on each agent individually? A node could then specify whether it should be legacy, strict or "cloud ready"... meaning the initial node to be replaced would store the policy in the catalog. It makes sense, for instance, in places where Consul is used in both bare-metal and cloud environments. What do you think about that? It's a bit problematic for cluster-wide policies (I mean, ones that don't depend on a single node to take the decision), but for our case I think it would work.
I’ve just remembered we discussed this last week and came to a conclusion that I forgot to write down. I’ll need to think about this more and expand upon it but it boils down to this:
Another reason to avoid multiple modes is that external services (registered via ESM or otherwise) don't have UUIDs, so neither strict nor dead mode here would work for them, and it's not reasonable to call that "legacy". We need to continue to support no-UUID registrations for external nodes as a first-class use case. Will come back to this when I get a chance, just wanted to note this while I remember!
Any idea if this will fix the error we're seeing? In our instance of Consul, if we restore from a snapshot whose node name is the same as the fresh Consul node's name, we start seeing the error once we set an agent token. This is running Consul 1.4 on Azure AKS, deployed using HashiCorp's Helm chart.
@cwillbusch yeah, that's a similar issue that would be fixed when we get this right. The complication there is that the snapshot restore is essentially time-travelling back to when a different node was registered in the same way! If you manually deregister the node you should be able to get it to work in the interim. My proposed solution above (not this PR's proposal) would fix it automatically, since the Serf layer would know that the old node is no longer around and so allow the re-registration.
@banks Thanks for your clarification! Could you let me know the proper way to manually deregister a node, so I can use that workaround until your proposed solution is implemented?
Heh, that is actually a good question with a slightly sad answer @cwillbusch (CLI support for this is tracked in #2981 if interested). You should be able to do it via the API though: https://www.consul.io/api/catalog.html#deregister-entity. E.g.:
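Something along these lines (the datacenter and node name are placeholders for your own values):

```sh
curl -X PUT http://127.0.0.1:8500/v1/catalog/deregister \
  -d '{"Datacenter": "dc1", "Node": "old-node-name"}'
```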
Note that despite the docs, this didn't seem to work when I attempted to use the node's ID rather than its name - I think those docs predate Node IDs being a separate UUID. But it works with the name - I tested locally, and the new node is immediately able to sync on the next attempt, even without a restart.
Hi @banks, seems I have a related question here:
@Nmishin the answer is the same as above - assuming the old node actually doesn't exist, you either need to wait 72 hours or manually deregister the old node (with the curl command above) before you can add the new one. Once we get the real fix highlighted above in place, this problem should solve itself automatically.
Could not agree more. Let's take it logically. A node is uniquely identified in the cluster by its node_id, and each node can have just one node_id. If that's true, one of the following must be true as well:
1. Several cluster nodes (node_ids) can share the same node_name (id:name is N:1).
2. node_name is equivalent to node_id (1:1), i.e. node_name is globally unique as well.
3. A single node_id can be known under several node_names (id:name is 1:N).
A node in a Consul cluster is a data object. It is not the same as a Consul agent instance. A Consul node and an agent instance have disjoint lifecycles. node_id is meant to identify a data object, node_name an agent instance. If node_name were globally unique (space- and time-wise), the second option above (node_name ~ node_id) would be fine. However, operators naturally want to use DNS names as node_name. DNS requires only spatial uniqueness; it does not formally require or guarantee that a name is not reused over time. That rules out the second option. Of the remaining two options (id:name as N:1 and 1:N), the last one (multiple names per node) does not make sense - it would mean that a Consul agent can be represented by several Raft nodes. That leaves just one option - several cluster nodes (data objects) can share the same node_name. If a specific operating environment imposes extra restrictions, they should be addressed on top of the above, not in the core.
@kevstigneev thanks for the input. I think the overall decision here is clear, so I'll close this. For the record though:
This isn't quite right - node IDs are unique identifiers for the agent too; the reason we added them as well as node names is that people like to rename their nodes from time to time, so we wanted a separate stable ID. The issue is that it's impossible to make a truly stable ID - we try using hardware identifiers or, failing that, storing it on disk, but then people find creative ways for those to change, e.g. they rebuild an instance but keep the same host name, so now it has a new ID but the same name. That's the crux of the problem. Pierre's proposal here isn't bad or wrong; it's just that we've realised we already have a mechanism (Serf) that can automatically resolve this conflict correctly, without the complexity of explicit new behaviour options in the state machine or elsewhere. We'll revisit this idea though if Serf can't solve all the cases as we expect.
Addressing #4414 is hard as many unit tests rely on having empty node IDs.
It seems there are several usages of nodeId:
This PR addresses those 3 use-cases by adding a new option, with a sample configuration shown after the list:
* `node_renaming_policy=(legacy|dead|strict)`
* `legacy` (the default) keeps the existing behaviour
* `dead`: do not allow nodes without IDs, but allow a dead node to be replaced by another one with the same name and a different ID (this is done by checking that serfHealth is critical). Renaming nodes is still allowed.
* `strict`: do not allow empty node IDs; allow renaming nodes that keep the same ID; do not allow stealing a node name while that node is still in the catalog.
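For illustration, the proposed option would be set in the agent configuration roughly like this (a sketch only; the final syntax and placement could differ, and `dead` is just one of the three values):

```json
{
  "node_renaming_policy": "dead"
}
```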