
Added multiple policies for renaming nodes #5008

Closed

Conversation

pierresouchay
Contributor

Addressing #4414 is hard, as many unit tests rely on having empty node IDs.

It seems there are several usages of node IDs:

This PR addresses those 3 use-cases by adding a new option, node_renaming_policy=(legacy|dead|strict); a rough sketch of the corresponding policy type follows the list:

  • legacy (the default) keeps the existing behavior
  • dead: do not allow nodes without IDs, but allow replacing a dead node with another one that has the same name but a different ID (this is done by checking that serfHealth is critical). Renaming nodes is still allowed.
  • strict: do not allow empty node IDs, allow renaming nodes with the same ID, and do not allow stealing a node name as long as that node is in the catalog.
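
A minimal sketch of what the policy type and its constants could look like - the names below are illustrative assumptions, not necessarily those used in types/noderenaming.go in this PR:

package types

import "fmt"

// NodeRenamingPolicy selects how the catalog resolves node rename conflicts.
type NodeRenamingPolicy string

const (
	NodeRenamingLegacy NodeRenamingPolicy = "legacy" // keep the existing behavior
	NodeRenamingDead   NodeRenamingPolicy = "dead"   // only take over a name whose serfHealth check is critical
	NodeRenamingStrict NodeRenamingPolicy = "strict" // never take over a name while that node is in the catalog
)

// ValidateNodeRenamingPolicy rejects unknown values with a clear error message.
func ValidateNodeRenamingPolicy(p NodeRenamingPolicy) error {
	switch p {
	case NodeRenamingLegacy, NodeRenamingDead, NodeRenamingStrict:
		return nil
	default:
		return fmt.Errorf("invalid node_renaming_policy %q, want one of: legacy, dead, strict", p)
	}
}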

@pierresouchay
Contributor Author

@mkeeler @banks what do you think of this proposal? (if you remember our conversations in #3983 and #4399, it is a hot topic)

I think it is a good alternative to the current status quo, and it will allow us to progressively remove the legacy and hard-to-debug method of current node registration. It will also help people using Consul in cloud environments, as described in #4741, while allowing people like us, with bare-metal nodes and fixed node IDs, to have clearer policies regarding node renaming.

Kind regards

@banks banks added this to the Upcoming milestone Nov 28, 2018
@banks banks added the needs-discussion label (topic needs discussion with the larger Consul maintainers before committing to it for a release) Nov 28, 2018
@banks
Member

banks commented Nov 28, 2018

Scheduling this for attention - this issue/proposal is on our radar and we will hopefully have cycles to consider it soon.

* Added better checks and error messages to get proper validation of value
* Use constants to avoid duplicating strings everywhere
types/noderenaming.go
agent/config/builder.go (outdated)
agent/consul/state/catalog.go (outdated)
agent/consul/state/catalog.go
@linydquantil

upup

@aashitvyas

I am curious to know when this fix will be merged into master. We are upgrading Consul from 1.0.6 to 1.3.1 and encountering the issue mentioned in #4741. If this fix will be merged soon, we can have 1.3.1 in our environment rather than upgrading to 1.2.2.

Member

@banks banks left a comment


Thanks @pierresouchay!

This is another awesome contribution. I think overall having different policies is a good solution here. I have a few specific comments in the code about changes.

The biggest question though is whether it is safe to have this in config (spoiler alert: it's not really).

This is the first time that some state that is not governed by Raft is able to affect the decisions made inside the state machine! That sounds really dangerous!

For example, if one of your servers accidentally came up with a different policy set than the others, it might reject a node registration that the other servers accepted, and so the state would diverge and be corrupt!

In this one case the scope for divergence is somewhat limited, and I think it's recoverable by changing config and re-registering nodes, but I'm extremely reluctant to go down that path. Thinking through all the transitive deps this could cause for any later operation that might observe different state on different replicas is a minefield.

The good news is there are other alternatives. The way we handle this for things like Connect CA bootstrap and Autopilot config is that the config values just bootstrap the cluster, and then the actual source of truth for the current policy lives in the Raft state.

The bad news is that doing that needs quite a bit of extra work - a new state store table, new Raft operations to manage and snapshot it, API endpoints to query and update it once it's bootstrapped, etc.

Another (simpler and maybe better) option is that you keep it in config on servers and add the effective policy as an argument to the Raft operation for registering a node. This way the leader tells the cluster, as part of the replicated operation, which policy the registration is being submitted under, so it is guaranteed that all replicas will do the same thing again. That saves a lot of extra work managing config in Raft (no API needed to update it) and makes it easier to switch config on the server - the current leader decides the policy. Having different policies on different servers is still kinda bad, because behaviour might change when leadership changes, but at least it doesn't threaten the integrity of the FSM.
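
A rough sketch of that option, with hypothetical names (this is not actual Consul code, just the shape of "the leader stamps the policy onto the replicated operation"):

package example

// RegisterRequest stands in for the Raft payload used to register a node.
type RegisterRequest struct {
	Node string
	ID   string
	// RenamingPolicy is filled in by the leader from its local config, so
	// every replica applies the registration under the same policy.
	RenamingPolicy string
}

// stampPolicy is what the leader would run just before submitting the Raft
// operation; followers never consult their own config when applying it.
func stampPolicy(req *RegisterRequest, leaderPolicy string) {
	if req.RenamingPolicy == "" {
		req.RenamingPolicy = leaderPolicy
	}
}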

What do you think?

}

// New is used to construct a new FSM with a blank state.
func New(gc *state.TombstoneGC, logOutput io.Writer) (*FSM, error) {
stateNew, err := state.NewStateStore(gc)
func New(gc *state.TombstoneGC, logOutput io.Writer, nodeRenamingPolicy types.NodeRenamingPolicy) (*FSM, error) {
Member

Passing this right through the Consul config, server, FSM, and state store constructors seems like the wrong approach for something that only affects one operation on the state store. Is there a reason we can't just make an option on EnsureNode that flips this and can be set in the Register RPC directly from the agent config?

Member

Actually... forget the code sprawl, this indicates a much bigger issue - making FSM state depend on outside input that is not guaranteed to be the same on all replicas. See my review comment.

@@ -354,59 +354,124 @@ func (s *Store) EnsureNode(idx uint64, node *structs.Node) error {
return nil
Member

What about if this was changed to EnsureNode(idx uint64, node *structs.Node, renamingPolicy types.NodeRenamingPolicy)? Off the top of my head, I think there is only one call site, in the Register RPC, that would need to change, and it can look the policy up directly from agent config.

func (s *Store) ensureNodeTxn(tx *memdb.Txn, idx uint64, node *structs.Node) error {
func (s *Store) findExistingNodeRenameDeadNodes(tx *memdb.Txn, idx uint64, node *structs.Node) (*structs.Node, error) {
if node.ID == "" {
return nil, fmt.Errorf("Empty Node IDs is not supported for : %s", node.Node)
Member

Suggested change
return nil, fmt.Errorf("Empty Node IDs is not supported for : %s", node.Node)
return nil, fmt.Errorf("Empty Node ID is not supported for dead rename policy: %s", node.Node)

return nil, fmt.Errorf("Cannot get status of node %s due to: %s", dupNode.Node, err)
}
if dupNodeCheck == nil {
return nil, fmt.Errorf("Cannot RenameNode since check %s not found for node %s", string(structs.SerfCheckID), dupNode.Node)
Member

Suggested change
return nil, fmt.Errorf("Cannot RenameNode since check %s not found for node %s", string(structs.SerfCheckID), dupNode.Node)
return nil, fmt.Errorf("Cannot rename node since check %s not found for node %s", string(structs.SerfCheckID), dupNode.Node)

if dupNodeCheck == nil {
return nil, fmt.Errorf("Cannot RenameNode since check %s not found for node %s", string(structs.SerfCheckID), dupNode.Node)
}
existingDupNodeSerf := dupNodeCheck.(*structs.HealthCheck)
Member

The possibility of this panicking makes me itchy - can we do the _, ok := check and just return an error if the type is wrong? Paranoid and defensive, sure, but panics are bad!
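
For reference, the comma-ok form being asked for would look roughly like this (error wording is illustrative):

dupNodeCheckHealth, ok := dupNodeCheck.(*structs.HealthCheck)
if !ok {
	// Return an error instead of panicking on an unexpected type.
	return nil, fmt.Errorf("check %s on node %s has unexpected type %T",
		structs.SerfCheckID, dupNode.Node, dupNodeCheck)
}
existingDupNodeSerf := dupNodeCheckHealth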

}
existingDupNodeSerf := dupNodeCheck.(*structs.HealthCheck)
if existingDupNodeSerf.Status != api.HealthCritical {
// This is ok, we allow to take the identity of that node
Member

This comment seems in the wrong place - it's saying "OK we can take identity" but it's in the failing branch...

existingDupNodeSerf := dupNodeCheck.(*structs.HealthCheck)
if existingDupNodeSerf.Status != api.HealthCritical {
// This is ok, we allow to take the identity of that node
return dupNode, fmt.Errorf("Cannot rename since node %s because check %s is '%s'", dupNode.Node, string(structs.SerfCheckID), existingDupNodeSerf.Status)
Member

Suggested change
return dupNode, fmt.Errorf("Cannot rename since node %s because check %s is '%s'", dupNode.Node, string(structs.SerfCheckID), existingDupNodeSerf.Status)
return dupNode, fmt.Errorf("Cannot rename node because existing node named %s is still alive", dupNode.Node)


func (s *Store) findExistingNodeStrict(tx *memdb.Txn, idx uint64, node *structs.Node) (*structs.Node, error) {
if node.ID == "" {
return nil, fmt.Errorf("Empty Node IDs is not supported for : %s", node.Node)
Member

Suggested change
return nil, fmt.Errorf("Empty Node IDs is not supported for : %s", node.Node)
return nil, fmt.Errorf("Empty Node ID is not supported with strict rename policy for: %s", node.Node)

@pierresouchay
Contributor Author

@banks I thought about this a while ago and had an idea: what about applying this on each agent individually?

Thus, a node could specify whether it has to be legacy, strict, or "cloud ready"... meaning the initial node to be replaced would store the policy in the catalog.

It makes sense, for instance, in places where Consul is used in both bare-metal and cloud environments.

What do you think about that?

That's a bit problematic for cluster-wide policies (I mean, ones that do not depend upon a single node to take the decision), but for our case, I think it would work.

@banks
Member

banks commented Dec 14, 2018

I’ve just remembered we discussed this last week and came to a conclusion that I forgot to write down.

I’ll need to think about this more and expand upon it but it boils down to this:

  • Serf already allows nodes to be renamed safely - it has a mechanism to ensure a node rename is accepted by the cluster and will prevent nodes from claiming a name when another node disputes that it presently has that name.
  • so in general if Serf allows the new node to join then it’s pretty safe to allow it to register and take over that name
  • so we should forget having multiple modes and just make the normal behaviour be for the server to allow the rename IF Serf allows the join - exactly how we do that I’ve not looked into yet.

Another reason to avoid multiple modes is that external services (using ESM or otherwise) don't have UUIDs, so neither strict nor dead mode here would work for them, and it's not reasonable to call that "legacy". We need to continue to support no-UUID registrations for external nodes as first class.

Will come back to this when I get a chance, just wanted to note this while I remember!

@curtbusch

Any idea if this will fix the following error when you restore from a snapshot? Catalog.Register RPC failed to server xxx.xxx.xxx.xxx:8300: rpc error making call: failed inserting node: Error while renaming Node ID: <node-id>: Node name xxx.xxx.xxx is reserved by node <node-id> with name xxx.xxx.xxx

In our instance of Consul, if we restore from a snapshot whose node name is the same as the fresh Consul node's name, we start seeing the error once we set an agent token. This is running Consul 1.4 on Azure AKS, deployed using HashiCorp's Helm chart.

@banks
Member

banks commented Jan 10, 2019

@cwillbusch yeah that's a similar issue that would be fixed when we get this right. The complication there is that the snapshot restore is essentially time-travelling back to when a different node was registered in the same way!

If you manually deregister the node you should be able to get it to work in the interim. My proposed solution above (not this PR proposal) would fix it automatically since Serf layer would know that the old node is no longer around and so allow the re-registration.

@curtbusch

@banks Thanks for your clarification!

Could you let me know the proper way to manually deregister a node, so that I can use this workaround until your proposed solution is implemented?

@banks
Member

banks commented Jan 17, 2019

Heh, that is actually a good question with a slightly sad answer @cwillbusch (see #2981 for a CLI for this, if interested).

You should be able to do so via the API though: https://www.consul.io/api/catalog.html#deregister-entity.

E.g.:

curl -i -X PUT -d '{"Node":"c1"}' 'http://localhost:8500/v1/catalog/deregister'

Note that despite the docs, this didn't seem to work when I attempted to use the node's ID rather than its name - I think those docs predate node IDs being a separate UUID. But this works - I tested it locally, and the new node is immediately able to sync on the next attempt, even without a restart.

@Nmishin
Contributor

Nmishin commented Feb 14, 2019

Hi @banks, it seems I have a related question here:
previously I had Consul servers and clients at version 0.8.4; after upgrading the Consul server to version 1.3.1, I see a lot of errors in the logs, something like this:
consul: RPC failed to server 10.10.9.5:8300: rpc error: rpc error making call: failed inserting node: Error while renaming Node ID: "e1d3e56d-9226-bf3a-571d-2a7e500c37dd": Node name dev-test-zcwf is reserved by node de084f26-6488-71a8-812f-e5f680ef4a17 with name dev-test-zcwf
As I understand it, the dev-test-zcwf node was recreated with the same name but a new ID - what should I do about this?
I didn't restore Consul from a snapshot, only upgraded Consul while keeping its data.
Do I need to set -disable-host-node-id to false to prevent randomly generated node IDs?
Thanks.

@banks
Member

banks commented Feb 18, 2019

@Nmishin the answer is the same as above - assuming the old node actually doesn't exist, you either need to wait 72 hours or manually deregister the old node before you can add the new one with the curl command above.

Once the real fix highlighted above lands, this problem should solve itself automatically.

@kevstigneev

This is the first time that some state that is not governed by Raft is able to affect the decisions made inside the state machine! That sounds really dangerous!

Could not agree more.

Let's take it logically.

A node is uniquely identified in the cluster by its node_id and each node can have just one node_id. If that's true, one of the following must be true as well:

  • several nodes can share the same node_name;
  • node_id and node_name are effectively equal;
  • a node can have several node_name's.

A node in a Consul cluster is a data object. It is not the same as a Consul agent instance; a Consul node and an agent instance have disjoint lifecycles. node_id is meant to identify the data object; node_name, an agent instance. If node_name is globally unique (space- and time-wise), the second option above (node_name ~ node_id) is fine.

However, operators naturally want to use DNS names as node_name. DNS requires only spatial uniqueness; it does not formally require or guarantee that a name is not reused over time. This rules out the second option.

Of the remaining two options (id:name as N:1 and 1:N), the last one (multiple names per node) does not make sense. It means that a Consul agent can be represented by several Raft nodes.

It leaves just one option - several cluster nodes (data objects) can share the same node_name.

If a specific operation environment imposes extra restrictions, they should be addressed on top of the above. Not in the core.

@banks
Member

banks commented Feb 20, 2019

@kevstigneev thanks for the input. I think the overall decision here is clear so I'll close this.

For the record though:

node_id is meant to identify the data object; node_name, an agent instance.

This isn't quite right - node IDs are unique identifiers for the agent too. The reason we added them as well as node names is that people like to rename their nodes from time to time, so we wanted a separate stable ID. The issue is that it's impossible to make a truly stable ID - we try using hardware identifiers, or failing that, storing it on disk, but then people find creative ways that those change, e.g. they rebuild an instance but keep the same host name, so now it has a new ID but the same name.

That's the crux of the problem. Pierre's proposal here isn't bad or wrong, it's just that we've realised we already have a mechanism (Serf) that can automatically resolve this conflict correctly without the complexity of explicit new behaviour options in the state machine or elsewhere.

We'll revisit this idea though if Serf can't solve for all cases as we expect.
