Added multiple policies for renaming nodes #5008
Conversation
@mkeeler @banks what do you think of this proposal? (If you remember our conversations in #3983 and #4399, it is a hot topic.) I think it is a good alternative to the current status quo, and it would let us progressively remove the legacy, hard-to-debug node registration behaviour. It would also help people using Consul in cloud environments as described in #4741, while letting people like us, with bare-metal nodes and fixed node IDs, have clearer policies around node renaming. Kind regards
Scheduling this for attention soon - this issue/proposal is on our radar and we will hopefully have cycles to consider it soon.
* Added better checks and error messages to get proper validation of values
* Use constants to avoid duplicating strings everywhere
upup
I am curious to know when this fix will be merged into master. We are upgrading Consul from 1.0.6 to 1.3.1 and encountering the issue mentioned in #4741. If this fix gets merged soon, we can run 1.3.1 in our environment rather than upgrading to 1.2.2.
Thanks @pierresouchay!
This is another awesome contribution. I think overall having different policies is a good solution here. I have a few specific comments in the code about changes.
The biggest question though is whether it is safe to have this in config (spoiler alert: it's not really).
This is the first time that some state that is not governed by Raft is able to affect the decisions made inside the state machine! That sounds really dangerous!
For example, if one of your servers accidentally came up with a different policy from the others, it might reject a node registration that the other servers accepted, and so the state would diverge and become corrupt!
In this one case the scope for divergence is somewhat limited and I think recoverable by changing config and re-registering nodes, but I'm extremely reluctant to go down that path. Thinking through all the transitive effects this could have on any later operation that might observe different state on different replicas is a minefield.
The good news is there are other alternatives. The way we handle this for things like Connect CA bootstrap and Autopilot config is that the config values just bootstrap the cluster, and then the source of truth for the current policy lives in the Raft state.
The bad news is that doing it that way needs quite a bit of extra work - a new state store table, all new Raft operations to manage and snapshot it, API endpoints to query and update it once it's bootstrapped, etc.
Another (simpler and maybe better) option is that you keep it in config on servers and you add the effective policy as an argument to the raft operation for registering a node. This way the leader tells the cluster as part of the replicated operation what policy it is being submitted under so it is guaranteed that all replicas will do the same thing again. That saves a lot of extra work managing config in raft (no API needed to update) and makes it easier to switch config on the server - the current leader decides the policy. That way having different policies on different servers is still kinda bad because you might change behaviour when leadership changes but at least it doesn't threaten the integrity of the FSM.
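Very roughly, something like this sketch (placeholder names only - `EffectivePolicy` and the other identifiers here are illustrative, not actual fields in this PR):

```go
// Sketch, not real Consul code: the leader stamps its configured policy onto
// the replicated registration request, so every FSM replica applies the same
// rule regardless of its local config.
type NodeRenamingPolicy string

type RegisterRequest struct {
	Node string
	ID   string
	// EffectivePolicy is set by the current leader from its own config
	// before the request is committed to Raft; followers never consult
	// their local config when applying it.
	EffectivePolicy NodeRenamingPolicy
}

func submitRegistration(req *RegisterRequest, leaderPolicy NodeRenamingPolicy,
	raftApply func(*RegisterRequest) error) error {
	req.EffectivePolicy = leaderPolicy // the leader's config decides the policy
	return raftApply(req)              // replicas only see the replicated value
}
```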
What do you think?
}

// New is used to construct a new FSM with a blank state.
-func New(gc *state.TombstoneGC, logOutput io.Writer) (*FSM, error) {
+func New(gc *state.TombstoneGC, logOutput io.Writer, nodeRenamingPolicy types.NodeRenamingPolicy) (*FSM, error) {
	stateNew, err := state.NewStateStore(gc)
Passing this in right through the Consul config, server, FSM, and state store constructor seems like the wrong approach for something that only affects one operation on the state store. Is there a reason we can't just make an option on `EnsureNode` that flips this and can be set in the Register RPC directly from the agent config?
Actually... forget the code sprawl, this indicates a much bigger issue - making FSM state depend on outside input that is not guaranteed to be the same on all replicas. See my review comment.
@@ -354,59 +354,124 @@ func (s *Store) EnsureNode(idx uint64, node *structs.Node) error {
	return nil
What if this was changed to `EnsureNode(idx uint64, node *structs.Node, renamingPolicy types.NodeRenamingPolicy)`? Off the top of my head, I think there is only one call site - in the Register RPC - that would need to change then, and it can look the policy up directly from agent config.
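A rough sketch of the suggested shape (a fragment only, not the code in this PR):

```go
// The renaming policy becomes an explicit argument instead of being wired
// through the FSM and state store constructors; the caller passes whatever
// policy it looked up from the agent/server config.
func (s *Store) EnsureNode(idx uint64, node *structs.Node, renamingPolicy types.NodeRenamingPolicy) error {
	// ... existing registration logic, branching on renamingPolicy ...
	return nil
}
```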
func (s *Store) ensureNodeTxn(tx *memdb.Txn, idx uint64, node *structs.Node) error {

func (s *Store) findExistingNodeRenameDeadNodes(tx *memdb.Txn, idx uint64, node *structs.Node) (*structs.Node, error) {
	if node.ID == "" {
		return nil, fmt.Errorf("Empty Node IDs is not supported for : %s", node.Node)
Suggested change:
-	return nil, fmt.Errorf("Empty Node IDs is not supported for : %s", node.Node)
+	return nil, fmt.Errorf("Empty Node ID is not supported for dead rename policy: %s", node.Node)
		return nil, fmt.Errorf("Cannot get status of node %s due to: %s", dupNode.Node, err)
	}
	if dupNodeCheck == nil {
		return nil, fmt.Errorf("Cannot RenameNode since check %s not found for node %s", string(structs.SerfCheckID), dupNode.Node)
Suggested change:
-	return nil, fmt.Errorf("Cannot RenameNode since check %s not found for node %s", string(structs.SerfCheckID), dupNode.Node)
+	return nil, fmt.Errorf("Cannot rename node since check %s not found for node %s", string(structs.SerfCheckID), dupNode.Node)
	if dupNodeCheck == nil {
		return nil, fmt.Errorf("Cannot RenameNode since check %s not found for node %s", string(structs.SerfCheckID), dupNode.Node)
	}
	existingDupNodeSerf := dupNodeCheck.(*structs.HealthCheck)
The possibility of this panicking makes me itchy - can we do the `_, ok :=` check and just return an error if the type is wrong? Paranoid and defensive, sure, but panics are bad!
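Something along these lines (a sketch reusing the names from the diff above):

```go
// Defensive type assertion: return an error instead of panicking if the
// stored check is not a *structs.HealthCheck.
existingDupNodeSerf, ok := dupNodeCheck.(*structs.HealthCheck)
if !ok {
	return nil, fmt.Errorf("check %s on node %s has unexpected type %T",
		string(structs.SerfCheckID), dupNode.Node, dupNodeCheck)
}
```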
	}
	existingDupNodeSerf := dupNodeCheck.(*structs.HealthCheck)
	if existingDupNodeSerf.Status != api.HealthCritical {
		// This is ok, we allow to take the identity of that node
This comment seems in the wrong place - it's saying "OK we can take identity" but it's in the failing branch...
	existingDupNodeSerf := dupNodeCheck.(*structs.HealthCheck)
	if existingDupNodeSerf.Status != api.HealthCritical {
		// This is ok, we allow to take the identity of that node
		return dupNode, fmt.Errorf("Cannot rename since node %s because check %s is '%s'", dupNode.Node, string(structs.SerfCheckID), existingDupNodeSerf.Status)
Suggested change:
-	return dupNode, fmt.Errorf("Cannot rename since node %s because check %s is '%s'", dupNode.Node, string(structs.SerfCheckID), existingDupNodeSerf.Status)
+	return dupNode, fmt.Errorf("Cannot rename node because existing node named %s is still alive", dupNode.Node)
func (s *Store) findExistingNodeStrict(tx *memdb.Txn, idx uint64, node *structs.Node) (*structs.Node, error) {
	if node.ID == "" {
		return nil, fmt.Errorf("Empty Node IDs is not supported for : %s", node.Node)
Suggested change:
-	return nil, fmt.Errorf("Empty Node IDs is not supported for : %s", node.Node)
+	return nil, fmt.Errorf("Empty Node ID is not supported with strict rename policy for: %s", node.Node)
@banks I thought about this a while ago and had an idea: what about applying this on each agent individually? A node could then specify whether it should be legacy, strict or "cloud ready"... meaning the initial node to be replaced would store the policy in the catalog. It makes sense, for instance, in places where Consul is used in both bare-metal and cloud environments. What do you think about that? It's a bit problematic for cluster-wide policies (I mean, ones that don't depend on a single node to take the decision), but for our case I think it would work.
I’ve just remembered we discussed this last week and came to a conclusion that I forgot to write down. I’ll need to think about this more and expand upon it but it boils down to this:
Another reason to avoid multiple modes is that external services (registered via ESM or otherwise) don't have UUIDs, so neither strict nor dead mode here would work for them, and it's not reasonable to call that "legacy". We need to continue to support no-UUID registrations for external nodes as a first-class use case. Will come back to this when I get a chance, just wanted to note this while I remember!
Any idea if this will fix the error we're seeing? In our instance of Consul, if we restore from a snapshot whose node name is the same as the fresh Consul node's name, we start seeing the error once we set an agent token. This is running Consul 1.4 on Azure AKS, deployed using HashiCorp's Helm chart.
@cwillbusch yeah, that's a similar issue that would be fixed when we get this right. The complication there is that the snapshot restore is essentially time-travelling back to when a different node was registered in the same way! If you manually deregister the node you should be able to get it to work in the interim. My proposed solution above (not this PR's proposal) would fix it automatically, since the Serf layer would know that the old node is no longer around and so allow the re-registration.
@banks Thanks for your clarification! Could you let me know the proper way to manually deregister a node, so I can use that workaround until your proposed solution is implemented?
Heh, that is actually a good question with a slightly sad answer @cwillbusch (CLI support for this is tracked in #2981 if interested). You should be able to do it via the API though: https://www.consul.io/api/catalog.html#deregister-entity. E.g.:
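Something along these lines (the datacenter and node name are placeholders for your own values):

```sh
curl -X PUT http://127.0.0.1:8500/v1/catalog/deregister \
  -d '{"Datacenter": "dc1", "Node": "old-node-name"}'
```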
Note that despite the docs, this didn't seem to work when I attempted to use the node's ID rather than its name - I think those docs predate Node IDs being a separate UUID. But it works with the name - I tested locally, and the new node is immediately able to sync on the next attempt, even without a restart.
Hi @banks, seems I have a related question here:
@Nmishin the answer is the same as above - assuming the old node actually doesn't exist, you either need to wait 72 hours or manually deregister the old node (with the curl command above) before you can add the new one. Once we get the real fix highlighted above in place, this problem should solve itself automatically.
Could not agree more. Let's take it logically. A node is uniquely identified in the cluster by its node_id, and each node can have just one node_id. If that's true, one of the following must be true as well:
1. Several cluster nodes (node_ids) can share the same node_name (id:name is N:1).
2. node_name is equivalent to node_id (1:1), i.e. node_name is globally unique as well.
3. A single node_id can be known under several node_names (id:name is 1:N).
A node in a Consul cluster is a data object. It is not the same as a Consul agent instance. A Consul node and an agent instance have disjoint lifecycles. node_id is meant to identify a data object, node_name an agent instance. If node_name were globally unique (space- and time-wise), the second option above (node_name ~ node_id) would be fine. However, operators naturally want to use DNS names as node_name. DNS requires only spatial uniqueness; it does not formally require or guarantee that a name is not reused over time. That rules out the second option. Of the remaining two options (id:name as N:1 and 1:N), the last one (multiple names per node) does not make sense - it would mean that a Consul agent can be represented by several Raft nodes. That leaves just one option - several cluster nodes (data objects) can share the same node_name. If a specific operating environment imposes extra restrictions, they should be addressed on top of the above, not in the core.
@kevstigneev thanks for the input. I think the overall decision here is clear, so I'll close this. For the record though:
This isn't quite right - node IDs are unique identifiers for the agent too; the reason we added them as well as node names is that people like to rename their nodes from time to time, so we wanted a separate stable ID. The issue is that it's impossible to make a truly stable ID - we try using hardware identifiers or, failing that, storing it on disk, but then people find creative ways for those to change, e.g. they rebuild an instance but keep the same host name, so now it has a new ID but the same name. That's the crux of the problem. Pierre's proposal here isn't bad or wrong; it's just that we've realised we already have a mechanism (Serf) that can automatically resolve this conflict correctly, without the complexity of explicit new behaviour options in the state machine or elsewhere. We'll revisit this idea though if Serf can't solve all the cases as we expect.
Addressing #4414 is hard as many unit tests rely on having empty node IDs.
It seems there are several usages of nodeId:
This PR addresses those 3 use-cases by adding a new option, with a sample configuration shown after the list:
* `node_renaming_policy=(legacy|dead|strict)`
* `legacy` (the default) keeps the existing behaviour
* `dead`: do not allow nodes without IDs, but allow a dead node to be replaced by another one with the same name and a different ID (this is done by checking that serfHealth is critical). Renaming nodes is still allowed.
* `strict`: do not allow empty node IDs; allow renaming nodes that keep the same ID; do not allow stealing a node name while that node is still in the catalog.
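For illustration, the proposed option would be set in the agent configuration roughly like this (a sketch only; the final syntax and placement could differ, and `dead` is just one of the three values):

```json
{
  "node_renaming_policy": "dead"
}
```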