Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to rename nodes with IDs, will fix #3974 and #4413 #4415

Merged
merged 11 commits into from
Aug 10, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
77 changes: 65 additions & 12 deletions agent/consul/state/catalog.go
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ import (
"github.com/hashicorp/consul/api"
"github.com/hashicorp/consul/types"
"github.com/hashicorp/go-memdb"
uuid "github.com/hashicorp/go-uuid"
)

const (
Expand Down Expand Up @@ -350,6 +351,25 @@ func (s *Store) EnsureNode(idx uint64, node *structs.Node) error {
return nil
}

// ensureNoNodeWithSimilarNameTxn checks that no other node has conflict in its name
// If allowClashWithoutID then, getting a conflict on another node without ID will be allowed
func (s *Store) ensureNoNodeWithSimilarNameTxn(tx *memdb.Txn, node *structs.Node, allowClashWithoutID bool) error {
// Retrieve all of the nodes
enodes, err := tx.Get("nodes", "id")
if err != nil {
return fmt.Errorf("Cannot lookup all nodes: %s", err)
}
for nodeIt := enodes.Next(); nodeIt != nil; nodeIt = enodes.Next() {
enode := nodeIt.(*structs.Node)
if strings.EqualFold(node.Node, enode.Node) && node.ID != enode.ID {
if !(enode.ID == "" && allowClashWithoutID) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this cover the case where the new node (node.ID == "") but clashing node isn't? That is specifically what broke tests before.

I'll read on and may well find a reason why that can't happen here but just noting. Even if that is a true invariant might be worth commenting because we want to allow either the existing or the new node to have blank ID without changing behaviour.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function does not, but basically yes since this function is called ONLY when node.ID != ""

In my next and (hopefuly) last patch, I did not change anything but added tests to this function.

To summarize

When ID is provided:

  • if the node already exists, you rename the node if no conflicts with another node
  • if node does not exists, check for conflicts, but ONLY for nodes having ID (thus we allow to "upgrade" a node without ID and add an ID)

When ID is not provided:

  • free lunch, you can "downgrade" a node by removing its ID - BUT ONLY if you have the exact same name as an existing node without ID (Kept for your tests and compatibility reasons)
  • you can create a node of course

So, it means this PR is fixing #4413 (not being able to have two different agents with with that differs only by their case) ONLY in clusters where everybody is playing nice and has an ID (which is your case in our clusters :-)

As long as you have agents that register without any ID, you are in danger, you may have node2 and NoDe2. But we keep all of this for the ascending compatibility with old clusters as well as your tests

return fmt.Errorf("Node name %s is reserved by node %s with name %s", node.Node, enode.ID, enode.Node)
}
}
}
return nil
}

// ensureNodeTxn is the inner function called to actually create a node
// registration or modify an existing one in the state store. It allows
// passing in a memdb transaction so it may be part of a larger txn.
Expand All @@ -358,28 +378,50 @@ func (s *Store) ensureNodeTxn(tx *memdb.Txn, idx uint64, node *structs.Node) err
// name is the same.
var n *structs.Node
if node.ID != "" {
existing, err := tx.First("nodes", "uuid", string(node.ID))
existing, err := getNodeIDTxn(tx, node.ID)
if err != nil {
return fmt.Errorf("node lookup failed: %s", err)
}
if existing != nil {
n = existing.(*structs.Node)
n = existing
if n.Node != node.Node {
return fmt.Errorf("node ID %q for node %q aliases existing node %q",
node.ID, node.Node, n.Node)
// Lets first get all nodes and check whether name do match, we do not allow clash on nodes without ID
dupNameError := s.ensureNoNodeWithSimilarNameTxn(tx, node, false)
if dupNameError != nil {
return fmt.Errorf("Error while renaming Node ID: %q: %s", node.ID, dupNameError)
}
// We are actually renaming a node, remove its reference first
err := s.deleteNodeTxn(tx, idx, n.Node)
if err != nil {
return fmt.Errorf("Error while renaming Node ID: %q from %s to %s",
node.ID, n.Node, node.Node)
}
}
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The else statement here is one of the problematic ones. I would recommend replacing with this statement.

}
// TODO: else statement missing to ensure name is not already taken by another node
// If added right now it could break existing behavior which allows for adding
// a UUID to a node that doesn't have one when the name is the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkeeler I think the else statement is now SAFE since s.ensureNoNodeWithSimilarNameTxn will not match any node without an ID. It means it will break ONLY if: a node having an ID clashes (case insensitive) with another node having an ID.

In all other cases, the behaviour is unchanged as shown in tests deprecatedEnsureNodeWithoutIDCanRegister of agent/consul/state/catalog_test.go.

What works:

When request has an ID:

  • renaming a node when having an ID, without colliding with the name of another node (case insensitive)
  • creating a node with an ID for which the name is not taken by anybody (case insensitive)

When request has not ID (more permissive regarding case insensitivity)

  • Create a new node without ID with name not colliding
  • Remove a node ID of a node with an ID if name EXACTLY (case sensitive) matches
  • Create a new node without ID with name colliding with case insensitive differences

What does not work anymore due to this change:

  • Create a node with an ID, but this name collides with another one (case insensitive) having an ID (works if other node DOES not have any ID)

Previously, given the following configuration

id42 - node1

Registering same name with different ID is blocked
id43 - node2 => previously would 'steal' node2 to id42 ==> now blocked
id43 - NoDe2 => previously would create a second node (id42: node2 ; id43: NoDe2) => the big crash I had described here #4413 => now the registration is blocked and id43 won't be able to join

Conclusion

  • A request with an ID can "steal" a node without any ID, but cannot steal a node with an ID if ID is different
  • A request without any ID can do whatever it wants

So, basically, the ONLY new possible issue we have is:

  • if you change manually the ID of a node, this node won't be able to join the cluster
  • if 2 nodes have different IDs and case insensitive differences, only ONE can join, which is good new for DNS as well as it avoid the DB corruption I had.

What do you think of this ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand now.

We will error out if both if an existing node with a uuid != new uuid where there is a name match but we will still allow adding a uuid to a node that doesn't have one. You just moved the logic to only enforce it when they both have uuids into that function.

// We allow to "steal" another node name that would have no ID
// It basically means that we allow upgrading a node without ID and add the ID
dupNameError := s.ensureNoNodeWithSimilarNameTxn(tx, node, true)
if dupNameError != nil {
return fmt.Errorf("Error while renaming Node ID: %q: %s", node.ID, dupNameError)
}
}
}
// TODO: else Node.ID == "" should be forbidden in future Consul releases
// See https://github.com/hashicorp/consul/pull/3983 for context

// Check for an existing node by name to support nodes with no IDs.
if n == nil {
existing, err := tx.First("nodes", "id", node.Node)
if err != nil {
return fmt.Errorf("node name lookup failed: %s", err)
}

if existing != nil {
n = existing.(*structs.Node)
}
// WARNING, for compatibility reasons with tests, we do not check
// for case insensitive matches, which may lead to DB corruption
// See https://github.com/hashicorp/consul/pull/3983 for context
}

// Get the indexes.
Expand Down Expand Up @@ -421,6 +463,23 @@ func (s *Store) GetNode(id string) (uint64, *structs.Node, error) {
return idx, nil, nil
}

func getNodeIDTxn(tx *memdb.Txn, id types.NodeID) (*structs.Node, error) {
strnode := string(id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The uuid parsing should be done with the ParseUUID function from github.com/hashicorp/go-uuid. An example of this is in the Catalog Register RPC endpoint: https://github.com/hashicorp/consul/blob/master/agent/consul/catalog_endpoint.go#L37

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DONE

uuidValue, err := uuid.ParseUUID(strnode)
if err != nil {
return nil, fmt.Errorf("node lookup by ID failed, wrong UUID: %v for '%s'", err, strnode)
}

node, err := tx.First("nodes", "uuid", uuidValue)
if err != nil {
return nil, fmt.Errorf("node lookup by ID failed: %s", err)
}
if node != nil {
return node.(*structs.Node), nil
}
return nil, nil
}

// GetNodeID is used to retrieve a node registration by node ID.
func (s *Store) GetNodeID(id types.NodeID) (uint64, *structs.Node, error) {
tx := s.db.Txn(false)
Expand All @@ -430,14 +489,8 @@ func (s *Store) GetNodeID(id types.NodeID) (uint64, *structs.Node, error) {
idx := maxIndexTxn(tx, "nodes")

// Retrieve the node from the state store
node, err := tx.First("nodes", "uuid", string(id))
if err != nil {
return 0, nil, fmt.Errorf("node lookup failed: %s", err)
}
if node != nil {
return idx, node.(*structs.Node), nil
}
return idx, nil, nil
node, err := getNodeIDTxn(tx, id)
return idx, node, err
}

// Nodes is used to return all of the known nodes.
Expand Down
Loading