Makes server manager shift away from failed servers from Serf events. #3864
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Because this code was doing pointer equality checks, it would work for
the case of a failed attempted RPC because the objects are from the
manager itself:
https://github.com/hashicorp/consul/blob/v1.0.3/agent/consul/rpc.go#L283-L302
But the pointer check would always fail for events coming in from the
Serf path because the server object is newly-created:
https://github.com/hashicorp/consul/blob/v1.0.3/agent/router/serf_adapter.go#L14-L40
This means that we didn't proactively shift RPC traffic away from a
failed server, we'd have to wait for an RPC to fail, which exposes
the error to the calling client.
By switching over to a name check vs. a pointer check we get the correct
behavior. We added a DEBUG log as well to help observe this behavior during
integrated testing.
Related to #3863 since the fix here needed the same logic duplicated, owing
to the complicated atomic stuff.
/cc @dadgar for a heads up in case this also affects Nomad.