
Recreating server nodes causes client nodes to flap #845

Closed
discordianfish opened this issue Apr 7, 2015 · 2 comments

Comments

@discordianfish (Contributor)

Hi,

I'm not sure whether this is a design issue or a simple bug:

If you create a cluster (three server nodes in my case) with a bunch of client nodes, then remove and recreate all the server nodes — essentially starting a clean cluster — all the clients start flapping like this:

    2015/04/07 01:27:14 [INFO] serf: EventMemberJoin: ip-10-1-11-204 10.1.11.204
    2015/04/07 01:27:14 [INFO] serf: EventMemberJoin: ip-10-1-42-210 10.1.42.210
    2015/04/07 01:27:17 [INFO] memberlist: Suspect ip-10-1-31-116 has failed, no acks received
    2015/04/07 01:27:18 [INFO] memberlist: Marking ip-10-1-31-116 as failed, suspect timeout reached
    2015/04/07 01:27:18 [INFO] serf: EventMemberFailed: ip-10-1-31-116 10.1.31.116
    2015/04/07 01:27:19 [INFO] memberlist: Suspect ip-10-1-42-210 has failed, no acks received
    2015/04/07 01:27:20 [INFO] serf: EventMemberFailed: ip-10-1-42-210 10.1.42.210
    2015/04/07 01:27:21 [INFO] memberlist: Marking ip-10-1-11-204 as failed, suspect timeout reached
    2015/04/07 01:27:21 [INFO] serf: EventMemberFailed: ip-10-1-11-204 10.1.11.204
    2015/04/07 01:27:22 [INFO] memberlist: Suspect ip-10-1-11-204 has failed, no acks received
    2015/04/07 01:27:24 [INFO] serf: EventMemberJoin: ip-10-1-42-211 10.1.42.211
    2015/04/07 01:27:24 [INFO] serf: EventMemberJoin: ip-10-1-11-204 10.1.11.204
    2015/04/07 01:27:24 [INFO] serf: EventMemberJoin: ip-10-1-31-116 10.1.31.116
    2015/04/07 01:27:27 [INFO] serf: EventMemberFailed: ip-10-1-31-116 10.1.31.116
    2015/04/07 01:27:28 [INFO] serf: EventMemberJoin: ip-10-1-31-116 10.1.31.116
    2015/04/07 01:27:30 [INFO] memberlist: Suspect ip-10-1-11-204 has failed, no acks received
    2015/04/07 01:27:32 [INFO] memberlist: Marking ip-10-1-11-204 as failed, suspect timeout reached
    2015/04/07 01:27:32 [INFO] serf: EventMemberFailed: ip-10-1-11-204 10.1.11.204

I need to recreate the clients to make them reconnect properly. I'm not sure whether this is by design, but I assumed that client nodes are fairly dumb: they just connect to the server nodes and use their state, so there should be no way for the local state to conflict with the server state. If that isn't the case, is there more specific documentation around these operational concerns?
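For context, the clients here were started with a one-shot join against the server addresses. A minimal client configuration sketch that keeps retrying instead (the addresses, path, and interval below are hypothetical examples, and `retry_join`/`retry_interval` support depends on the Consul version in use) would look roughly like:

```json
{
  "data_dir": "/var/lib/consul",
  "retry_join": ["10.1.0.10", "10.1.0.11", "10.1.0.12"],
  "retry_interval": "30s"
}
```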

@ryanuber (Member) commented Apr 8, 2015

Consul clients do carry some state locally, which includes information about the cluster as well as the state of local services and checks. The logs you shared above are the gossip layer detecting failures on its peers. Without knowing which nodes were clients and which were servers, there's not much else I can derive from them, but if the failed nodes were the servers you stopped then this would be expected.

The clients should reconnect, though. Are there any differences between the new servers and the old ones (IPs, hostnames, firewalls, etc.)? You might be hitting #457, since the Raft layer currently does not gracefully handle IP address changes. Can you share your configuration file(s)?

You will run into #839 with the current 0.5 release, but 0.5.1 will fix this. Basically the clients will not re-sync their services and checks to the global catalog.

@discordianfish (Contributor, Author)

In my case the server IPs changed; I expected Consul to re-resolve the provided server address, but it didn't. #839 sounds like it will fix that, though I'm not sure whether I'd still need to clear the clients' local state about cluster members. Anyway, this can be considered a duplicate of #839, so I'll close it.
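For anyone hitting the same symptom: the "recreate the clients" workaround above amounts to wiping each client agent's local gossip state and rejoining the new servers. A rough sketch (the service name, data-dir path, and server address are examples only — adjust to your setup):

```shell
# Stop the client agent and clear its locally persisted serf state,
# then restart it so it rejoins the new servers cleanly.
sudo service consul stop
sudo rm -rf /var/lib/consul/serf
sudo service consul start

# Alternatively, with the agent still running, point it at a new server:
consul join 10.1.0.10
```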
