Do not log cluster service errors after joining a master #19705

bleskes · 2016-07-30T20:31:52Z

During our master elections, nodes "vote" for a master by issuing a join request to it. Since this is done asynchronously, joins may arrive before the master itself has realized it has won the election. Therefore we start accumulating node joins on every node at election start (we don't know the result yet). When the election finishes, nodes that did not become master (i.e., that joined another node which won the election) may need to process and fail any join requests they received during the election. This is currently achieved by always issuing a cluster state update task that is doomed to fail, even if there are no pending joins. This results in confusing (debug) log messages that make it seem like something is wrong. For example (note the NotMasterException):

[2016-07-30 22:25:53,040][DEBUG][cluster.service          ] [node_t1] processing [zen-disco-process-pending-joins [{node_t0}{4SqBTyYNQ82J9c75Cs7jtg}{kutaNSYbTZCSybvqczgWCA}{127.0.0.1}{127.0.0.1:9400} elected]]: execute
[2016-07-30 22:25:53,041][DEBUG][transport                ] [node_t1] connected to node [{node_t0}{4SqBTyYNQ82J9c75Cs7jtg}{kutaNSYbTZCSybvqczgWCA}{127.0.0.1}{127.0.0.1:9400}]
[2016-07-30 22:25:53,045][DEBUG][cluster.service          ] [node_t1] cluster state update task [zen-disco-process-pending-joins [{node_t0}{4SqBTyYNQ82J9c75Cs7jtg}{kutaNSYbTZCSybvqczgWCA}{127.0.0.1}{127.0.0.1:9400} elected]] failed
NotMasterException[Node [{node_t1}{eAQts270TiGFpoCDE-0PQQ}{or5bsv2ET220su78DLJk5g}{127.0.0.1}{127.0.0.1:9401}] not master for join request]
[2016-07-30 22:25:53,048][DEBUG][cluster.service          ] [node_t1] processing [zen-disco-process-pending-joins [{node_t0}{4SqBTyYNQ82J9c75Cs7jtg}{kutaNSYbTZCSybvqczgWCA}{127.0.0.1}{127.0.0.1:9400} elected]]: took [7ms] no change in cluster_state
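To make the problem concrete, here is a minimal Java sketch of the flow described above, under stated assumptions: it models a single node that ends up not being elected, and the class, method, and field names (`AlwaysFailOnElectionLoss`, `onElectedElsewhere`, `failedTasks`) are hypothetical, not the actual discovery code. It only illustrates that a failure is recorded even when no join arrived during the election.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the pre-change flow on a node that does NOT win the election.
// Names are illustrative only; this is not Elasticsearch code.
public class AlwaysFailOnElectionLoss {
    // Join requests buffered while the election outcome is still unknown.
    private final List<String> pendingJoins = new ArrayList<>();
    // Joins that were rejected because this node did not become master.
    final List<String> rejectedJoins = new ArrayList<>();
    // Number of "process pending joins" tasks that ended in a not-master failure.
    int failedTasks = 0;
    private boolean electionRunning = false;

    // Every node starts buffering joins when an election begins.
    void startElection() {
        electionRunning = true;
    }

    // A join can arrive before the receiving node knows whether it won, so it is held back.
    // Outside an election, this model (a non-master node) rejects the join immediately.
    void onJoinRequest(String joiningNode) {
        if (electionRunning) {
            pendingJoins.add(joiningNode);
        } else {
            rejectedJoins.add(joiningNode);
        }
    }

    // Another node won. The old flow always runs a task that is doomed to fail,
    // whether or not any joins were actually buffered.
    void onElectedElsewhere() {
        electionRunning = false;
        failedTasks++; // counted as a failure even when pendingJoins is empty
        rejectedJoins.addAll(pendingJoins);
        pendingJoins.clear();
    }

    public static void main(String[] args) {
        AlwaysFailOnElectionLoss node = new AlwaysFailOnElectionLoss();
        node.startElection();
        node.onElectedElsewhere(); // no joins arrived, yet one failure is recorded
        System.out.println(node.failedTasks + " failed task(s), " + node.rejectedJoins.size() + " rejected join(s)");
    }
}
```

Running `main` prints `1 failed task(s), 0 rejected join(s)`, which mirrors the spurious NotMasterException in the log above.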

This PR cleans up the logic a bit so that a failure is only used when there are actual pending joins to fail. The result is cleaner logs as well:

[2016-07-30 22:23:12,880][DEBUG][cluster.service          ] [node_t1] processing [zen-disco-election-stop [{node_t0}{jMR5HCpOQnOM4pGeFkUjng}{B5WIZQAdQk2cWbjGZ21mvQ}{127.0.0.1}{127.0.0.1:9400} elected]]: execute
[2016-07-30 22:23:12,881][DEBUG][cluster.service          ] [node_t1] processing [zen-disco-election-stop [{node_t0}{jMR5HCpOQnOM4pGeFkUjng}{B5WIZQAdQk2cWbjGZ21mvQ}{127.0.0.1}{127.0.0.1:9400} elected]]: took [0s] no change in cluster_state
[2016-07-30 22:23:12,881][DEBUG][transport                ] [node_t1] connected to node [{node_t0}{jMR5HCpOQnOM4pGeFkUjng}{B5WIZQAdQk2cWbjGZ21mvQ}{127.0.0.1}{127.0.0.1:9400}]
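For comparison, a minimal sketch of the cleaned-up flow under the same assumptions (a single non-master node, hypothetical names such as `FailOnlyPendingJoins` and `quietElectionStops`, not the actual discovery code): stopping the election only counts as a failure when there are buffered joins to reject.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the post-change flow on a node that does NOT win the election.
// Names are illustrative only; this is not Elasticsearch code.
public class FailOnlyPendingJoins {
    private final List<String> pendingJoins = new ArrayList<>();
    final List<String> rejectedJoins = new ArrayList<>();
    int failedTasks = 0;
    // Election stops that had nothing to reject and therefore are not failures.
    int quietElectionStops = 0;
    private boolean electionRunning = false;

    void startElection() {
        electionRunning = true;
    }

    // Buffer joins during an election; otherwise this non-master node rejects them right away.
    void onJoinRequest(String joiningNode) {
        if (electionRunning) {
            pendingJoins.add(joiningNode);
        } else {
            rejectedJoins.add(joiningNode);
        }
    }

    void onElectedElsewhere() {
        electionRunning = false;
        if (pendingJoins.isEmpty()) {
            // Nothing to reject: stopping the election is a no-op, not a failure.
            quietElectionStops++;
            return;
        }
        // Only real pending joins lead to a failure, so only they produce failure logs.
        failedTasks++;
        rejectedJoins.addAll(pendingJoins);
        pendingJoins.clear();
    }

    public static void main(String[] args) {
        FailOnlyPendingJoins node = new FailOnlyPendingJoins();
        // Election with no joins: stops quietly.
        node.startElection();
        node.onElectedElsewhere();
        // Election during which one join was buffered: that join is failed.
        node.startElection();
        node.onJoinRequest("node_t2");
        node.onElectedElsewhere();
        System.out.println("quiet stops: " + node.quietElectionStops
                + ", failed tasks: " + node.failedTasks
                + ", rejected joins: " + node.rejectedJoins);
    }
}
```

Running `main` prints `quiet stops: 1, failed tasks: 1, rejected joins: [node_t2]`. The empty case takes the early return, which corresponds to the "no change in cluster_state" election-stop lines above instead of a failed task.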

bleskes · 2016-07-30T20:32:21Z

@jasontedor welcome back. Do you mind taking a look?

jasontedor · 2016-08-01T10:13:34Z

Nice. LGTM.

make election stop not be a failure

cdd2e8f

bleskes added >enhancement :Distributed Coordination/Discovery-Plugins v5.0.0-beta1 labels Jul 30, 2016

jasontedor self-assigned this Jul 31, 2016

bleskes merged commit 7c6527e into elastic:master Aug 1, 2016

bleskes deleted the node_join_logging branch August 1, 2016 11:08

clintongormley added v5.0.0-alpha5 and removed v5.0.0-beta1 labels Aug 4, 2016
