Prometheus metric to show whether a server is a leader or follower #13169
Comments
Hi, @dpw, thanks for reporting and investigating this issue. Your analysis totally makes sense; will work on the improvement.
hc-github-team-consul-core added commits that referenced this issue on Jun 3, 2022 and Jun 7, 2022.
Feature Description
There should be a Consul server Prometheus metric showing whether a server considers itself to be the leader or a follower (or neither, in the case of a candidate). It should be possible to read this for the current time (i.e. as of the last Prometheus scrape) or for an arbitrary point in the past. Currently there is no straightforward way to determine this.
Furthermore, there should be a metric dedicated to this purpose that remains stable in future versions of Consul (rather than a metric for some other purpose that indicates the leader as a side effect, and so is liable to change).
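A minimal sketch of the requested behaviour, assuming a hypothetical gauge with one sample per Raft state (something like a `state` label with values `follower`, `candidate`, `leader`). On every state transition the server sets the sample for its new state to 1 and every other sample to 0, so whatever a scrape reads reflects the state at that moment. This is plain Python modelling the idea; it is not Consul's implementation and does not use the Prometheus client library.

```python
from enum import Enum


class RaftState(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


class RaftStateGauge:
    """Hypothetical per-server gauge: one sample per state, exactly one set to 1."""

    def __init__(self) -> None:
        # Assume a server starts as a follower.
        self.values = {state: 0 for state in RaftState}
        self.set_state(RaftState.FOLLOWER)

    def set_state(self, new_state: RaftState) -> None:
        # Overwrite every sample on each transition, so no stale value survives.
        for state in RaftState:
            self.values[state] = 1 if state is new_state else 0

    def scrape(self) -> dict:
        # What an exporter would expose, keyed by the hypothetical `state` label.
        return {state.value: value for state, value in self.values.items()}


gauge = RaftStateGauge()
gauge.set_state(RaftState.CANDIDATE)
gauge.set_state(RaftState.LEADER)
print(gauge.scrape())  # {'follower': 0, 'candidate': 0, 'leader': 1}
gauge.set_state(RaftState.FOLLOWER)
print(gauge.scrape())  # {'follower': 1, 'candidate': 0, 'leader': 0}
```

Because the value is overwritten on each transition rather than accumulated, querying it at a past time would return the state as of the last scrape before that time, which covers the "arbitrary point in the past" requirement as well.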
Use Case(s)
For normal operation of a Consul server cluster, there should be exactly one leader server, and all other servers should be followers. It should be possible to monitor these conditions, and alert if they are not met, using simple Prometheus query expressions.
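With a gauge that is 1 on the leader and 0 elsewhere, the first condition could be a single alert expression along the lines of `sum(<leader gauge>) != 1`. The sketch below spells out both conditions in Python over a snapshot of per-server states; the function name `check_cluster` and the server names are made up for illustration, and the snapshot stands in for the latest sample of a hypothetical per-server state metric.

```python
def check_cluster(states: dict) -> list:
    """Return alert messages for a mapping of server name -> reported Raft state."""
    problems = []
    leaders = sorted(name for name, state in states.items() if state == "leader")
    # Any server that is neither leader nor follower (e.g. a candidate) is unexpected.
    unexpected = sorted(
        name for name, state in states.items() if state not in ("leader", "follower")
    )
    if len(leaders) != 1:
        problems.append(f"expected exactly 1 leader, found {len(leaders)}: {leaders}")
    for name in unexpected:
        problems.append(f"{name} is {states[name]}, not leader or follower")
    return problems


healthy = {"server-a": "leader", "server-b": "follower", "server-c": "follower"}
broken = {"server-a": "leader", "server-b": "leader", "server-c": "candidate"}
print(check_cluster(healthy))  # []
print(check_cluster(broken))
```

For `broken`, the check reports both the duplicate leader and the candidate, which is the kind of condition the issue wants to alert on.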
Non-solutions
At first glance, the `consul_raft_state_*` metrics appear to offer this. But those are counters that increment upon entry to the relevant state, so their values at a point in time do not show the leader and followers. For example, if a server reports a non-zero value of `consul_raft_state_leader`, that means it became leader at some point, but it does not tell you that it is the leader now. (These counters do not even reliably tell the outcome of an election, as multiple elections may occur within a single Prometheus scrape interval.)

In the past, there were gauge metrics that suggested the leader by their presence, for instance `consul_raft_apply` and `consul_autopilot_healthy`. But because those were only updated on the leader, when a server ceased to be leader they would contain stale values for a time controlled by the `telemetry.prometheus_retention_period` config setting. Furthermore, subsequent commits mean that those metrics no longer indicate the leader (#9198 exposed `consul_raft_apply` on every node; #12617 exposed `consul_autopilot_healthy` on every server).

While there are counter metrics that only increase on the leader, using them to reliably determine the leader requires a very cumbersome Prometheus query expression (especially if the case of a standalone Consul server must be handled).
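The toy model below, which is only an illustration and not Consul code, shows the counter problem described above. `leader_entries` mimics the behaviour described for `consul_raft_state_leader` (incremented on each entry to the leader state), while `is_leader` is a hypothetical gauge of the kind this issue requests. Two elections within one scrape interval leave both servers with a leader-entry count of 1, so the counters alone cannot say who leads now; the gauge can.

```python
class Server:
    """Toy server exposing a leader-entry counter and a hypothetical leader gauge."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.leader_entries = 0  # counter: incremented on each entry to leader
        self.is_leader = 0       # hypothetical gauge: 1 only while leader

    def become_leader(self) -> None:
        self.leader_entries += 1
        self.is_leader = 1

    def step_down(self) -> None:
        # The counter keeps its value; only the gauge reflects the change.
        self.is_leader = 0


a, b = Server("server-a"), Server("server-b")

# Between two scrapes: server-a wins an election, steps down, then server-b wins.
a.become_leader()
a.step_down()
b.become_leader()

# One scrape: (counter value, gauge value) per server.
scrape = {s.name: (s.leader_entries, s.is_leader) for s in (a, b)}
print(scrape)  # {'server-a': (1, 0), 'server-b': (1, 1)}
```

Comparing counter increases between scrapes does not help either in this scenario: both counters went from 0 to 1, so only the gauge distinguishes the current leader.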