-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
*: add inflight snapshot metrics #11009
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks useful! A couple minor comments
etcdserver/metrics.go
Outdated
applySnapshotInflights = prometheus.NewGauge(prometheus.GaugeOpts{ | ||
Namespace: "etcd", | ||
Subsystem: "server", | ||
Name: "snapshot_apply_inflights_total", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Since this is not a network metric (and subsystem=server), maybe instead of "inflight" name it something like "snapshot_apply_in_progress_total" ?
etcdserver/api/rafthttp/http.go
Outdated
@@ -287,6 +289,7 @@ func (h *snapshotHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { | |||
plog.Error(msg) | |||
} | |||
http.Error(w, msg, http.StatusInternalServerError) | |||
snapshotReceiveInflights.WithLabelValues(from).Dec() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Would be great if we could somehow structure the code blocks we only need to make one call to Dec()
, ideally in a defer. The code looks correct, but I worry we'll end up miscounting at some point...
@jpbetz All addressed. Thanks! |
@@ -90,6 +90,7 @@ func (s *snapshotSender) send(merged snap.Message) { | |||
plog.Infof("start to send database snapshot [index: %d, to %s]...", m.Snapshot.Metadata.Index, types.ID(m.To)) | |||
} | |||
|
|||
snapshotSendInflights.WithLabelValues(to).Inc() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defer for Dec()
here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just fixed :)
LGTM after other defer Thanks! |
…cd_network_snapshot_receive_inflights_total" Useful for deciding when to terminate the unhealthy follower. If the follower is receiving a leader snapshot, operator may wait. Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
Agree with Joe. Thanks! |
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
LGTM! |
LGTM thanks @gyuho! |
*: add inflight snapshot metrics
Could we backport this to 3.2 and 3.3? |
@wenjiaswe Sure! |
Currently, there's no way to find out if current follower is receiving a leader snapshot. If snapshot receive is inflight, the operator may wait until it's done, rather than marking the follower as unhealthy. This is useful for a large cluster, where snapshot sends takes several minutes (e.g. 4 GB).
/cc @xiang90 @jpbetz @jingyih