storage: emit health alert when >100 Raft snapshots are queued
Such a backlog is a dangerous condition. Adding the check to the health checker has the
additional benefit of logging it during the nightly restore/import
tests, which can in turn help diagnose whether a particular run is
affected by cockroachdb#31409.

Release note: None
tbg committed Nov 30, 2018
1 parent ef0423b commit dc0bb66
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions pkg/server/status/health_check.go
@@ -77,6 +77,10 @@ var trackedMetrics = map[string]threshold{
 	"queue.raftsnapshot.process.failure": counterZero,
 	"queue.tsmaintenance.process.failure": counterZero,
 	"queue.consistency.process.failure": counterZero,
+
+	// When there are more than 100 pending items in the Raft snapshot queue,
+	// this is certainly worth pointing out.
+	"queue.raftsnapshot.pending": {gauge: true, min: 100},
 }

 type metricsMap map[roachpb.StoreID]map[string]float64
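
For readers without the rest of health_check.go at hand, here is a minimal sketch of how an entry such as {gauge: true, min: 100} could be evaluated. Only the field names gauge and min and the identifier counterZero appear in the diff. The field types, the exceeds helper, and the rule that gauges are judged on their current value while counters are judged on their increase since the last check are assumptions made for illustration, not the actual implementation.

```go
package main

import "fmt"

// threshold reuses the field names visible in the diff (gauge, min).
// The field types and the semantics below are assumptions.
type threshold struct {
	gauge bool  // true: judge the current value; false: judge the increase since the last check
	min   int64 // flag only when the judged quantity is strictly greater than min
}

// counterZero is assumed to be the zero threshold: any increase of a counter is flagged.
var counterZero = threshold{}

// exceeds reports whether a metric should be flagged, given its previous and
// current readings. Hypothetical helper, not part of the real health checker.
func exceeds(t threshold, prev, cur float64) bool {
	v := cur - prev // counters: look at the change between two checks
	if t.gauge {
		v = cur // gauges: look at the absolute level
	}
	return v > float64(t.min)
}

func main() {
	tracked := map[string]threshold{
		"queue.raftsnapshot.process.failure": counterZero,
		"queue.raftsnapshot.pending":         {gauge: true, min: 100},
	}

	pending := tracked["queue.raftsnapshot.pending"]
	fmt.Println(exceeds(pending, 90, 100)) // false: exactly 100 is not "more than 100"
	fmt.Println(exceeds(pending, 90, 101)) // true

	failures := tracked["queue.raftsnapshot.process.failure"]
	fmt.Println(exceeds(failures, 7, 7)) // false: no new failures
	fmt.Println(exceeds(failures, 7, 8)) // true: one new failure
}
```

Under these assumptions, a store with a steady backlog of 150 queued snapshots would be flagged on every check, whereas a counter is flagged only when it grows between checks.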