This repository was archived by the owner on Aug 23, 2023. It is now read-only.

Commit 8ddbfd2

Merge pull request #1186 from grafana/document-startup-produce
document startup procedure
2 parents 85def60 + a606ee3 commit 8ddbfd2

3 files changed, +35 -1 lines changed

README.md

+1
@@ -65,6 +65,7 @@ Otherwise data loss of current chunks will be incurred. See [operations guide](
 * [Inputs](https://github.com/grafana/metrictank/blob/master/docs/inputs.md)
 * [Metrics](https://github.com/grafana/metrictank/blob/master/docs/metrics.md)
 * [Operations](https://github.com/grafana/metrictank/blob/master/docs/operations.md)
+* [Startup](https://github.com/grafana/metrictank/blob/master/docs/startup.md)
 * [Tools](https://github.com/grafana/metrictank/blob/master/docs/tools.md)
 
 ### features in-depth

cmd/metrictank/metrictank.go

+1 -1
@@ -150,7 +150,7 @@ func main() {
 log.Infof("logging level set to '%s'", *logLevel)
 
 /***********************************
-Validate settings needed for clustering
+Validate settings needed for clustering
 ***********************************/
 if *instance == "" {
 log.Fatal("instance can't be empty")

docs/startup.md

+33
@@ -0,0 +1,33 @@
+# Metrictank startup
+
+The full startup procedure has many details, but here we cover the main steps that affect:
+
+* performance/resource usage characteristics
+* cluster status
+* API availability
+* diagnostics
+
+
+| Phase                   | Description                                                                                        | Effect on CPU / RAM                 |
+| ----------------------- | -------------------------------------------------------------------------------------------------- | ----------------------------------- |
+| load config             | load/validate config                                                                               | no                                  |
+| setup diagnostics       | set up logging, profiling, proftrigger                                                             | no                                  |
+| log startup             | logs "Metrictank starting" message                                                                 | no                                  |
+| start sending stats     | starts connecting and writing to graphite endpoint                                                 | no                                  |
+| create Store            | create keyspace, tables, write queues, etc                                                         | minor RAM increase ~ queue size     |
+| create Input(s)         | open connections (kafka) or listening sockets (carbon, prometheus)                                 | no                                  |
+| start cluster           | starts gossip, joins cluster                                                                       | no                                  |
+| create Index            | creates instance and starts write queues                                                           | minor RAM increase ~ queue size     |
+| start API server        | opens listening socket and starts handling requests in not-ready mode                              | no                                  |
+| init Index              | creates session, keyspace, tables, write queues, etc and loads in-memory index from persisted data | reasonable RAM and CPU increase     |
+| create cluster notifier | optional: connects to Kafka, starts backfilling persistence messages and waits until done or timeout | if backfilling: above-normal CPU, normal RAM usage |
+| start input plugin(s)   | starts backfill (kafka) or listening (carbon, prometheus) and maintains priority based on input lag | if backfilling: above-normal CPU and RAM usage |
+| mark ready state        | immediately (primary) or after warmup period (secondary) (combined with priority for clustering)   | no                                  |
+
+We recommend provisioning a cluster such that it can backfill a 7 hour backlog in half an hour or less. This means:
+* The CPU increase during the kafka backfilling is very significant: typically a 14x CPU increase compared to normal usage.
+* The RAM usage during the input data backfilling is typically about 1.5x to 2x normal.
+
+Backfilling will go as fast as it can until it reaches a bottleneck (kafka brokers, CPU constraints, etc), so your numbers may vary.
+
+This is true for v0.11.0, but may need revising later.
