
Commit d29ac77

update docs for bigtable, storage-schemas.conf and tank GC
1 parent: 5e06998

2 files changed: +33 -18

README.md

+1 -1
@@ -10,7 +10,7 @@ Metrictank is a multi-tenant timeseries engine for Graphite and friends.
It provides long term storage, high availability, efficient storage, retrieval and processing for large scale environments.

[GrafanaLabs](http://grafana.com) has been running metrictank in production since December 2015.
-It currently requires an external datastore like Cassandra, and we highly recommend using Kafka to support clustering, as well
+It currently requires an external datastore like Cassandra or Bigtable, and we highly recommend using Kafka to support clustering, as well
as a clustering manager like Kubernetes. This makes it non-trivial to operate, though GrafanaLabs has an on-premise product
that makes this process much easier.

docs/memory-server.md

+32 -17
@@ -1,6 +1,6 @@
# Memory server

-While metrictank persists timeseries data to cassandra, it can - and should - serve most data out of memory.
+While metrictank persists timeseries data to a store such as cassandra or bigtable, it can - and should - serve most data out of memory.
It has two mechanisms to support this: the ring buffers, and the chunk-cache. These features are complementary to each other, and are described in more detail below.

## In-memory components
@@ -11,11 +11,11 @@ The ring buffer is simply a list of chunks - one for each series - that holds th
You can configure how many chunks to retain (`numchunks`).
The ring buffer can be useful to assure data that may be needed is in memory, in these cases:
* you know a majority of your queries hits the most recent data of a given time window (e.g. last 2 hours, last day), you know this is unlikely to change and true for the vast majority of your metrics.
-* keep secondaries able to satisfy queries from RAM for the most recent data of cold (infrequently queried) series, even if the primary is not able to save its chunks instantly, if it crashed and needs to be restarted or if you're having a cassandra outage so that chunks can't be loaded or saved. Note that this does not apply for hot data: data queried frequently enough (at least as frequent as their chunkspan) will be added to the chunk cache automatically (see below) and not require cassandra lookups.
+* keep secondaries able to satisfy queries from RAM for the most recent data of cold (infrequently queried) series, even if the primary is not able to save its chunks instantly, if it crashed and needs to be restarted or if you're having a store outage so that chunks can't be loaded or saved. Note that this does not apply for hot data: data queried frequently enough (at least as frequent as their chunkspan) will be added to the chunk cache automatically (see below) and not require storage lookups.

Note:
* the last (current) chunk is always a "work in progress", so depending on what time it is, it may be anywhere between empty and full.
-* when metrictank starts up, it will not refill the ring buffer with data from Cassandra. They only fill based on data that comes in. But once data has been seen, the buffer
+* when metrictank starts up, it will not refill the ring buffer with data from the store. They only fill based on data that comes in. But once data has been seen, the buffer
will keep the most chunks it can, until data is expired when series haven't been seen in a while.

Both of these make it tricky to articulate how much data is in the ringbuffer for a given series. But `(numchunks-1) * chunkspan` is the conservative approximation which is valid in the typical case (a warmed up metrictank that's ingesting fresh data).
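
For illustration, with purely hypothetical values: at `chunkspan = 10min` and `numchunks = 7`, a warmed-up instance conservatively holds `(7-1) * 10min = 60min` of recent data per series in its ring buffer.
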
@@ -26,21 +26,21 @@ So unless you're confident your metrics are all subject to queries of the same t

### Chunk Cache

-The goal of the chunk cache is to offload as much read workload from cassandra as possible.
-Any data chunks fetched from Cassandra are added to the chunk cache.
+The goal of the chunk cache is to offload read workload from the store.
+Any data chunks fetched from the store are added to the chunk cache.
But also, more interestingly, chunks expired out of the ring buffers will automatically be added to the chunk cache if the chunk before it is also in the cache.
-In other words, for series we know to be "hot" (queried frequently enough so that their data is kept in the chunk cache) we will try to avoid a roundtrip to Cassandra before adding the chunks to the cache. This can be especially useful when it takes long for the primary to save data to cassandra, or when there is a cassandra outage.
+In other words, for series we know to be "hot" (queried frequently enough so that their data is kept in the chunk cache) we will try to avoid a roundtrip to the store before adding the chunks to the cache. This can be especially useful when it takes the primary a long time to persist chunks, or when there is a storage outage.
The chunk cache has a configurable [maximum size](https://github.com/grafana/metrictank/blob/master/docs/config.md#chunk-cache),
within that size it tries to always keep the most often queried data by using an LRU mechanism that evicts the Least Recently Used chunks.

The effectiveness of the chunk cache largely depends on the common query patterns and the configured `max-size` value:
If a small number of metrics gets queried often, the chunk cache will be effective because it can serve most requests out of its memory.
On the other hand, if most queries involve metrics that have not been queried for a long time and if they are only queried a small number of times,
-then Metrictank will need to fallback to Cassandra more often.
+then Metrictank will need to fall back to the store more often.

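For illustration, a minimal sketch of the cache sizing config, assuming a `[chunk-cache]` section with a `max-size` value in bytes as described in the linked config documentation; the value shown is purely illustrative:

```
[chunk-cache]
# upper bound on chunk cache memory usage, in bytes (illustrative: ~4 GB)
max-size = 4294967296
```
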
## Configuration guidelines

-See [the example config](https://github.com/grafana/metrictank/blob/master/metrictank-sample.ini) for an overview and basic explanation of what the config values are.
+See the [config documentation](./config.md) for an overview and basic explanation of what the config values are. Most of them are set in storage-schemas.conf.
Some of the values related to chunking and compression are a bit harder to tune, so this section will explain in more detail.

@@ -49,9 +49,9 @@ Some of the values related to chunking and compression are a bit harder to tune,
### Basic guideline

`chunkspan` is how long of a timeframe should be covered by your chunks. E.g. you could store anywhere between 1 second to 24 hours worth of data in a chunk.
-`numchunks` is simply up to how many chunks should be retained in the ring buffers per metric. Queries for data not in the ringbuffer will hit the chunk-cache-fronted cassandra store.
+`numchunks` is simply up to how many chunks should be retained in the ring buffers per metric. Queries for data not in the ringbuffer will hit the chunk-cache-fronted store.

-These two values are defined globally for raw metrics, and also within the `agg-settings` configuration, as each rollup definition needs appropriate chunk span and numchunks based on the interval of the rollup. The settings currently apply to all metrics, we don't support finetuning on a per-metric or per-category level yet.
+Note that these are defined for each archive individually (raw and rollups).

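For illustration, a hypothetical storage-schemas.conf entry showing per-archive chunkspan and numchunks; this assumes an `interval:ttl:chunkspan:numchunks[:ready]` retention syntax, and all numbers are examples only, not recommendations:

```
[default]
pattern = .*
# raw archive: 10s points kept for 35d, 30min chunks, 2 chunks in the ring buffer;
# rollup archive: 2min points kept for 2y, 6h chunks, 2 chunks in the ring buffer
retentions = 10s:35d:30min:2,2min:2y:6h:2
```
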
#### Valid chunk spans

@@ -64,15 +64,15 @@ The standard recommendation is at least 120 points per chunk.

#### Compression efficiency

-The more points are contained within a chunk, the more efficiently the compression can work. This is very noticeable
-until about 120 points per chunk, at which point the improvement becomes less relevant.
+The more points are contained within a chunk, the more efficiently the compression can work.
+After about 120 points per chunk, the returns diminish.
For more details, see the [go-tsz eval program](https://github.com/dgryski/go-tsz/tree/master/eval) or the
[results table](https://raw.githubusercontent.com/dgryski/go-tsz/master/eval/eval-results.png)

-#### Cassandra load
+#### Store load

-Longer chunks with more points mean a lower rate of chunksaves, meaning fewer write commands and iops for cassandra.
-There is probably an upper limit where cassandra becomes unhappy with large chunksizes, but we haven't seen that yet.
+Longer chunks with more points mean a lower rate of chunk saves in the store.
+There is probably an upper limit where the store becomes unhappy with large chunksizes, but we haven't seen that yet.

#### Memory overhead

@@ -96,8 +96,9 @@ We plan to keep working on performance and memory management and hope to make th

In principle, you need just 1 chunk for each series.
However:
-* when the data stream moves into a new chunk, secondary nodes would drop the previous chunk and query Cassandra. But the primary needs some time to save the chunk to Cassandra. Based on your deployment this could take anywhere between milliseconds or many minutes. Possibly even an hour or more. As you don't want to slam Cassandra with requests at each chunk clear, you should probably use a numchunks of 2, or a numchunks that lets you retain data in memory for however long it takes to flush data to cassandra. (though the chunk cache alleviates this concern for hot data, see above).
-* The ringbuffers can be useful to let you deal with crashes or outages of your primary node. If your primary went down, or for whatever reason cannot save data to Cassandra, then you won't even feel it if the ringbuffers can "clear the gap" between in memory data and older data in cassandra. So we advise to think about how fast your organisation could resolve a potential primary outage, and then set your parameters such that `(numchunks-1) * chunkspan` is more than that. (again, with a sufficiently large cache, this is only a concern for cold data)
+* when the data stream moves into a new chunk, secondary nodes would drop the previous chunk and query the store. But the primary needs some time to save the chunk to the store.
+Based on your deployment this could take anywhere from milliseconds to many minutes, possibly even an hour or more. As you don't want to slam the store with requests at each chunk clear, you should probably use a numchunks of 2, or a numchunks that lets you retain data in memory for however long it takes to persist data (though the chunk cache alleviates this concern for hot data, see above).
+* The ringbuffers can be useful to let you deal with crashes or outages of your primary node. If your primary went down, or for whatever reason cannot save data, then you won't even feel it if the ringbuffers can "clear the gap" between in-memory data and older data in the store. So we advise to think about how fast your organisation could resolve a potential primary outage, and then set your parameters such that `(numchunks-1) * chunkspan` is more than that (again, with a sufficiently large cache, this is only a concern for cold data). See the worked example below.
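
For illustration, with hypothetical numbers: if you estimate it could take up to 2 hours to resolve a primary outage and your raw chunkspan is 30min, you would want `(numchunks-1) * 30min >= 2h`, i.e. `numchunks >= 5`.
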
#### Rollups remove the need to keep large number of higher resolution chunks

@@ -119,3 +120,17 @@ numchunks = 7
```

Any older data that is often queried will be within the [chunk-cache](#chunk-cache).
+
+### Garbage collection
+
+Normally, in the tank, we close and persist chunks when data comes in for a newer chunk. But this may not always happen (e.g. you stop sending data for a given series), so we also have a GC mechanism, which is configured at the top of the metrictank config file.
+
+Chunks (raw and rollups) are closed and persisted at a GC run when:
+- they haven't been written to for `chunk-max-stale` (default: 1h) or more,
+- and a new chunk should have been started 15 min ago or more (if the stream was realtime).
+
+For this mechanism to work, kafka retention should be at least:
+largest raw chunkspan + gc-interval + chunk-max-stale + a safety window for manual intervention upon a crash and for draining the write queues.
+Why? Consider a worst-case scenario: we might do a GC check right before chunk-max-stale is hit, so we must wait until the next GC run, at which point GC kicks in and starts filling up the write queue.
+But just before the chunk is moved from the write queue into the persistent store, the instance crashes, and we need manual intervention to get a new writer up and running.
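
For illustration, a sketch of the GC-related settings at the top of the metrictank config file; the option names follow the config documentation, but treat the values as illustrative rather than as recommendations:

```
# how often the garbage collection job runs
gc-interval = 1h
# close and persist chunks that have not been written to for this long
chunk-max-stale = 1h
```

With these illustrative values, a largest raw chunkspan of 6h and, say, a 4h safety window, kafka retention should be at least 6h + 1h + 1h + 4h = 12h.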
