This repository was archived by the owner on Aug 23, 2023. It is now read-only.

better documentation for our input formats #1071

Merged
merged 4 commits into from
Oct 3, 2018
35 changes: 29 additions & 6 deletions docs/inputs.md
@@ -23,16 +23,39 @@ note: it does not implement [carbon2.0](http://metrics20.org/implementations/)

## Kafka-mdm (recommended)

The Kafka input supports 2 formats:
* MetricData Messagepack-encoded (legacy: slow and verbose. they contain the data points as well as all metric data. see #876)
* MetricPoint messages. (more optimized: contains only id, value and timestamp. see #876)

See the [schema repository](https://github.com/raintank/schema) for more details.

This is the recommended input option if you want a queue. It also simplifies the operational model: because nodes can replay data,
you don't have to reassign primary/secondary roles at runtime. For example, you can just restart write nodes and have them replay data.
Note that [carbon-relay-ng](https://github.com/graphite-ng/carbon-relay-ng) can be used to pipe a carbon stream into Kafka.

The Kafka input supports 2 formats:

* MetricData
* MetricPoint

Both formats have a corresponding implementation in the [schema repository](https://github.com/raintank/schema), making it trivial
to implement your own producers (and consumers) if you use Golang.

### MetricData

This format contains the data points as well as all metric identity data and metadata, in MessagePack-encoded messages (not JSON).
It is rather verbose and inefficient to encode/decode.
See the [MetricData](https://godoc.org/github.com/raintank/schema#MetricData) documentation for format and implementation details.

### MetricPoint

This is a hand-written binary format that is much more compact and much faster to encode/decode than MetricData.
See the [MetricPoint](https://godoc.org/github.com/raintank/schema#MetricPoint) documentation for format and implementation details.
It is a minimal format that only contains the series identifier, value and timestamp.
As such, it is essential that the MetricPoint messages for each series are preceded by a MetricData message for that series, so
that metrictank has had the chance to add the metric information to its index.
Otherwise metrictank will not recognize the ID and will discard the point.
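
The ordering requirement above can be illustrated with a minimal sketch. It is not metrictank's implementation: the types `SeriesID`, `Point`, `Definition` and `Index` are hypothetical stand-ins invented for this example, and the real index and message types live in metrictank and the schema repository.

```go
package main

// SeriesID is a hypothetical stand-in for the series identifier
// carried by a MetricPoint message.
type SeriesID string

// Point is a hypothetical minimal point: identifier, value and timestamp only.
type Point struct {
	ID    SeriesID
	Value float64
	Time  uint32
}

// Definition is a hypothetical stand-in for the identity data and
// metadata that a MetricData message carries.
type Definition struct {
	ID   SeriesID
	Name string
}

// Index maps series identifiers to their definitions.
type Index struct {
	defs map[SeriesID]Definition
}

// NewIndex returns an empty index.
func NewIndex() *Index {
	return &Index{defs: make(map[SeriesID]Definition)}
}

// AddDefinition registers a series, as would happen when a full
// (MetricData-style) message is received.
func (ix *Index) AddDefinition(d Definition) {
	ix.defs[d.ID] = d
}

// Accept reports whether a point can be ingested. A point whose
// identifier is not in the index is rejected, because the point
// alone carries no name or metadata to build an index entry from.
func (ix *Index) Accept(p Point) bool {
	_, ok := ix.defs[p.ID]
	return ok
}
```

In this sketch, calling `Accept` for a series before `AddDefinition` returns false, and the same point is accepted once the definition has been added. A producer would therefore send the full message first and the compact messages afterwards.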

Note that the implementation has encode/decode functions for the standard MetricPoint format, as well as for a variant that does not encode the org-id
part of the series id. In single-tenant environments, you can configure your producers and metrictank to omit the org-id from all messages
and set it in configuration instead. This makes the messages more compact, but does not work in multi-tenant environments.
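
To show why dropping the org-id shrinks each message, here is a rough sketch of a fixed-size binary encoding with and without that field. The layout, field sizes, byte order and the names `point`, `encode` and `decode` are assumptions made for illustration only. For the actual wire format, refer to the MetricPoint documentation linked above.

```go
package main

import (
	"encoding/binary"
	"errors"
	"math"
)

// point is a hypothetical in-memory representation.
// Field names and sizes are assumptions for this sketch.
type point struct {
	Key   [16]byte // series key without the org part
	Org   uint32   // org-id part of the series id
	Value float64
	Time  uint32
}

// encode serializes p into a fixed-size buffer. The org-id is only
// written when withOrg is true, which saves 4 bytes per message.
func encode(p point, withOrg bool) []byte {
	size := 16 + 8 + 4
	if withOrg {
		size += 4
	}
	b := make([]byte, size)
	copy(b[0:16], p.Key[:])
	binary.LittleEndian.PutUint64(b[16:24], math.Float64bits(p.Value))
	binary.LittleEndian.PutUint32(b[24:28], p.Time)
	if withOrg {
		binary.LittleEndian.PutUint32(b[28:32], p.Org)
	}
	return b
}

// decode is the inverse of encode. When the message carries no
// org-id, the configured defaultOrg is used instead.
func decode(b []byte, withOrg bool, defaultOrg uint32) (point, error) {
	want := 28
	if withOrg {
		want = 32
	}
	if len(b) != want {
		return point{}, errors.New("unexpected message length")
	}
	var p point
	copy(p.Key[:], b[0:16])
	p.Value = math.Float64frombits(binary.LittleEndian.Uint64(b[16:24]))
	p.Time = binary.LittleEndian.Uint32(b[24:28])
	p.Org = defaultOrg
	if withOrg {
		p.Org = binary.LittleEndian.Uint32(b[28:32])
	}
	return p, nil
}
```

In this sketch the consumer must know which variant the producer used, since the message itself does not say whether an org-id is present. That is why both sides need matching configuration, and why the shorter variant cannot distinguish tenants.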

### Future formats

In the future we plan to do more optimisations such as:
* batch encoding instead of a kafka message per point.
* further compression (e.g. multiple points with shared timestamp).