Monitors Kafka lag and publishes the metrics to different metrics backends.
The supported metrics backends are Prometheus and InfluxDB.
Prometheus:
The metrics in Prometheus format can be accessed at the `/prometheus` endpoint:
# HELP kafka_consumer_lag_max
# TYPE kafka_consumer_lag_max gauge
kafka_consumer_lag_max{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 2.0
kafka_consumer_lag_max{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 2.0
# HELP kafka_consumer_lag
# TYPE kafka_consumer_lag summary
kafka_consumer_lag_count{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 1.0
kafka_consumer_lag_sum{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 2.0
kafka_consumer_lag_count{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 1.0
kafka_consumer_lag_sum{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 2.0
# HELP kafka_consumer_offset
# TYPE kafka_consumer_offset summary
kafka_consumer_offset_count{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 1.0
kafka_consumer_offset_sum{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 16.0
kafka_consumer_offset_count{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 1.0
kafka_consumer_offset_sum{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 13.0
# HELP kafka_consumer_offset_max
# TYPE kafka_consumer_offset_max gauge
kafka_consumer_offset_max{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 16.0
kafka_consumer_offset_max{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 13.0
# HELP kafka_partition_offset
# TYPE kafka_partition_offset summary
kafka_partition_offset_count{cluster_name="test-cluster",partition="1",topic="test-topic",} 1.0
kafka_partition_offset_sum{cluster_name="test-cluster",partition="1",topic="test-topic",} 18.0
kafka_partition_offset_count{cluster_name="test-cluster",partition="0",topic="test-topic",} 1.0
kafka_partition_offset_sum{cluster_name="test-cluster",partition="0",topic="test-topic",} 15.0
# HELP kafka_partition_offset_max
# TYPE kafka_partition_offset_max gauge
kafka_partition_offset_max{cluster_name="test-cluster",partition="1",topic="test-topic",} 18.0
kafka_partition_offset_max{cluster_name="test-cluster",partition="0",topic="test-topic",} 15.0
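In the sample output above, each lag value equals the partition offset minus the consumer offset for the same partition (18 - 16 and 15 - 13). The following is a minimal sketch, not part of the project, that parses such text and recomputes the lag from the two `_max` gauges. The function names `parse_samples` and `lag_by_partition` are hypothetical, and the parser assumes label values contain no escaped quotes.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
# One label pair; a trailing comma after the last pair is simply skipped.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Parse Prometheus text-format lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def lag_by_partition(samples):
    """Return {(group, topic, partition): partition offset - consumer offset}."""
    end_offsets = {
        (labels["topic"], labels["partition"]): value
        for name, labels, value in samples
        if name == "kafka_partition_offset_max"
    }
    lags = {}
    for name, labels, value in samples:
        if name != "kafka_consumer_offset_max":
            continue
        key = (labels["topic"], labels["partition"])
        if key in end_offsets:
            lags[(labels["group"],) + key] = end_offsets[key] - value
    return lags


# Lines copied from the sample output above.
SAMPLE = """\
# TYPE kafka_consumer_offset_max gauge
kafka_consumer_offset_max{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 16.0
kafka_consumer_offset_max{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 13.0
# TYPE kafka_partition_offset_max gauge
kafka_partition_offset_max{cluster_name="test-cluster",partition="1",topic="test-topic",} 18.0
kafka_partition_offset_max{cluster_name="test-cluster",partition="0",topic="test-topic",} 15.0
"""

print(lag_by_partition(parse_samples(SAMPLE)))
```

Both partitions come out with a lag of 2.0, which agrees with the `kafka_consumer_lag_max` values in the sample.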
InfluxDB:
Metrics in InfluxDB's line protocol format are reported every minute, by default to the http://localhost:8086/write endpoint:
kafka_consumer_lag,cluster_name=test-cluster,group=test-consumer,partition=0,topic=test-topic,metric_type=histogram sum=2,count=1,mean=2,upper=2 1612125711313
kafka_consumer_lag,cluster_name=test-cluster,group=test-consumer,partition=1,topic=test-topic,metric_type=histogram sum=2,count=1,mean=2,upper=2 1612125711311
kafka_consumer_offset,cluster_name=test-cluster,group=test-consumer,partition=0,topic=test-topic,metric_type=histogram sum=13,count=1,mean=13,upper=13 1612125711307
kafka_consumer_offset,cluster_name=test-cluster,group=test-consumer,partition=1,topic=test-topic,metric_type=histogram sum=16,count=1,mean=16,upper=16 1612125711308
kafka_partition_offset,cluster_name=test-cluster,partition=0,topic=test-topic,metric_type=histogram sum=15,count=1,mean=15,upper=15 1612125711311
kafka_partition_offset,cluster_name=test-cluster,partition=1,topic=test-topic,metric_type=histogram sum=18,count=1,mean=18,upper=18 1612125711313
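Each record above consists of a measurement name with comma-separated tags, then a space, the fields, another space, and a timestamp. As a rough illustration, the sketch below splits one record into those parts. `parse_line` is a hypothetical helper, not something the project ships, and it assumes no escaped commas, spaces or equals signs, which holds for the sample records.

```python
def parse_line(line):
    """Split one line-protocol record into measurement, tags, fields, timestamp."""
    head, field_part, timestamp = line.strip().split(" ")
    # The first comma-separated token is the measurement; the rest are tags.
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {
        key: float(value)
        for key, value in (pair.split("=", 1) for pair in field_part.split(","))
    }
    return measurement, tags, fields, int(timestamp)


# Record copied from the sample above.
LINE = (
    "kafka_consumer_lag,cluster_name=test-cluster,group=test-consumer,"
    "partition=0,topic=test-topic,metric_type=histogram "
    "sum=2,count=1,mean=2,upper=2 1612125711313"
)

measurement, tags, fields, timestamp = parse_line(LINE)
print(measurement, tags["group"], tags["partition"], fields["upper"], timestamp)
```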
docker run --rm \
-p 8080:8080 \
-v /path/to/config:/config \
-e MICRONAUT_CONFIG_FILES=/config/application.yml \
-e MICRONAUT_METRICS_EXPORT_INFLUX_ENABLED=false \
devatherock/kafka-lag-monitor:latest
kafka:
  clusters: # Required. A list of Kafka cluster definitions
    - name: test-cluster # Required. Name of the cluster. The same name will be needed in `kafka.lag-monitor.clusters[*].name` config.
      servers: test-cluster.test.com:9092 # Required. The server(s)/broker(s) that belong to this cluster
  lag-monitor:
    clusters:
      - name: test-cluster # Required. Name of the cluster to monitor. Should be one of the defined `kafka.clusters[*].name`
        consumer-groups: # Optional. List of consumer group names to monitor. Names will be matched exactly. Use `group-allowlist` for regex match
          - test-consumer
        group-allowlist: # Optional. List of regular expressions to match against consumer group names to monitor. Will be ignored if `consumer-groups` is specified
          - deva.*
        group-denylist: # Optional. List of regular expressions to match against consumer group names to exclude. Will be ignored if `consumer-groups` or `group-allowlist` is specified
          - temp.*
    threadpool-size: 5 # Optional. Size of the thread pool used by the lag monitor. Defaults to 5
    timeout-seconds: 5 # Optional. Timeout for the requests to Kafka, in seconds. Defaults to 5
    initial-delay-seconds: 60 # Optional. Initial delay before metric collection begins, in seconds. Defaults to 60
    interval-seconds: 60 # Optional. Metric collection interval, in seconds. Defaults to 60
micronaut:
  server:
    port: 8080 # Optional. Port on which the app listens
  metrics:
    export:
      influx: # Config for publishing metrics to InfluxDB
        enabled: false # Optional. Indicates if metrics reporting to InfluxDB is enabled. Defaults to true
        uri: https://some.influx.host # Optional. The HTTP endpoint exposed by InfluxDB, to which to report metrics. Defaults to http://localhost:8086
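The comments on `consumer-groups`, `group-allowlist` and `group-denylist` describe a precedence: exact names first, then the allowlist, then the denylist. The sketch below is one reading of that precedence, not the project's implementation. `select_groups` is a hypothetical name, and two behaviours are assumptions: that a regular expression must match the whole group name, and that every group is monitored when no filter is configured.

```python
import re


def select_groups(all_groups, consumer_groups=None, allowlist=None, denylist=None):
    """Pick the consumer groups to monitor, following the documented precedence."""
    if consumer_groups:
        # Exact names win; both regex lists are ignored.
        wanted = set(consumer_groups)
        return [g for g in all_groups if g in wanted]
    if allowlist:
        # The denylist is ignored when an allowlist is present.
        # Assumption: a pattern must match the full group name.
        return [g for g in all_groups if any(re.fullmatch(p, g) for p in allowlist)]
    if denylist:
        return [g for g in all_groups if not any(re.fullmatch(p, g) for p in denylist)]
    # Assumption: with no filter configured, every group is monitored.
    return list(all_groups)


groups = ["test-consumer", "deva-app", "temp-job", "other"]

# All three options set, as in the sample config: only the exact name is used.
print(select_groups(groups, ["test-consumer"], ["deva.*"], ["temp.*"]))
# Allowlist and denylist set: only the allowlist is used.
print(select_groups(groups, allowlist=["deva.*"], denylist=["temp.*"]))
# Denylist alone: matching groups are excluded.
print(select_groups(groups, denylist=["temp.*"]))
```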
Environment Variable Name | Required | Default | Description |
---|---|---|---|
KAFKA_LAG_MONITOR_THREADPOOL_SIZE | false | 5 | Size of the thread pool used by the lag monitor |
KAFKA_LAG_MONITOR_TIMEOUT_SECONDS | false | 5 | Timeout for the requests to Kafka, in seconds |
LOGGER_LEVELS_ROOT | false | INFO | SLF4J log level, for all (framework and custom) code |
LOGGER_LEVELS_IO_GITHUB_DEVATHEROCK | false | INFO | SLF4J log level, for custom code |
MICRONAUT_SERVER_PORT | false | 8080 | Port on which the app listens |
MICRONAUT_CONFIG_FILES | true | (None) | Path to YAML config files. The YAML files can be used to specify complex, object and array properties |
MICRONAUT_METRICS_EXPORT_INFLUX_ENABLED | false | true | Indicates if metrics reporting to InfluxDB is enabled |
MICRONAUT_METRICS_EXPORT_INFLUX_URI | false | http://localhost:8086 | The HTTP endpoint exposed by InfluxDB, to which to report metrics |
LOGBACK_CONFIGURATION_FILE | false | (None) | Path to logback configuration file |
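The variable names in the table line up with the YAML property paths, for example `KAFKA_LAG_MONITOR_THREADPOOL_SIZE` and `kafka.lag-monitor.threadpool-size`. As far as Micronaut's documentation describes it, an environment variable name is lower-cased and each underscore may stand for either a dot or a hyphen. The sketch below only illustrates that mapping under this assumption; `candidate_keys` is a hypothetical helper and not Micronaut code.

```python
from itertools import product


def candidate_keys(env_name):
    """List every property key obtained by turning each '_' into '.' or '-'."""
    parts = env_name.lower().split("_")
    keys = []
    # One separator choice per underscore, each either '.' or '-'.
    for separators in product(".-", repeat=len(parts) - 1):
        key = parts[0]
        for separator, part in zip(separators, parts[1:]):
            key += separator + part
        keys.append(key)
    return keys


keys = candidate_keys("KAFKA_LAG_MONITOR_THREADPOOL_SIZE")
print(len(keys))
print("kafka.lag-monitor.threadpool-size" in keys)
```

Array and object properties such as `kafka.clusters` have no simple variable-name equivalent here, which is why the table points to `MICRONAUT_CONFIG_FILES` for them.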
- Set the environment variable `LOGGER_LEVELS_ROOT` to `DEBUG` to enable all debug logs, both custom and framework
- Set the environment variable `LOGGER_LEVELS_IO_GITHUB_DEVATHEROCK` to `DEBUG` to enable debug logs only in custom code
- For fine-grained logging control, supply a custom `logback.xml` file and set the environment variable `LOGBACK_CONFIGURATION_FILE` to `/path/to/custom/logback.xml`

To output logs as JSON, set the environment variable `LOGBACK_CONFIGURATION_FILE` to `logback-json.xml`. Refer to the logstash-logback-encoder documentation to customize the field names and formats in the log.