devatherock/kafka-lag-monitor

kafka-lag-monitor

Monitors Kafka consumer lag and publishes the metrics to different metrics backends.

Metrics

The supported metrics backends are Prometheus and InfluxDB.

Sample metrics

Prometheus:

Metrics in Prometheus format can be accessed at the /prometheus endpoint.

# HELP kafka_consumer_lag_max  
# TYPE kafka_consumer_lag_max gauge
kafka_consumer_lag_max{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 2.0
kafka_consumer_lag_max{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 2.0
# HELP kafka_consumer_lag  
# TYPE kafka_consumer_lag summary
kafka_consumer_lag_count{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 1.0
kafka_consumer_lag_sum{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 2.0
kafka_consumer_lag_count{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 1.0
kafka_consumer_lag_sum{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 2.0
# HELP kafka_consumer_offset  
# TYPE kafka_consumer_offset summary
kafka_consumer_offset_count{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 1.0
kafka_consumer_offset_sum{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 16.0
kafka_consumer_offset_count{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 1.0
kafka_consumer_offset_sum{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 13.0
# HELP kafka_consumer_offset_max  
# TYPE kafka_consumer_offset_max gauge
kafka_consumer_offset_max{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 16.0
kafka_consumer_offset_max{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 13.0
# HELP kafka_partition_offset  
# TYPE kafka_partition_offset summary
kafka_partition_offset_count{cluster_name="test-cluster",partition="1",topic="test-topic",} 1.0
kafka_partition_offset_sum{cluster_name="test-cluster",partition="1",topic="test-topic",} 18.0
kafka_partition_offset_count{cluster_name="test-cluster",partition="0",topic="test-topic",} 1.0
kafka_partition_offset_sum{cluster_name="test-cluster",partition="0",topic="test-topic",} 15.0
# HELP kafka_partition_offset_max  
# TYPE kafka_partition_offset_max gauge
kafka_partition_offset_max{cluster_name="test-cluster",partition="1",topic="test-topic",} 18.0
kafka_partition_offset_max{cluster_name="test-cluster",partition="0",topic="test-topic",} 15.0
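In the sample above, each partition's lag equals its partition offset minus the consumer group's offset (18 - 16 and 15 - 13). The following is a minimal Python sketch of that relationship, assuming the text has already been fetched from the /prometheus endpoint and contains a single consumer group and topic. The names `parse_samples` and `lag_by_partition` are hypothetical helpers for illustration, not part of kafka-lag-monitor.

```python
import re

# Four lines copied from the sample output above.
SAMPLE = """\
kafka_consumer_offset_max{cluster_name="test-cluster",group="test-consumer",partition="1",topic="test-topic",} 16.0
kafka_consumer_offset_max{cluster_name="test-cluster",group="test-consumer",partition="0",topic="test-topic",} 13.0
kafka_partition_offset_max{cluster_name="test-cluster",partition="1",topic="test-topic",} 18.0
kafka_partition_offset_max{cluster_name="test-cluster",partition="0",topic="test-topic",} 15.0
"""

LINE_RE = re.compile(r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (metric_name, labels, value) for each labelled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # this sketch ignores label-less or malformed lines
        labels = dict(LABEL_RE.findall(match.group("labels")))
        yield match.group("name"), labels, float(match.group("value"))


def lag_by_partition(text):
    """Return {partition: partition offset - consumer offset}.

    Assumes one consumer group and one topic, as in the sample.
    """
    partition_offsets, consumer_offsets = {}, {}
    for name, labels, value in parse_samples(text):
        if name == "kafka_partition_offset_max":
            partition_offsets[labels["partition"]] = value
        elif name == "kafka_consumer_offset_max":
            consumer_offsets[labels["partition"]] = value
    return {
        p: partition_offsets[p] - consumer_offsets[p]
        for p in consumer_offsets
        if p in partition_offsets
    }


print(lag_by_partition(SAMPLE))
```

For the sample data this yields a lag of 2.0 for both partitions, matching the reported kafka_consumer_lag_max values.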

InfluxDB:

By default, metrics in InfluxDB's line protocol format are reported every minute to the http://localhost:8086/write endpoint.

kafka_consumer_lag,cluster_name=test-cluster,group=test-consumer,partition=0,topic=test-topic,metric_type=histogram sum=2,count=1,mean=2,upper=2 1612125711313
kafka_consumer_lag,cluster_name=test-cluster,group=test-consumer,partition=1,topic=test-topic,metric_type=histogram sum=2,count=1,mean=2,upper=2 1612125711311
kafka_consumer_offset,cluster_name=test-cluster,group=test-consumer,partition=0,topic=test-topic,metric_type=histogram sum=13,count=1,mean=13,upper=13 1612125711307
kafka_consumer_offset,cluster_name=test-cluster,group=test-consumer,partition=1,topic=test-topic,metric_type=histogram sum=16,count=1,mean=16,upper=16 1612125711308
kafka_partition_offset,cluster_name=test-cluster,partition=0,topic=test-topic,metric_type=histogram sum=15,count=1,mean=15,upper=15 1612125711311
kafka_partition_offset,cluster_name=test-cluster,partition=1,topic=test-topic,metric_type=histogram sum=18,count=1,mean=18,upper=18 1612125711313
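Each line above follows the line protocol layout of measurement and tags, then fields, then a timestamp, separated by spaces. As a rough illustration, the Python sketch below splits one such line into its parts. It assumes no escaped spaces or commas and numeric-only field values, which holds for the sample but not for line protocol in general. `parse_line` is a hypothetical helper name.

```python
SAMPLE_LINE = (
    "kafka_consumer_lag,cluster_name=test-cluster,group=test-consumer,"
    "partition=0,topic=test-topic,metric_type=histogram "
    "sum=2,count=1,mean=2,upper=2 1612125711313"
)


def parse_line(line):
    """Split one record into (measurement, tags, fields, timestamp).

    Simplification: no escaping is handled and every field is read as a float.
    """
    head, field_part, timestamp = line.strip().split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {
        key: float(value)
        for key, value in (pair.split("=", 1) for pair in field_part.split(","))
    }
    return measurement, tags, fields, int(timestamp)


measurement, tags, fields, timestamp = parse_line(SAMPLE_LINE)
print(measurement, tags["partition"], fields["upper"], timestamp)
```

The 13-digit timestamps in the sample look like epoch milliseconds, though the precision is not stated in this document.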

Usage

docker run --rm \
        -p 8080:8080  \
        -v /path/to/config:/config \
        -e MICRONAUT_CONFIG_FILES=/config/application.yml \
        -e MICRONAUT_METRICS_EXPORT_INFLUX_ENABLED=false \
        devatherock/kafka-lag-monitor:latest

Configurable properties

application.yml variables

kafka:
  clusters: # Required. A list of kafka cluster definitions
    - name: test-cluster # Required. Name of the cluster. The same name will be needed in `kafka.lag-monitor.clusters[*].name` config. 
      servers: test-cluster.test.com:9092 # Required. The server(s)/broker(s) that belong to this cluster
  lag-monitor:
    clusters:
      - name: test-cluster # Required. Name of the cluster to monitor. Should be one of the defined `kafka.clusters[*].name`
        consumer-groups: # Optional. List of consumer group names to monitor. Names will be matched exactly. Use `group-allowlist` for regex match
          - test-consumer
        group-allowlist: # Optional. List of regular expressions to match against consumer group names to monitor. Will be ignored if `consumer-groups` is specified
          - deva.*
        group-denylist: # Optional. List of regular expressions to match against consumer group names to exclude. Will be ignored if `consumer-groups` or `group-allowlist` is specified
          - temp.*
    threadpool-size: 5 # Optional. Size of the thread pool used by the lag monitor. Defaults to 5
    timeout-seconds: 5 # Optional. Timeout for the requests to Kafka, in seconds. Defaults to 5
    initial-delay-seconds: 60 # Optional. Initial delay before metric collection begins, in seconds. Defaults to 60
    interval-seconds: 60 # Optional. Metric collection interval, in seconds. Defaults to 60
micronaut:
  server:
    port: 8080 # Optional. Port on which the app listens
  metrics:
    export:
      influx: # Config for publishing metrics to InfluxDB
        enabled: false # Optional. Indicates if metrics reporting to InfluxDB is enabled. Defaults to true
        uri: https://some.influx.host # Optional. The HTTP endpoint exposed by InfluxDB, to which metrics are reported. Defaults to http://localhost:8086
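The comments on `consumer-groups`, `group-allowlist` and `group-denylist` describe a precedence: exact names override both regex lists, and the allowlist overrides the denylist. The Python sketch below illustrates that precedence only; it is not the project's implementation. Two points are assumptions of the sketch: patterns must match the whole group name (`re.fullmatch`), and all groups are monitored when no filter is configured. `select_groups` is a hypothetical name.

```python
import re


def select_groups(all_groups, consumer_groups=None, group_allowlist=None, group_denylist=None):
    """Pick the consumer groups to monitor, following the documented precedence."""
    if consumer_groups:
        # Exact names win; both regex lists are ignored.
        wanted = set(consumer_groups)
        return [g for g in all_groups if g in wanted]
    if group_allowlist:
        # The allowlist wins over the denylist.
        # Assumption: a pattern must match the full group name.
        return [g for g in all_groups if any(re.fullmatch(p, g) for p in group_allowlist)]
    if group_denylist:
        return [g for g in all_groups if not any(re.fullmatch(p, g) for p in group_denylist)]
    # Assumption: with no filter configured, every group is monitored.
    return list(all_groups)


groups = ["test-consumer", "deva-app", "temp-job"]
print(select_groups(groups, ["test-consumer"], ["deva.*"], ["temp.*"]))
print(select_groups(groups, group_allowlist=["deva.*"], group_denylist=["temp.*"]))
print(select_groups(groups, group_denylist=["temp.*"]))
```

With the sample configuration above, only `test-consumer` would be selected under these assumptions, because `consumer-groups` is specified.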

Environment variables

| Environment Variable Name | Required | Default | Description |
|---|---|---|---|
| KAFKA_LAG_MONITOR_THREADPOOL_SIZE | false | 5 | Size of the thread pool used by the lag monitor |
| KAFKA_LAG_MONITOR_TIMEOUT_SECONDS | false | 5 | Timeout for the requests to Kafka, in seconds |
| LOGGER_LEVELS_ROOT | false | INFO | SLF4J log level, for all (framework and custom) code |
| LOGGER_LEVELS_IO_GITHUB_DEVATHEROCK | false | INFO | SLF4J log level, for custom code |
| MICRONAUT_SERVER_PORT | false | 8080 | Port on which the app listens |
| MICRONAUT_CONFIG_FILES | true | (None) | Path to YAML config files. The YAML files can be used to specify complex, object and array properties |
| MICRONAUT_METRICS_EXPORT_INFLUX_ENABLED | false | true | Indicates if metrics reporting to InfluxDB is enabled |
| MICRONAUT_METRICS_EXPORT_INFLUX_URI | false | http://localhost:8086 | The HTTP endpoint exposed by InfluxDB, to which metrics are reported |
| LOGBACK_CONFIGURATION_FILE | false | (None) | Path to logback configuration file |
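The variable names above line up with the YAML keys, for example MICRONAUT_SERVER_PORT and `micronaut.server.port`. As I understand Micronaut's environment variable handling, each underscore may stand for either a dot or a hyphen, so one variable name yields several candidate property keys. The Python sketch below enumerates those candidates to show the idea. It is an illustration under that assumption, not Micronaut's code, and `candidate_property_keys` is a hypothetical name.

```python
from itertools import product


def candidate_property_keys(env_name):
    """List the property keys an environment variable name could map to.

    Each underscore is treated as either "." or "-" (assumed behavior).
    """
    parts = env_name.lower().split("_")
    keys = []
    for separators in product(".-", repeat=len(parts) - 1):
        key = parts[0]
        for separator, part in zip(separators, parts[1:]):
            key += separator + part
        keys.append(key)
    return keys


print(candidate_property_keys("MICRONAUT_SERVER_PORT"))
print("kafka.lag-monitor.threadpool-size"
      in candidate_property_keys("KAFKA_LAG_MONITOR_THREADPOOL_SIZE"))
```

The number of candidates doubles with every underscore, so a name with four underscores produces 16 keys.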

Troubleshooting

Enabling debug logs

  • Set the environment variable LOGGER_LEVELS_ROOT to DEBUG to enable debug logs for all code, both custom and framework
  • Set the environment variable LOGGER_LEVELS_IO_GITHUB_DEVATHEROCK to DEBUG to enable debug logs only in custom code
  • For fine-grained logging control, supply a custom logback.xml file and set the environment variable LOGBACK_CONFIGURATION_FILE to /path/to/custom/logback.xml

JSON logs

To output logs as JSON, set the environment variable LOGBACK_CONFIGURATION_FILE to logback-json.xml. Refer to the logstash-logback-encoder documentation to customize the field names and formats in the log.