add jmx monitoring using grafana prometheus step by step #225

145 changes: 145 additions & 0 deletions monitoring/jmx-grafana-prometheus-step-by-step/README.md
@@ -0,0 +1,145 @@
# JMX Monitoring using Prometheus and Grafana


![monitoring1](img/monitoring-1.png)


![monitoring2](img/monitoring-2.png)


![monitoring3](img/monitoring-3.png)


# Steps to create

## 1. Run Confluent Platform

```sh
kubectl create namespace confluent
helm repo add confluentinc https://packages.confluent.io/helm
helm upgrade --install confluent-operator confluentinc/confluent-for-kubernetes -n confluent
kubectl apply -f confluent-platform.yaml -n confluent
```

Verify that the Confluent components are running as pods:
```sh
kubectl get pods -n confluent
```

Once the Kafka brokers and Connect workers are ready, run the command below to create the topics and connectors.

```sh
kubectl apply -f connect-task-datagen.yml -n confluent
```


## 2. Deploy Prometheus

Create monitoring namespace

```sh
kubectl create namespace monitoring
```

Create a cluster role so that components in `monitoring` can access `confluent` namespace. This is required to access JMX metrics endpoint.

```sh
kubectl apply -f prometheus-rbac.yml
```
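
For orientation, this is roughly the shape of RBAC that lets Prometheus discover and scrape pods in other namespaces. The resource names and the `default` service account here are assumptions for illustration; the actual `prometheus-rbac.yml` in this PR may differ.

```yaml
# Hypothetical sketch -- the real prometheus-rbac.yml in this repo may differ.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["nodes", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: default
    namespace: monitoring
```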

(Optional) To persist data, change the volume reference in `prometheus-deployment.yml` from `prometheus-tmp-storage-volume` to `prometheus-persistent-storage` and update the PVC claim name it references. Alternatively, run the command below to persist data locally.
```sh
kubectl apply -f prometheus-volume.yml -n monitoring
```
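
For reference, a local PersistentVolume/PersistentVolumeClaim pair like the sketch below is one way `prometheus-volume.yml` could provide that storage. The names, capacity, and `hostPath` are assumptions for illustration, not necessarily what the file in this PR contains.

```yaml
# Hypothetical sketch of a local PV/PVC pair for Prometheus data.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-pv
spec:
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteOnce"]
  hostPath:
    path: /data/prometheus
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-pvc
  namespace: monitoring
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
```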

Load the Prometheus configuration, which tells Prometheus how to scrape metrics. You can also add alerting rules in this file.
```sh
kubectl apply -f prometheus-config-map.yml -n monitoring
```
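
As a rough sketch, a scrape job for the Confluent pods could use Kubernetes pod discovery scoped to the `confluent` namespace. The metrics port `7778` and the relabeling choices below are assumptions for illustration; check the actual `prometheus-config-map.yml` for the real job definitions.

```yaml
# Hypothetical scrape job -- discovers pods in the confluent namespace and
# keeps only containers exposing the assumed JMX-exporter port 7778.
scrape_configs:
  - job_name: confluent
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: ["confluent"]
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        regex: "7778"
        action: keep
```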

Deploy Prometheus
```sh
kubectl apply -f prometheus-deployment.yml -n monitoring
```

Deploy the Prometheus service to expose its port

```sh
kubectl apply -f prometheus-service.yml -n monitoring
```

Make sure Prometheus is working by running
```sh
kubectl get pod -n monitoring
# pod names include a generated hash, so port-forward the deployment instead
kubectl port-forward deployment/prometheus-deployment -n monitoring 9090:9090
```
Then open localhost:9090 in your browser and go to `Status` -> `Targets`. Make sure all targets are `UP`.

![prometheus targets page](img/grafana-targets.png)
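
As a quick sanity check in the Prometheus UI (`Graph` tab), the built-in `up` metric reports target health without needing to know any Kafka-specific metric names:

```
# 1 = target is up, 0 = target is down
up

# count healthy targets per scrape job
sum(up) by (job)
```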

## 3. Deploy Grafana

Now that Prometheus is pulling metrics from the Kafka components, we set up Grafana to visualize them.



(Optional) To persist data, change the volume reference in `grafana-deployment.yml` from `grafana-tmp-storage-volume` to `grafana-persistent-storage` and update the PVC claim name it references. Alternatively, run the command below to persist data locally.

```sh
kubectl apply -f grafana-volume.yml -n monitoring
```


Load the Grafana datasource configuration, which tells Grafana where to read metrics from. You can also configure this manually in the UI afterwards.
```sh
kubectl apply -f grafana-datasource-config.yml -n monitoring
```
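
For orientation, a provisioned datasource in Grafana's standard provisioning format would look roughly like this. The Prometheus service DNS name is an assumption here (it presumes the service created by `prometheus-service.yml` is called `prometheus-service`); the real `grafana-datasource-config.yml` may differ.

```yaml
# Hypothetical sketch of the provisioned Prometheus datasource.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-service.monitoring.svc.cluster.local:9090
    isDefault: true
```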

Deploy Grafana
```sh
kubectl apply -f grafana-deployment.yml -n monitoring
```

Deploy Grafana service to expose its port

```sh
kubectl apply -f grafana-service.yml -n monitoring
```

Make sure Grafana is working by running
```sh
kubectl get pod -n monitoring
# pod names include a generated hash, so port-forward the deployment instead
kubectl port-forward deployment/grafana -n monitoring 3000:3000
```
Then open localhost:3000 in your browser and log in with `user=admin`, `password=admin`. Then click "Skip" to get to the dashboard.

Click `+ Import`

![grafana dashboard page](img/grafana-dashboard.png)

and paste the contents of `grafana-dashboard.json` into the text area below, then click `Load`.

![grafana dashboard import page](img/grafana-dashboard-import.png)

> **Note:** Since the volume is not persistent, the dashboard will be removed when the pod restarts.
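
One way to avoid re-importing after a restart (an assumption, not something this PR sets up) is Grafana's file-based dashboard provisioning: mount the dashboard JSON into the pod (e.g. from a ConfigMap) and point a provider at the directory.

```yaml
# Hypothetical provisioning provider -- loads any dashboard JSON found
# under /var/lib/grafana/dashboards (e.g. mounted from a ConfigMap).
apiVersion: 1
providers:
  - name: default
    type: file
    options:
      path: /var/lib/grafana/dashboards
```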

# Steps to destroy

```sh
kubectl delete -f grafana-service.yml -n monitoring
kubectl delete -f grafana-deployment.yml -n monitoring
kubectl delete -f grafana-datasource-config.yml -n monitoring

kubectl delete -f prometheus-service.yml -n monitoring
kubectl delete -f prometheus-deployment.yml -n monitoring
kubectl delete -f prometheus-config-map.yml -n monitoring
kubectl delete -f prometheus-rbac.yml

# if optionally created
kubectl delete -f grafana-volume.yml -n monitoring
kubectl delete -f prometheus-volume.yml -n monitoring

kubectl delete namespace monitoring
```
139 changes: 139 additions & 0 deletions monitoring/jmx-grafana-prometheus-step-by-step/confluent-platform.yaml
@@ -0,0 +1,139 @@
apiVersion: platform.confluent.io/v1beta1
kind: Zookeeper
metadata:
  name: zookeeper
  namespace: confluent
spec:
  replicas: 3
  image:
    application: confluentinc/cp-zookeeper:7.3.0
    init: confluentinc/confluent-init-container:2.5.0
  dataVolumeCapacity: 10Gi
  logVolumeCapacity: 10Gi
  configOverrides:
    log4j:
      - log4j.appender.stdout=org.apache.log4j.ConsoleAppender
      - log4j.appender.stdout.layout=org.apache.log4j.EnhancedPatternLayout
      - log4j.appender.stdout.layout.ConversionPattern={"debug_level":"%p","debug_timestamp":"%d{ISO8601}","debug_thread":"%t","debug_file":"%F", "debug_line":"%L","debug_message":"%m"}%n
      - log4j.rootLogger=INFO, stdout
      - log4j.logger.org.apache.zookeeper=ERROR, stdout
      - log4j.logger.org.I0Itec.zkclient=ERROR, stdout
      - log4j.logger.org.reflections=ERROR, stdout
---
apiVersion: platform.confluent.io/v1beta1
kind: Kafka
metadata:
  name: kafka
  namespace: confluent
spec:
  replicas: 3
  image:
    application: confluentinc/cp-server:7.3.0
    init: confluentinc/confluent-init-container:2.5.0
  dataVolumeCapacity: 100Gi
  metricReporter:
    enabled: true
  configOverrides:
    log4j:
      - log4j.appender.stdout=org.apache.log4j.ConsoleAppender
      - log4j.appender.stdout.layout=org.apache.log4j.EnhancedPatternLayout
      - log4j.appender.stdout.layout.ConversionPattern={"debug_level":"%p","debug_timestamp":"%d{ISO8601}","debug_thread":"%t","debug_file":"%F", "debug_line":"%L","debug_message":"%m"}%n
      - log4j.rootLogger=INFO, stdout
      - log4j.logger.org.apache.zookeeper=ERROR, stdout
      - log4j.logger.org.I0Itec.zkclient=ERROR, stdout
      - log4j.logger.org.reflections=ERROR, stdout
---
apiVersion: platform.confluent.io/v1beta1
kind: Connect
metadata:
  name: connect
  namespace: confluent
spec:
  replicas: 2
  image:
    application: confluentinc/cp-server-connect:7.3.2
    init: confluentinc/confluent-init-container:2.5.0
  dependencies:
    kafka:
      bootstrapEndpoint: kafka:9071
  build:
    type: onDemand
    onDemand:
      plugins:
        locationType: confluentHub
        confluentHub:
          - name: kafka-connect-datagen
            owner: confluentinc
            version: 0.5.2
  configOverrides:
    server:
      - schema.registry.url=http://schemaregistry.confluent.svc.cluster.local:8081
---
apiVersion: platform.confluent.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb
  namespace: confluent
spec:
  replicas: 1
  image:
    application: confluentinc/cp-ksqldb-server:7.3.2
    init: confluentinc/confluent-init-container:2.5.0
  dataVolumeCapacity: 10Gi
---
apiVersion: platform.confluent.io/v1beta1
kind: ControlCenter
metadata:
  name: controlcenter
  namespace: confluent
spec:
  replicas: 1
  image:
    application: confluentinc/cp-enterprise-control-center:7.3.2
    init: confluentinc/confluent-init-container:2.5.0
  dataVolumeCapacity: 10Gi
  configOverrides:
    log4j:
      - log4j.appender.stdout=org.apache.log4j.ConsoleAppender
      - log4j.appender.stdout.layout=org.apache.log4j.EnhancedPatternLayout
      - log4j.appender.stdout.layout.ConversionPattern={"debug_level":"%p","debug_timestamp":"%d{ISO8601}","debug_thread":"%t","debug_file":"%F", "debug_line":"%L","debug_message":"%m"}%n
      - log4j.rootLogger=INFO, stdout
      - log4j.logger.org.apache.zookeeper=ERROR, stdout
      - log4j.logger.org.I0Itec.zkclient=ERROR, stdout
      - log4j.logger.org.reflections=ERROR, stdout
  dependencies:
    schemaRegistry:
      url: http://schemaregistry.confluent.svc.cluster.local:8081
    ksqldb:
      - name: ksqldb
        url: http://ksqldb.confluent.svc.cluster.local:8088
    connect:
      - name: connect
        url: http://connect.confluent.svc.cluster.local:8083
---
apiVersion: platform.confluent.io/v1beta1
kind: SchemaRegistry
metadata:
  name: schemaregistry
  namespace: confluent
spec:
  replicas: 2
  image:
    application: confluentinc/cp-schema-registry:7.3.2
    init: confluentinc/confluent-init-container:2.5.0
---
apiVersion: platform.confluent.io/v1beta1
kind: KafkaRestProxy
metadata:
  name: kafkarestproxy
  namespace: confluent
spec:
  replicas: 1
  image:
    application: confluentinc/cp-kafka-rest:7.3.2
    init: confluentinc/confluent-init-container:2.5.0
  dependencies:
    schemaRegistry:
      url: http://schemaregistry.confluent.svc.cluster.local:8081
@@ -0,0 +1,67 @@
---
apiVersion: platform.confluent.io/v1beta1
kind: Connector
metadata:
  name: pageviews
  namespace: confluent
spec:
  class: "io.confluent.kafka.connect.datagen.DatagenConnector"
  taskMax: 2
  connectClusterRef:
    name: connect
  configs:
    kafka.topic: "pageviews"
    quickstart: "pageviews"
    key.converter: "org.apache.kafka.connect.storage.StringConverter"
    key.converter.schemas.enable: "false"
    value.converter: "io.confluent.connect.avro.AvroConverter"
    value.converter.schemas.enable: "true"
    value.converter.schema.registry.url: "http://schemaregistry.confluent.svc.cluster.local:8081"
    schema.registry.url: "http://schemaregistry.confluent.svc.cluster.local:8081"
    max.interval: "1000"
    iterations: "10000000000"

---
apiVersion: platform.confluent.io/v1beta1
kind: KafkaTopic
metadata:
  name: pageviews
  namespace: confluent
spec:
  replicas: 3
  partitionCount: 3
  configs:
    cleanup.policy: "delete"
---
apiVersion: platform.confluent.io/v1beta1
kind: Connector
metadata:
  name: users
  namespace: confluent
spec:
  class: "io.confluent.kafka.connect.datagen.DatagenConnector"
  taskMax: 2
  connectClusterRef:
    name: connect
  configs:
    kafka.topic: "users"
    quickstart: "users"
    key.converter: "org.apache.kafka.connect.storage.StringConverter"
    key.converter.schemas.enable: "false"
    value.converter: "io.confluent.connect.avro.AvroConverter"
    value.converter.schemas.enable: "true"
    value.converter.schema.registry.url: "http://schemaregistry.confluent.svc.cluster.local:8081"
    schema.registry.url: "http://schemaregistry.confluent.svc.cluster.local:8081"
    max.interval: "3000"
    iterations: "10000000"
---
apiVersion: platform.confluent.io/v1beta1
kind: KafkaTopic
metadata:
  name: users
  namespace: confluent
spec:
  replicas: 3
  partitionCount: 3
  configs:
    cleanup.policy: "delete"