A simple monitoring example solution that integrates well with simpleserver.
Install it with: make install SLACK_URL=$SLACK_URL
(which will call into the subdirectories for you)
Set or replace $SLACK_URL with a Slack webhook URL as described in this blog post.
Your team may already have a monitoring channel somewhere. If so, you don't
need to create a new webhook; just ask around. That also makes the setup
quicker.
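Before installing, it can help to check that the webhook accepts messages. The following is a minimal sketch, assuming the URL is a standard Slack incoming webhook that accepts a JSON body with a `text` field. The function names are made up for this example and are not part of this repo.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request carrying a minimal Slack webhook payload."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_message(webhook_url: str, text: str = "monitoring webhook test") -> int:
    """Send the message and return the HTTP status code (hypothetical helper)."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling `send_test_message(os.environ["SLACK_URL"])` from a Python shell (after `import os`) should post one message to the channel if the URL is valid.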
Run it with: make run
The architecture of the monitoring stack is like a house. At the foundation is Prometheus, which collects the data, stores it locally, and offers a query API. On top are different components that provide specific features. Prometheus works fine without the other components, but many of them will not work without Prometheus or a similar data storage system.
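To illustrate the query API at the foundation, here is a rough sketch of an instant query against the Prometheus HTTP API (`/api/v1/query`). The base URL `http://localhost:9090` is an assumption (the Prometheus default port), and the helper names are hypothetical; adjust them to your setup.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against /api/v1/query."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def instant_vector_values(payload: dict) -> dict:
    """Map each series' `instance` label to its sample value as a float."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample is {"metric": {...labels...}, "value": [timestamp, "value"]}.
    return {
        sample["metric"].get("instance", ""): float(sample["value"][1])
        for sample in data["result"]
    }


def query(base_url: str, promql: str, timeout: float = 10.0) -> dict:
    """Run an instant query over HTTP and return instance -> value."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return instant_vector_values(json.load(resp))
```

For example, `query("http://localhost:9090", "up")` would report which scrape targets are currently reachable, assuming Prometheus is running there.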
The components currently chosen are:
- Prometheus (in /prom): data collector, time series DB, PromQL querier
- Grafana (in /graf): data filter and visualizer
- Node Exporter (in /node): collects data from the host the exporter is running on and presents it in collectable form to Prometheus
- Alert Manager (in /alman): filters Prometheus data repeatedly and, if certain limits are reached, sends alerts to different destinations like Slack
- Thanos (in /thanos): HA layer and multi-prom-querier for Prometheus
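The "collectable form" that Node Exporter presents is the Prometheus text exposition format, usually served at `/metrics`. As a simplified sketch (not a full parser), the snippet below reads such text into a dictionary. It assumes one sample per line with no trailing timestamp; the function name and the sample data in the usage note are made up for illustration.

```python
def parse_samples(text: str) -> dict:
    """Parse exposition-format text into {series: value}.

    Simplified: skips comment lines (# HELP / # TYPE) and assumes each
    sample line ends with the value and carries no timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples
```

Feeding it a line such as `node_load1 0.42` yields an entry keyed by `node_load1`; series with labels keep the label set as part of the key.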
At the time of this writing, the connection to the simpleserver project and the Thanos integration are WIP.
The following articles were used in the creation of this repo. Better documentation is TODO, but in the meantime hopefully these can help:
- A blog series that convinced me that podman might be more interesting than docker
- Where metrics fit into Cluster Observability
- Timeseries and PromQL basics
- Explanation for the components of a PromQL Query
- Node Exporter + Prom install
- Apache Exporter for Simpleserver
- Another Apache stats related article
- CPU Usage Metrics
- Disk Usage Metrics
- Better Living Through Stats <- from an ex-Googler, he seems to know this stuff inside and out
May come in handy later:
- it might be useful to collect data from Openshift instances as well, so I collected some input on this front:
- Controlling multi service Systemd environment
  - the idea is great: have a meta service which manages the other services that belong to a shared application for you
  - starting worked fine, but stopping not so much
  - dropped for now, will have another look in a rework session
- Grafana alternative Consoles
  - Grafana is nice for experiments and very beautiful
  - capabilities are limited though, for instance when trying to automate
  - consoles are more flexible, and their documentation already advertises that they can be hosted in SCM systems