A monitoring solution developed by the NVVS (Network Voice Video Service) DevOps team to monitor the applications the team currently manages:
- MoJO DNS Service
- MoJO DHCP Service
- DNS and DHCP Administration Portal
- SMTP Relay
- Network Access Control Service (NACS)
- Public Key Infrastructure (PKI)
- Monitoring Solution itself (EKS Cluster)
This is a high-level list of the metrics monitored; if a metric is not mentioned here, that does not necessarily mean it is not monitored.
- MoJO DNS:
- Uptime
- Bandwidth
- MoJO DHCP:
- Uptime
- Subnet usage
- Bandwidth
- Runtime errors
- DNS / DHCP Admin Portal
- SMTP Relay:
- Message count
- Deferred messages count
- Network Access Control Service:
- Uptime
- Resource usage
- Errors
- Authentication success / failures
- Monitoring infrastructure (EKS Cluster):
- Uptime
- Resource usage
- Bandwidth / Network
Alerts are sent to various Slack channels and to PagerDuty.
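The alert routing itself is not documented on this page. As a minimal sketch, assuming the standard Prometheus Alertmanager handles routing, the Slack and PagerDuty integrations would be configured along these lines; the channel name, webhook URL and service key below are placeholders, not values from this solution:

```yaml
# Hypothetical Alertmanager routing sketch; all names and keys are placeholders.
route:
  receiver: slack-default
  routes:
    - matchers:
        - severity = "critical"
      receiver: pagerduty            # page on critical alerts
receivers:
  - name: slack-default
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE_ME
        channel: "#example-alerts-channel"
  - name: pagerduty
    pagerduty_configs:
      - service_key: REPLACE_ME      # PagerDuty integration key
```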
This solution consists of Prometheus, Thanos, Grafana and various exporters. Exporters enable Prometheus to scrape metrics from different sources, and Grafana produces dashboards with those metrics. Thanos leverages the Prometheus storage format to cost-efficiently store historical metric data in an S3 bucket while retaining fast query latencies, and it also provides a global query view across all Prometheus installations. This means Prometheus instances running elsewhere can remote write metrics to this system; Grafana can then visualise them, and the metrics are kept in the central storage.
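As an illustration of the exporter model, a Prometheus scrape job looks like the sketch below. The job name and target address are hypothetical and are not taken from this solution's actual configuration:

```yaml
# Hypothetical scrape job: Prometheus pulls metrics from the exporter's
# HTTP /metrics endpoint at each configured target.
scrape_configs:
  - job_name: node-exporter                      # example exporter
    scrape_interval: 30s
    static_configs:
      - targets: ["node-exporter.example.internal:9100"]
```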
Helm charts are used to deploy the components of this solution (Prometheus, Thanos, Grafana and the exporters).
To access the dashboards and query metrics, use Grafana at the address below.
| 📊 Grafana |
| --- |
| https://monitoring-alerting.staff.service.justice.gov.uk |
Logon access to Grafana is managed in Production Azure AD. Please contact the Azure team to gain access.
For this solution to consume metrics from other Prometheus instances via the remote write functionality, configure your Prometheus to remote write to the URL below:
| ✍️ Prometheus Remote Write |
| --- |
| https://thanos-receive.monitoring-alerting.staff.service.justice.gov.uk/api/v1/receive |
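For example, a minimal `remote_write` block in a Prometheus configuration pointing at this endpoint would look like the sketch below; any authentication or TLS settings your instance requires are omitted here, as they depend on your setup:

```yaml
# Minimal remote write sketch targeting the Thanos Receive endpoint above.
remote_write:
  - url: https://thanos-receive.monitoring-alerting.staff.service.justice.gov.uk/api/v1/receive
```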
For technical details, HLDs, LLDs and developer instructions, please visit the technical documentation page.
To test changes to our monitoring solution, we have a Development environment set up. To get that environment up and running locally, work through the following steps:
- Check which environment the Kube context is pointing to; this is likely to be Production.

  ```bash
  kubectl config get-contexts
  ```
- To add the Development context, you will have to generate the kube config. To do this, run the following commands:

  ```bash
  make clean
  make gen-env
  make get-kubeconfig
  ```
- Re-check that the Kube context now points to the Development environment:

  ```bash
  kubectl config get-contexts
  ```
- Check the connection to the cluster by running:

  ```bash
  kubectl get pods -A
  ```
- The Grafana dashboard is not available over the internet. Access it locally using port forwarding:

  ```bash
  kubectl port-forward svc/grafana 3000:80 -n grafana
  ```
- Access Grafana in a browser at localhost:3000.

- The Grafana dashboard requires a username and password. The username is 'admin'. To get the password, run:

  ```bash
  make grafana-pwd
  ```
- To view Grafana release versions:

  ```bash
  helm list -n grafana
  ```
- To view the status of the deployment:

  ```bash
  kubectl get pods -n grafana
  ```