This repo has an automated release process where each feature/bug fix is released immediately after it is merged to `main`:
- Update QA with the work to be done to ensure they're informed and can guide development.
- Create a ticket for the feature/bug fix.
- Submit a PR, and make sure that the PR title is clear, readable, and follows the strict commit message format described in the commit message format section below. If the PR title does not comply, the automatic release will fail.
- Have the PR reviewed.
- Squash and merge the PR to `main`. The commit message should be the already-formatted PR title, but double-check that it is clear, readable, and follows the strict commit message format so that the automatic release works as expected.
- Close the ticket.
The commit format should follow the convention outlined in the CHT docs. Examples are provided below.
| Type | Example commit message | Release type |
|---|---|---|
| Bug fixes | fix(#123): infinite spinner when clicking contacts tab twice | patch |
| Performance | perf(#789): lazily loaded angular modules | patch |
| Features | feat(#456): add home tab | minor |
| Non-code | chore(#123): update README | none |
| Breaking | perf(#2): remove reporting rates feature<br>BREAKING CHANGE: reporting rates no longer supported | major |
When testing updates to this repository, it is useful to be able to iterate quickly. The default configuration values for the metric collection time intervals are intended for production usage and are not suitable for testing/development. You can use the `apply-dev-patches` NPM script to apply a set of patches that lower the interval values to make testing easier.

`npm run apply-dev-patches`

Just remember that the modified configuration files are tracked by git, so be sure to revert the patches before committing any changes.

`npm run revert-dev-patches`
Fundamentally, Prometheus stores metrics as a series of time-stamped numerical values associated with a set of labels. String values or other non-numeric values cannot be tracked in Prometheus. See the Prometheus data model for more details.
For numerical values, Prometheus supports four different metric types.
When adding new metrics, follow the best practices for metric names and labels.
Tips:
- When collecting the same kind of metric for multiple entities, use a single metric name and differentiate the various values using labels.
  - For example, the `cht_couchdb_doc_total` metric tracks the total number of documents in a CouchDB database. Values for that metric are recorded with a `db` label that specifies which database (e.g. `medic` or `sentinel`) is associated with the value.
- Use a consistent prefix for metric names. This makes it easier to find related metrics.
  - For example, all CHT metrics start with `cht_`.
- Prefer collecting "total" values over collecting "time-bound" values. For example, when recording the number of outbound messages, the value stored should just be the total number of messages sent up to that time. Do not store values like "the number of messages sent in the last 7 days". If Prometheus records the changes to the total number of messages over time, then queries can be used to calculate the number of messages sent in the last 7 days (or any other time range), as illustrated in the sketch after this list.
- In general, configuration for a Prometheus metric should not be modified after it is created. Any changes to an existing metric could affect data that has already been collected for that metric. If a change is needed, create a new metric with the new configuration and deprecate the old metric. This allows for a clean transition from the old metric to the new metric.
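To illustrate the "total" values tip above, here is a hedged sketch of a Prometheus recording rule that derives a 7-day figure from a running total at query time. The metric and rule names are hypothetical and do not exist in this repository:

```yaml
# Hypothetical recording rule: derive "messages sent in the last 7 days"
# from a counter that only tracks the running total.
groups:
  - name: example-derived-values
    rules:
      - record: cht_outgoing_messages_last_7d
        expr: increase(cht_outgoing_messages_total[7d])
```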
Metrics loaded from the CHT `/monitoring` endpoint are configured via the json-exporter's `config/cht.yml` file. The `modules.default.metrics` section contains the configuration for mapping the JSON response from the CHT `/monitoring` endpoint to Prometheus metrics. New values added to this JSON can be included in Prometheus by adding additional mappings here. See the json_exporter project on GitHub for more information.
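For orientation, a minimal, hedged sketch of what such a mapping can look like (the metric name and JSONPath below are hypothetical; use the existing entries in `config/cht.yml` as the authoritative reference):

```yaml
modules:
  default:
    metrics:
      # Hypothetical mapping: expose a numeric field from the /monitoring JSON
      # response as a Prometheus metric. Adjust the JSONPath to match the
      # actual structure of the response.
      - name: cht_example_value
        help: Example value scraped from the CHT /monitoring endpoint
        path: '{ .example.value }'
```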
Metrics loaded from a Couch2pg Postgres DB are configured via the sql_exporter's `couch2pg_collector.yml` file. New queries can be added by copying `couch2pg_collector.yml` into a new file following the `*_collector.yml` nomenclature. For example, `reports-by-type_collector.yml` could be created with a new query to gather all CHW-submitted reports. See the sql_exporter project on GitHub for more information.
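As a rough, hedged sketch of the shape of a collector file (the collector, metric, table, and column names here are illustrative only; the real `couch2pg_collector.yml` is the proper starting point):

```yaml
# Hypothetical sql_exporter collector that counts reports by form from a
# Couch2pg-style table. Table and column names are illustrative only.
collector_name: reports-by-type
metrics:
  - metric_name: cht_reports_by_form_total
    type: gauge
    help: Number of report docs, broken down by form
    key_labels: [form]
    values: [count]
    query: |
      SELECT doc->>'form' AS form, COUNT(*) AS count
      FROM couchdb
      WHERE doc->>'type' = 'data_record'
      GROUP BY form
```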
Configuration for new Prometheus endpoints to scrape can be added in a new directory inside `exporters`. In the new directory, create a `config` directory for any static config files that the consumer should not edit/reference. In this `config` directory, create a `scrape_config.yml` file with the Prometheus scrape configuration for the endpoint. Then, add a new docker compose yml file to your exporter directory. In this file, update the `prometheus` service to include a new `volumes` entry that maps your `scrape_config.yml` file to the Prometheus container in the `/etc/prometheus/scrape_configs` directory.
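For orientation only, a hedged sketch of the two files involved (the directory, job, and target names are hypothetical; the existing exporter directories show the real conventions):

```yaml
# exporters/my-exporter/config/scrape_config.yml (hypothetical)
scrape_configs:
  - job_name: my-exporter
    static_configs:
      - targets: ['my-exporter:9100']
```

```yaml
# exporters/my-exporter/compose.yml (hypothetical): map the scrape config
# into the Prometheus container's scrape_configs directory.
services:
  prometheus:
    volumes:
      - ./exporters/my-exporter/config/scrape_config.yml:/etc/prometheus/scrape_configs/my-exporter.yml
```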
If you are scraping a resource that natively supports returning Prometheus metrics, this should be all the configuration you need. If your resource does not provide Prometheus metrics itself, you will need to update the docker compose configuration to include a new exporter container that can convert the resource's metrics into Prometheus metrics.
See the existing exporters for examples.
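If an exporter container is needed, it can be declared in the same compose override file. A hedged sketch (the service name and image are placeholders, not a real exporter):

```yaml
# Hypothetical exporter service that translates a resource's native metrics
# into the Prometheus format. Prometheus reaches it over the compose network
# at the host/port named in scrape_config.yml.
services:
  my-exporter:
    image: example/my-exporter:latest   # placeholder image
    restart: unless-stopped
```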
To start the cht-watchdog services with your new exporter configuration, simply use `-f` to include your new docker compose file when starting the services.
Consumers of cht-watchdog cannot edit the provisioned Grafana configuration (for dashboards and alerts) directly. This means that we can continue to evolve the configuration without worrying about breaking existing deployments.
The configuration for provisioned dashboards is stored in `grafana/provisioning/dashboards/CHT`. Each dashboard is defined in a separate JSON file. The JSON files are generated from the Grafana UI. See the Grafana documentation for more information.
A new dashboard can be added by simply creating a new JSON file in the `CHT` directory. (The JSON file can be generated from the Grafana UI and then copied into the `CHT` directory.)
Modifications to the existing dashboards can be made directly to the JSON files (if the change is simple) or by using the Grafana UI to make the change and then exporting the updated JSON file.
The configuration for provisioned alert rules is stored in the `grafana/provisioning/alerting/cht.yml` file. See the Grafana documentation for more information.
Minor modifications to alert rules can be done directly in the yml file, but any significant additions or modifications to the alert rules should be done in the Grafana UI and then exported via the Alerting provisioning API.
To make significant modifications to an existing alert:
- View the alert in the Grafana Alert Rules UI, and select the "Copy" button. This will prompt you to create a new rule that "will NOT be marked as provisioned". This is what you want to do.
- Make your desired changes to your copied rule and save the rule into a new Evaluation group (the details of the group can be anything).
- View your new rule in the Grafana Alert Rules UI and note the `Rule UID` value.
- In the Grafana Alert Rules UI, use the "Export" button to download a `yml` file containing the updated configuration.
- Find the `rules` entry with your `uid` value and diff it with the existing configuration for your rule in the `grafana/provisioning/alerting/cht.yml` file (see the sketch after this list). Include all the desired changes in the `cht.yml` file, but do not change things like the `uid`.
- Delete the copied rule from the Grafana UI.
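For reference, a heavily abridged, hedged sketch of how `grafana/provisioning/alerting/cht.yml` is structured, showing where the `rules` entries and their `uid` values sit (the group name, rule title, and uid are placeholders; the real file contains the full rule definitions):

```yaml
apiVersion: 1
groups:
  - orgId: 1
    name: CHT                    # evaluation group name (placeholder)
    folder: CHT
    interval: 1m
    rules:
      - uid: abc123ef            # do not change existing uids
        title: Example alert     # placeholder title
        condition: C
        # data, labels, annotations, and notification settings omitted
```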
Testing Grafana behavior is tricky since it requires data to test. Typically, a Grafana panel/alert will be based on either the latest value from a stream of data or a sequence of historical data values.
The `fake-cht` server can be used to simulate the `/monitoring` endpoint of a CHT instance. It will also populate Couch2pg-style data in a Postgres instance. The data it returns is random (within certain limits).
Copy the example config files:
`cp development/fake-cht/example-config/cht-instances.yml cht-instances.yml`

`cp development/fake-cht/example-config/sql_servers.yml ./exporters/postgres/.`
You will also need to run a few additional commands from the normal setup process to prepare your new instance:
`cp grafana/grafana.example.ini grafana/grafana.ini`

`mkdir -p grafana/data && mkdir -p prometheus/data`
From the root directory, run:
`docker compose -f docker-compose.yml -f exporters/postgres/compose.yml -f development/fake-cht/docker-compose.fake-cht.yml up -d`
The Postgres data will be persisted in a Docker volume. To clear the data when you are finished testing (to allow for a fresh environment on the next run), run your `docker compose down` command with the `-v` flag to delete the volume.

`docker compose -f docker-compose.yml -f exporters/postgres/compose.yml -f development/fake-cht/docker-compose.fake-cht.yml down -v`
The following is a manual process that involves creating a test data-set and injecting it into a fresh deployment of Prometheus.
Each test is associated with an `xlsx` file in this directory that contains the test data (and a description of the test).
Start a fresh deployment of cht-watchdog without providing any CHT URL and with the test override:
`docker compose -f docker-compose.yml -f exporters/postgres/compose.yml -f development/fake-cht/docker-compose.fake-cht.yml up -d`
Open the `xlsx` file of the test you want to run. Switch to the `data` sheet and Save As a `csv` file (named `data.csv`) in the `development` directory.
In your terminal, navigate to the `development` directory and run `cat data.csv | ./generate_test_data.js > ../prometheus/data/data.txt`. This converts the `csv` file to the OpenMetrics format that Prometheus expects and injects it into the Prometheus data volume. (This is a good point to stop and double-check the `data.txt` file to make sure it looks correct.)
Run `docker compose exec prometheus promtool tsdb create-blocks-from openmetrics /prometheus/data.txt /prometheus && docker compose restart prometheus` to push the data into Prometheus.
Now you can open Grafana and verify that the panel is displaying the expected data or the expected alert has fired.
Remember that you have to completely destroy the Prometheus data volume before running another test that uses the same metric.
To test email alerts, we can use a `maildev` server to accept incoming SMTP requests from Grafana.
Update the `grafana.ini` file and add the following to the `smtp` section:
`enabled = true`

`host = maildev:1025`
Start the `maildev` server along with the rest of the monitoring stack by running `docker compose -f docker-compose.yml -f development/docker-compose.smtp.yml up -d`.
You can view the MailDev UI at http://localhost:1080.