Merge pull request #171 from esl/mp/metrics-doc-update
This PR is primarily a metrics documentation update. It also contains the following changes, which may be extracted to a separate PR based on review comments.

* Start the MongoosePush.Metrics.TelemetryMetrics child before the others to capture events at startup
* Short metrics documentation in the code
* Add example Prometheus config for MongoosePush and docker-compose file to help with a basic setup.
NelsonVides authored Jun 17, 2020
2 parents bbe76bd + 7e7e451 commit e4caac2
Showing 6 changed files with 110 additions and 25 deletions.
88 changes: 70 additions & 18 deletions README.md
@@ -253,7 +253,7 @@ Development release is by default configured to connect to local APNS / FCM mock
in `config/dev.exs` file.
For now, let's just start those mocks so that we can use default dev configuration:
```bash
docker-compose -f test/docker/docker-compose.unit.yml up -d
docker-compose -f test/docker/docker-compose.mocks.yml up -d
```

After this step you may try to run the service via:
@@ -491,26 +491,78 @@ If you specify both **alert** and **data**, target device will receive both noti
* **500** `{"reason" : reason}` - the server internal error occurred,
specified by **reason**.

### I use MongoosePush docker, where do I find `sys.config`?
### Metrics

If you use dockerized MongoosePush, you need to do the following:
* Start MongoosePush docker, let's assume its name is `mongoose_push`
* Run: `docker cp mongoose_push:/opt/app/var/sys.config sys.config` on you docker host (this will get the current `sys.config` to your `${CWD}`)
* Modify the `sys.config` as you see fit (for metrics, see above)
* Stop MongoosePush docker container and restart it with the modified `sys.config` as volume in `/opt/app/sys.config` (yes, this is not the path we used to copy this file from, this is an override)
MongoosePush 2.1 provides metrics in the Prometheus format on the `/metrics` endpoint.
This is a breaking change compared to previous releases.
Existing dashboards will need to be updated.

It is important to know that metrics are created inside MongoosePush only when a certain event happens.
This may mean that a freshly started MongoosePush node will not have all the possible metrics available yet.

### Available metrics
#### Available metrics

The following metrics are available:
##### Histograms

For more details about the histogram metric type please go to https://prometheus.io/docs/concepts/metric_types/#histogram

###### Notification sent time

`mongoose_push_notification_send_time_microsecond_bucket{error_category=${CATEGORY},error_reason=${REASON},service=${SERVICE},status=${STATUS},le=${LE}}`
`mongoose_push_notification_send_time_microsecond_sum{error_category=${CATEGORY},error_reason=${REASON},service=${SERVICE},status=${STATUS}}`
`mongoose_push_notification_send_time_microsecond_count{error_category=${CATEGORY},error_reason=${REASON},service=${SERVICE},status=${STATUS}}`

* `mongoose_push_apns_state_get_default_topic_count`
* `mongoose_push_notification_send_time_bucket{error_category=${CATEGORY},error_reason=${REASON},service=${SERVICE},status=${STATUS},le=${LENGTH}}`
* `mongoose_push_notification_send_time_sum{error_category=${CATEGORY},error_reason=${REASON},service=${SERVICE},status=${STATUS}}`
* `mongoose_push_notification_send_time_count{error_category=${CATEGORY},error_reason=${REASON},service=${SERVICE},status="${STATUS}}`
Where:
* **CATEGORY** is an arbitrary error category term or empty string
* **REASON** is an arbitrary error reason term or empty string
* **SERVICE** is either `fcm` or `apns`
* **STATUS** is either `success` or `error`
* **LENGTH** is either `100` or `250` or `500` or `1000` or `+Inf`
* `STATUS` is `"success"` for the successful notifications or `"error"` in all other cases
* `SERVICE` is either `"apns"` or `"fcm"`
* `CATEGORY` is an arbitrary error category term (in case of `status="error"`) or an empty string (when `status="success"`)
* `REASON` is an arbitrary error reason term (in case of `status="error"`) or an empty string (when `status="success"`)
* `LE` defines the `upper inclusive bound` (`less than or equal`) values for buckets, currently `1000`, `10_000`, `25_000`, `50_000`, `100_000`, `250_000`, `500_000`, `1000_000` or `+Inf`

> **NOTE**
>
> A bucket of value 250_000 will keep the count of measurements that are less than or equal to 250_000.
> A measurement of value 51_836 will be added to all the buckets where the upper bound is greater than 51_836.
> In this case these are buckets `100_000`, `250_000`, `500_000`, `1000_000` and `+Inf`.

This histogram metric shows the distribution of times needed to:
1. Select a worker (this may include waiting time when all workers are busy).
2. Send a request.
3. Get a response from push notifications provider.
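
As a rough illustration of the cumulative bucket rule from the note above, the following shell sketch lists the buckets a single measurement would be counted in. The `buckets_for` function name is made up for this example and is not part of MongoosePush; the bounds are copied from the `LE` list above.

```bash
# Upper bounds in microseconds, copied from the LE list above.
# The +Inf bucket is handled separately below.
BOUNDS="1000 10000 25000 50000 100000 250000 500000 1000000"

# Hypothetical helper: print every bucket that one measurement is counted in.
# A measurement belongs to a bucket when measurement <= upper bound.
buckets_for() {
  value=$1
  for bound in $BOUNDS; do
    if [ "$value" -le "$bound" ]; then
      echo "$bound"
    fi
  done
  echo "+Inf" # the +Inf bucket counts every measurement
}

# Prints 100000, 250000, 500000, 1000000 and +Inf, one per line.
buckets_for 51836
```

Because the bounds are inclusive, `buckets_for 250000` would start at the `250000` bucket rather than at `500000`.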

##### Counters

* `mongoose_push_supervisor_init_count{service=${SERVICE}}` - Counts the number of push notification service supervisor starts.
The `SERVICE` variable can take `"apns"` or `"fcm"` as a value.
This metric is updated when MongoosePush starts, and later on when the underlying supervision tree is terminated and the error is propagated to the main application supervisor.
* `mongoose_push_apns_state_init_count` - Counts the number of APNS state initialisations.
* `mongoose_push_apns_state_terminate_count` - Counts the number of APNS state terminations.
* `mongoose_push_apns_state_get_default_topic_count` - Counts the number of default topic reads from cache.

#### How to quickly see all metrics

```bash
curl -k https://127.0.0.1:8443/metrics
```

The above command assumes that MongoosePush runs on `localhost` and listens on port `8443`.
Please mind the `HTTPS` protocol: metrics are hosted on the same port as all the other API endpoints.
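
To narrow the output down, the text can be piped through standard tools. The sketch below defines a hypothetical helper, `sent_total` (not part of MongoosePush), which sums the `_count` samples of the send time histogram for one `status` value. It assumes that each sample line ends with its value, without a trailing timestamp.

```bash
# Hypothetical helper: read Prometheus text format on stdin and sum the
# send time histogram's *_count samples whose status label equals $1.
sent_total() {
  awk -v status="$1" '
    index($0, "mongoose_push_notification_send_time_microsecond_count{") == 1 &&
    index($0, "status=\"" status "\"") > 0 { total += $NF }
    END { print total + 0 }
  '
}
```

For example, `curl -sk https://127.0.0.1:8443/metrics | sent_total error` would print the number of failed notifications recorded so far, summed over both services and all error reasons.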

#### Prometheus configuration

When configuring Prometheus, it's important to:
* set the `scheme` to `https`, since MongoosePush serves the `/metrics` path over an encrypted (HTTPS) endpoint
* set `insecure_skip_verify` to `true` if the default self-signed certificates are used

```yaml
scrape_configs:
  - job_name: 'mongoose-push'
    scheme: 'https' #MongoosePush exposes encrypted endpoint - HTTPS
    tls_config: #The default certs used by MongoosePush are self-signed
      insecure_skip_verify: true #For checking purposes we can ignore certs verification
    static_configs:
      - targets: ['mongoose-push:8443']
        labels:
          group: 'production'

```
4 changes: 3 additions & 1 deletion lib/mongoose_push/application.ex
@@ -33,8 +33,10 @@ defmodule MongoosePush.Application do
     _ = check_runtime_configuration_status()

     # Define workers and child supervisors to be supervised
+    # The MongoosePush.Metrics.TelemetryMetrics child is started first to capture possible events
+    # when services start
     children =
-      service_children() ++ [MongoosePushWeb.Endpoint, MongoosePush.Metrics.TelemetryMetrics]
+      [MongoosePush.Metrics.TelemetryMetrics] ++ service_children() ++ [MongoosePushWeb.Endpoint]

     # See http://elixir-lang.org/docs/stable/elixir/Supervisor.html
     # for other strategies and supported options
20 changes: 15 additions & 5 deletions lib/mongoose_push/metrics/telemetry_metrics.ex
@@ -15,17 +15,27 @@ defmodule MongoosePush.Metrics.TelemetryMetrics do
         event_name: [:mongoose_push, :notification, :send],
         measurement: :time,
         buckets: [1000, 10_000, 25_000, 50_000, 100_000, 250_000, 500_000, 1000_000],
-        tags: [:status, :service, :error_category, :error_reason]
+        tags: [:status, :service, :error_category, :error_reason],
+        description:
+          "A histogram showing push notification send times. Includes worker selection (with possible waiting if all are busy)"
       ),

       # measurement is ignored in Counter metric
-      Telemetry.Metrics.counter("mongoose_push.supervisor.init.count", tags: [:service]),
-      Telemetry.Metrics.counter("mongoose_push.apns.state.init.count"),
+      Telemetry.Metrics.counter("mongoose_push.supervisor.init.count",
+        tags: [:service],
+        description: "Counts the number of push notification service supervisor starts"
+      ),
+      Telemetry.Metrics.counter("mongoose_push.apns.state.init.count",
+        description: "Counts the number of APNS state initialisations"
+      ),
       Telemetry.Metrics.counter("mongoose_push.apns.state.terminate.count",
         tags: [:error_reason],
-        tag_values: fn metadata -> %{metadata | error_reason: metadata.reason} end
+        tag_values: fn metadata -> %{metadata | error_reason: metadata.reason} end,
+        description: "Counts the number of APNS state terminations"
       ),
-      Telemetry.Metrics.counter("mongoose_push.apns.state.get_default_topic.count")
+      Telemetry.Metrics.counter("mongoose_push.apns.state.get_default_topic.count",
+        description: "Counts the number of APNS default topic reads from the ETS cache"
+      )
     ]
   end
 end
2 changes: 1 addition & 1 deletion test/docker/docker-compose.mpush.yml
@@ -1,5 +1,5 @@
 # This file needs to be used along with `docker-compose.mocks.yml`:
-# docker-compose -f test/docker/docker-compose.mocks.yml -f test/docker/docker-compose.mpush.yml ...
+# PRIV=priv docker-compose -f test/docker/docker-compose.mocks.yml -f test/docker/docker-compose.mpush.yml ...
 version: '3'

 services:
12 changes: 12 additions & 0 deletions test/docker/docker-compose.prometheus.yml
@@ -0,0 +1,12 @@
# This file needs to be used along with `docker-compose.mocks.yml` and `docker-compose.mpush.yml`:
# PRIV=priv docker-compose -f test/docker/docker-compose.mocks.yml -f test/docker/docker-compose.mpush.yml -f test/docker/docker-compose.prometheus.yml ...
version: '3'

services:
  prometheus:
    image: prom/prometheus
    container_name: mongoose-push-prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
9 changes: 9 additions & 0 deletions test/docker/prometheus.yml
@@ -0,0 +1,9 @@
scrape_configs:
  - job_name: 'mongoose-push'
    scheme: 'https' #MongoosePush exposes encrypted endpoint - HTTPS
    tls_config: #The default certs used by MongoosePush are self-signed
      insecure_skip_verify: true #For checking purposes we can ignore certs verification
    static_configs:
      - targets: ['mongoose-push:8443']
        labels:
          group: 'production'
