Metrics documentation update #171

Merged 8 commits on Jun 17, 2020
88 changes: 70 additions & 18 deletions README.md
@@ -253,7 +253,7 @@ Development release is by default configured to connect to local APNS / FCM mock
in `config/dev.exs` file.
For now, let's just start those mocks so that we can use default dev configuration:
```bash
docker-compose -f test/docker/docker-compose.unit.yml up -d
docker-compose -f test/docker/docker-compose.mocks.yml up -d
```

After this step you may try to run the service via:
@@ -491,26 +491,78 @@ If you specify both **alert** and **data**, target device will receive both noti
* **500** `{"reason" : reason}` - an internal server error occurred,
specified by **reason**.

### I use MongoosePush docker, where do I find `sys.config`?
### Metrics

If you use dockerized MongoosePush, you need to do the following:
* Start MongoosePush docker, let's assume its name is `mongoose_push`
* Run: `docker cp mongoose_push:/opt/app/var/sys.config sys.config` on your docker host (this will copy the current `sys.config` to your `${CWD}`)
* Modify the `sys.config` as you see fit (for metrics, see above)
* Stop MongoosePush docker container and restart it with the modified `sys.config` as volume in `/opt/app/sys.config` (yes, this is not the path we used to copy this file from, this is an override)
MongoosePush 2.1 provides metrics in the Prometheus format on the `/metrics` endpoint.
This is a breaking change compared to previous releases.
Existing dashboards will need to be updated.

It is important to know that metrics are created inside MongoosePush only when a certain event happens.
This may mean that a freshly started MongoosePush node will not have all the possible metrics available yet.

### Available metrics
#### Available metrics

The following metrics are available:
##### Histograms

For more details about the histogram metric type please go to https://prometheus.io/docs/concepts/metric_types/#histogram

###### Notification send time

`mongoose_push_notification_send_time_microsecond_bucket{error_category=${CATEGORY},error_reason=${REASON},service=${SERVICE},status=${STATUS},le=${LE}}`
`mongoose_push_notification_send_time_microsecond_sum{error_category=${CATEGORY},error_reason=${REASON},service=${SERVICE},status=${STATUS}}`
`mongoose_push_notification_send_time_microsecond_count{error_category=${CATEGORY},error_reason=${REASON},service=${SERVICE},status=${STATUS}}`

* `mongoose_push_apns_state_get_default_topic_count`
* `mongoose_push_notification_send_time_bucket{error_category=${CATEGORY},error_reason=${REASON},service=${SERVICE},status=${STATUS},le=${LENGTH}}`
* `mongoose_push_notification_send_time_sum{error_category=${CATEGORY},error_reason=${REASON},service=${SERVICE},status=${STATUS}}`
* `mongoose_push_notification_send_time_count{error_category=${CATEGORY},error_reason=${REASON},service=${SERVICE},status=${STATUS}}`
Where:
* **CATEGORY** is an arbitrary error category term or empty string
* **REASON** is an arbitrary error reason term or empty string
* **SERVICE** is either `fcm` or `apns`
* **STATUS** is either `success` or `error`
* **LENGTH** is either `100` or `250` or `500` or `1000` or `+Inf`
* `STATUS` is `"success"` for the successful notifications or `"error"` in all other cases
* `SERVICE` is either `"apns"` or `"fcm"`
* `CATEGORY` is an arbitrary error category term (in case of `status="error"`) or an empty string (when `status="success"`)
* `REASON` is an arbitrary error reason term (in case of `status="error"`) or an empty string (when `status="success"`)
* `LE` is the inclusive upper bound (less than or equal) of a bucket in microseconds, currently one of `1000`, `10_000`, `25_000`, `50_000`, `100_000`, `250_000`, `500_000`, `1000_000` or `+Inf`

> **NOTE**
>
> Buckets are cumulative: the `250_000` bucket counts every measurement that is less than or equal to 250_000.
> A measurement of 51_836 is therefore added to every bucket whose upper bound is greater than or equal to 51_836.
> In this case these are the buckets `100_000`, `250_000`, `500_000`, `1000_000` and `+Inf`.
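
The cumulative behaviour described in the note can be sketched in a few lines of shell. This is only an illustration of the rule, not code from MongoosePush; the function name `buckets_for` is hypothetical, and the bounds are copied from the `LE` list above.

```bash
# Upper bounds in microseconds, copied from the LE values listed above.
BOUNDS="1000 10000 25000 50000 100000 250000 500000 1000000"

# Print every bucket that a single measurement is counted in.
# A measurement is counted in each bucket whose upper bound is greater than
# or equal to it, and always in the +Inf bucket.
buckets_for() {
  measurement=$1
  for bound in $BOUNDS; do
    if [ "$measurement" -le "$bound" ]; then
      printf '%s\n' "$bound"
    fi
  done
  printf '%s\n' "+Inf"
}

# The measurement used in the note above.
buckets_for 51836
```

Running it for `51836` lists `100000`, `250000`, `500000`, `1000000` and `+Inf`, matching the note.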

This histogram metric shows the distribution of times needed to:
1. Select a worker (this may include waiting time when all workers are busy).
2. Send a request.
3. Get a response from the push notification provider.
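
As with any Prometheus histogram, dividing the `_sum` series by the `_count` series gives the average of the observed values, here the average send time in microseconds. The sketch below shows only that arithmetic; the helper name `average_send_time` is hypothetical and the input numbers are made up for illustration, not real measurements.

```bash
# Average send time in microseconds: the histogram's _sum divided by its _count.
# Prints nothing and returns 1 when no notifications have been recorded yet.
average_send_time() {
  sum=$1
  count=$2
  if [ "$count" -eq 0 ]; then
    return 1
  fi
  awk -v s="$sum" -v c="$count" 'BEGIN { printf "%.1f\n", s / c }'
}

# Illustrative values only: 3 notifications taking 150000 microseconds in total.
average_send_time 150000 3
```

The zero check matters because, as noted earlier, a freshly started node may not have recorded any notifications yet.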

##### Counters

* `mongoose_push_supervisor_init_count{service=${SERVICE}}` - Counts the number of push notification service supervisor starts.
The `SERVICE` variable can take `"apns"` or `"fcm"` as a value.
This metric is updated when MongoosePush starts, and later on when the underlying supervision tree is terminated and the error is propagated to the main application supervisor.
* `mongoose_push_apns_state_init_count` - Counts the number of APNS state initialisations.
* `mongoose_push_apns_state_terminate_count` - Counts the number of APNS state terminations.
* `mongoose_push_apns_state_get_default_topic_count` - Counts the number of default topic reads from cache.

#### How to quickly see all metrics

```bash
curl -k https://127.0.0.1:8443/metrics
```

The above command assumes that MongoosePush runs on `localhost` and listens on port `8443`.
Note the `HTTPS` protocol: metrics are served on the same port as all the other API endpoints.
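
Since metrics only appear after their first event, it can be handy to list just the MongoosePush series that currently exist. A minimal sketch, assuming the standard Prometheus text format in which `# HELP` and `# TYPE` lines start with `#`; the function name `mongoose_push_metrics` is hypothetical.

```bash
# Keep only MongoosePush sample lines from a Prometheus text-format scrape,
# dropping "# HELP" / "# TYPE" comment lines and metrics from other sources.
mongoose_push_metrics() {
  grep '^mongoose_push_'
}

# Against a running node (same host and port assumptions as the curl command above):
#   curl -sk https://127.0.0.1:8443/metrics | mongoose_push_metrics
```

Note that `grep` exits with a non-zero status when nothing matches, which is the expected result on a node that has not emitted any MongoosePush metric yet.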

#### Prometheus configuration

When configuring Prometheus, it's important to:
* set the `scheme` to `https`, since MongoosePush serves the `/metrics` path over an encrypted (HTTPS) endpoint
* set the `insecure_skip_verify` to `true` if the default self-signed certificates are used

```yaml
scrape_configs:
  - job_name: 'mongoose-push'
    scheme: 'https' #MongoosePush exposes encrypted endpoint - HTTPS
    tls_config: #The default certs used by MongoosePush are self-signed
      insecure_skip_verify: true #For checking purposes we can ignore certs verification
    static_configs:
      - targets: ['mongoose-push:8443']
        labels:
          group: 'production'

```
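
Before pointing Prometheus at a target, it may help to confirm that the endpoint answers over HTTPS from the machine that will scrape it. The following is a rough sketch, not part of MongoosePush: the helper names `metrics_url` and `metrics_http_status` are hypothetical, and the host and port in the usage comment are taken from the `targets` entry in the configuration.

```bash
# Build the URL Prometheus would scrape for a given target host and port.
metrics_url() {
  printf 'https://%s:%s/metrics\n' "$1" "$2"
}

# Print the HTTP status code returned by the metrics endpoint.
# -k skips certificate verification, mirroring insecure_skip_verify: true.
metrics_http_status() {
  curl -sk -o /dev/null -w '%{http_code}\n' "$(metrics_url "$1" "$2")"
}

# Example, using the target from the scrape configuration:
#   metrics_http_status mongoose-push 8443
```

Dropping `-k` from the `curl` call is a quick way to check whether the certificate in use would pass verification without `insecure_skip_verify`.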
4 changes: 3 additions & 1 deletion lib/mongoose_push/application.ex
@@ -33,8 +33,10 @@ defmodule MongoosePush.Application do
_ = check_runtime_configuration_status()

# Define workers and child supervisors to be supervised
# The MongoosePush.Metrics.TelemetryMetrics child is started first to capture possible events
# when services start
children =
service_children() ++ [MongoosePushWeb.Endpoint, MongoosePush.Metrics.TelemetryMetrics]
[MongoosePush.Metrics.TelemetryMetrics] ++ service_children() ++ [MongoosePushWeb.Endpoint]
Contributor

Good catch 🙂


# See http://elixir-lang.org/docs/stable/elixir/Supervisor.html
# for other strategies and supported options
20 changes: 15 additions & 5 deletions lib/mongoose_push/metrics/telemetry_metrics.ex
@@ -15,17 +15,27 @@ defmodule MongoosePush.Metrics.TelemetryMetrics do
event_name: [:mongoose_push, :notification, :send],
measurement: :time,
buckets: [1000, 10_000, 25_000, 50_000, 100_000, 250_000, 500_000, 1000_000],
tags: [:status, :service, :error_category, :error_reason]
tags: [:status, :service, :error_category, :error_reason],
description:
"A histogram showing push notification send times. Includes worker selection (with possible waiting if all are busy)"
),

# measurement is ignored in Counter metric
Telemetry.Metrics.counter("mongoose_push.supervisor.init.count", tags: [:service]),
Telemetry.Metrics.counter("mongoose_push.apns.state.init.count"),
Telemetry.Metrics.counter("mongoose_push.supervisor.init.count",
tags: [:service],
description: "Counts the number of push notification service supervisor starts"
),
Telemetry.Metrics.counter("mongoose_push.apns.state.init.count",
description: "Counts the number of APNS state initialisations"
),
Telemetry.Metrics.counter("mongoose_push.apns.state.terminate.count",
tags: [:error_reason],
tag_values: fn metadata -> %{metadata | error_reason: metadata.reason} end
tag_values: fn metadata -> %{metadata | error_reason: metadata.reason} end,
description: "Counts the number of APNS state terminations"
),
Telemetry.Metrics.counter("mongoose_push.apns.state.get_default_topic.count")
Telemetry.Metrics.counter("mongoose_push.apns.state.get_default_topic.count",
description: "Counts the number of APNS default topic reads from the ETS cache"
)
]
end
end
2 changes: 1 addition & 1 deletion test/docker/docker-compose.mpush.yml
@@ -1,5 +1,5 @@
# This file needs to be used along with `docker-compose.mocks.yml`:
# docker-compose -f test/docker/docker-compose.mocks.yml -f test/docker/docker-compose.mpush.yml ...
# PRIV=priv docker-compose -f test/docker/docker-compose.mocks.yml -f test/docker/docker-compose.mpush.yml ...
Contributor

Why do we need this change?

Contributor Author

The `docker-compose.mpush.yml` file requires the `PRIV` env var. It's set for us when we run `mix test.env.up`, but when running from the console we need to export it manually.

version: '3'

services:
12 changes: 12 additions & 0 deletions test/docker/docker-compose.prometheus.yml
@@ -0,0 +1,12 @@
# This file needs to be used along with `docker-compose.mocks.yml` and `docker-compose.mpush.yml`:
# PRIV=priv docker-compose -f test/docker/docker-compose.mocks.yml -f test/docker/docker-compose.mpush.yml -f test/docker/docker-compose.prometheus.yml ...
version: '3'

services:
  prometheus:
    image: prom/prometheus
    container_name: mongoose-push-prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
9 changes: 9 additions & 0 deletions test/docker/prometheus.yml
@@ -0,0 +1,9 @@
scrape_configs:
  - job_name: 'mongoose-push'
    scheme: 'https' #MongoosePush exposes encrypted endpoint - HTTPS
    tls_config: #The default certs used by MongoosePush are self-signed
      insecure_skip_verify: true #For checking purposes we can ignore certs verification
    static_configs:
      - targets: ['mongoose-push:8443']
        labels:
          group: 'production'