diff --git a/examples/logging-prometheus/README.md b/examples/logging-prometheus/README.md new file mode 100644 index 0000000000..40a82171a6 --- /dev/null +++ b/examples/logging-prometheus/README.md @@ -0,0 +1,230 @@ +# DeepSparse + Prometheus/Grafana + +This is a simple example that shows you how to connect DeepSparse Logging to the Prometheus/Grafana stack. + +#### There are four steps: +- Configure DeepSparse Logging to log metrics in Prometheus format to a REST endpoint +- Point Prometheus to the appropriate endpoint to scrape the data at a specified interval +- Run the client script simulating a data quality/drift issue +- Visualize data in Prometheus with a dashboarding tool like Grafana + +## 0. Setting Up +#### Installation + +To run this tutorial, you need Docker, Docker Compose, and DeepSparse Server: +- [Docker Installation](https://docs.docker.com/engine/install/) +- [Docker Compose Installation](https://docs.docker.com/compose/install/) +- DeepSparse Server is installed via PyPI (`pip install deepsparse[server]`) + +#### Code +The repository contains all the code you need: + +```bash +. +├── client +│   ├── client.py # simple client application for interacting with Server +│   ├── goldfish.jpeg # photo of a goldfish +│   └── all_black.jpeg # photo with just black pixels +├── server-config.yaml # specifies the configuration of the DeepSparse server +├── custom-fn.py # custom function used for the logging +├── docker # specifies the configuration of the containerized Prometheus/Grafana stack +│   ├── docker-compose.yaml +│   └── prometheus.yaml +└── grafana # specifies the design of the Grafana dashboard + └── dashboard.json +``` +## 1. 
Spin up the DeepSparse Server + +`server-config.yaml` specifies the config of the DeepSparse Server, including its logging setup: + +```yaml +# server-config.yaml + +loggers: + prometheus: # logs to prometheus on port 6100 + port: 6100 + +endpoints: + - task: image_classification + route: /image_classification/predict + model: zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none + data_logging: + pipeline_inputs.images[0]: # applies to the first image (of the form target.property[idx]) + - func: fraction_zeros # built-in function + frequency: 1 + target_loggers: + - prometheus + - func: custom-fn.py:mean_pixel_red # custom function + frequency: 1 + target_loggers: + - prometheus +``` + +The config file instructs the server to create an image classification pipeline. Prometheus logs are declared to be exposed on port `6100`, system logging is turned on, and we will log the mean pixel value of the red channel (a custom function) as well as the fraction of pixels that are 0 (a built-in function) for each image sent to the server. + +Thus, once launched, the Server exposes two endpoints: +- port `6100`: exposes the `metrics` endpoint through [Prometheus python client](https://github.com/prometheus/client_python). +- port `5543`: exposes the endpoint for inference. + +To spin up the Server, execute: +``` +deepsparse.server --config_file server-config.yaml +``` + +To validate that metrics are being properly exposed, visit `localhost:6100`. It should display the metrics in the Prometheus text format, which can then be queried with PromQL. + +## 2. Set Up the Prometheus/Grafana Stack + +For simplicity, we have provided a `docker-compose.yaml` that spins up the containerized Prometheus/Grafana stack. That file mounts `prometheus.yaml` (a [Prometheus config file](https://prometheus.io/docs/prometheus/latest/configuration/configuration/)) into the Prometheus container. 
Inside `prometheus.yaml`, the `scrape_configs` section points Prometheus at the `metrics` endpoint exposed by the server on port `6100`. + +Docker Compose File: + +```yaml +# docker-compose.yaml + +version: "3" + +services: + prometheus: + image: prom/prometheus + extra_hosts: + - "host.docker.internal:host-gateway" # allow a direct connection from container to the local machine + ports: + - "9090:9090" # the default port used by Prometheus + volumes: + - ${PWD}/prometheus.yaml:/etc/prometheus/prometheus.yml # mount Prometheus config file + + grafana: + image: grafana/grafana:latest + depends_on: + - prometheus + ports: + - "3000:3000" # the default port used by Grafana + +``` + +Prometheus Config file: + +```yaml +# prometheus.yaml + +global: + scrape_interval: 15s # how often to scrape from endpoint + evaluation_interval: 30s # time between each evaluation of Prometheus' alerting rules + +scrape_configs: + - job_name: deepsparse_img_classification # your project name + static_configs: + - targets: + - 'host.docker.internal:6100' # should match the port exposed by the PrometheusLogger in the DeepSparse Server config file +``` + + +To start up a Prometheus stack to monitor the DeepSparse Server, run: + +```bash +cd docker +docker-compose up +``` + +## 3. Launch the Python Client and Run Inference + +`client.py` is a simple client that simulates the behavior of some application. In the example, we have two images: + - `goldfish.jpeg`: sample photo of a Goldfish + - `all_black.jpeg`: a photo that is all black (every pixel is a 0) + +The client sends requests to the Server, initially with just the Goldfish image. Over time, we start to randomly +send the All Black image to the server with increasing probability. This simulates a data issue in the +pipeline that we can detect with the monitoring system. + +Run the following to start inference: + +```bash +python client/client.py +``` + +It prints out which image was sent to the server. + +## 4. 
Inspecting the Prometheus/Grafana Stack + +### Prometheus + +#### Confirm It Is Working + +Visiting `http://localhost:9090/targets` should show that an endpoint `http://host.docker.internal:6100/metrics` is in state `UP`. + +#### Query Prometheus with PromQL + +If you do not want to use Grafana, you can start off by using Prometheus's native graphing functionality. + +Navigate to `http://localhost:9090/graph` and add the following `Expression`: + +``` +rate(image_classification__0__pipeline_inputs__images__fraction_zeros_sum[30s]) +/ +rate(image_classification__0__pipeline_inputs__images__fraction_zeros_count[30s]) +``` + +You should see the following: + +![prometheus-dashboard.png](image/prometheus-dashboard.png) + +This graph shows the percentage of 0 pixels in the images sent to the server. +As the "corrupted" all black images were sent to the server with increasing probability, +we can clearly see a spike in the graph, alerting us +that something strange is happening with the provided input. + +DeepSparse Server also automatically logs prediction latencies for each pipeline stage as well +as end-to-end server-side inference time. Add the following query to inspect average latency: + +``` +rate(image_classification__0__prediction_latency__total_inference_sum[30s]) +/ +rate(image_classification__0__prediction_latency__total_inference_count[30s]) +``` + +![prometheus-dashboard-latency.png](image/prometheus-dashboard-latency.png) + +For more details on working with the Prometheus Query Language PromQL, +see [the official docs](https://prometheus.io/docs/prometheus/latest/querying/basics/). + +### Grafana + +#### Login + +Visit `localhost:3000` to launch Grafana. Log in with the default username (`admin`) and password (`admin`). + +#### Add Prometheus Data Source +Set up the Prometheus data source (`Add your first data source` -> `Prometheus`). On this page, we just +need to update the `url` section. 
Since Grafana and Prometheus are running in separate Docker containers, +we need to enter the IP address of the Prometheus container. + +Run the following to look up the `name` of your Prometheus container: +``` +docker container ls +>>> CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +>>> 997521854d84 grafana/grafana:latest "/run.sh" About an hour ago Up About an hour 0.0.0.0:3000->3000/tcp, :::3000->3000/tcp docker_grafana_1 +>>> c611c80ae05e prom/prometheus "/bin/prometheus --c…" About an hour ago Up About an hour 0.0.0.0:9090->9090/tcp, :::9090->9090/tcp docker_prometheus_1 +``` + +Run the following to look up the IP address (replace `docker_prometheus_1` with your container's name): +``` +docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' docker_prometheus_1 +>>> 172.18.0.2 +``` + +So, in our case, the `url` section should be: `http://172.18.0.2:9090`. + +Click `Save & Test`. We should get a green check saying "Data Source Is Working". + +#### Import A Dashboard + +Now you should be ready to create/import your dashboard. + +Grafana's interface for adding metrics is very intuitive (and you can use PromQL), +but we have provided a simple pre-made dashboard for this use case. + +Click `Dashboard` -> `Import` on the left-hand side bar. You should see an option to upload a file. +Upload `grafana/dashboard.json` and save. 
Then, you should see the following dashboard: + +![img.png](image/grafana-dashboard.png) diff --git a/examples/logging-prometheus/client/all_black.jpeg b/examples/logging-prometheus/client/all_black.jpeg new file mode 100644 index 0000000000..c045c5945c Binary files /dev/null and b/examples/logging-prometheus/client/all_black.jpeg differ diff --git a/examples/logging-prometheus/client/client.py b/examples/logging-prometheus/client/client.py new file mode 100644 index 0000000000..10213f74a8 --- /dev/null +++ b/examples/logging-prometheus/client/client.py @@ -0,0 +1,43 @@ +import random, time, requests, argparse + +parser = argparse.ArgumentParser() +parser.add_argument("--url", type=str, default="http://0.0.0.0:5543/image_classification/predict/from_files") +parser.add_argument("--img1_path", type=str, default="client/goldfish.jpeg") +parser.add_argument("--img2_path", type=str, default="client/all_black.jpeg") +parser.add_argument("--num_iters", type=int, default=25) +parser.add_argument("--prob_incr", type=float, default=0.1) + +def send_random_img(url, img1_path, img2_path, prob_img2): + img_path = "" + if random.uniform(0, 1) < prob_img2: + img_path = img2_path + else: + img_path = img1_path + + files = [('request', open(img_path, 'rb'))] + resp = requests.post(url=url, files=files) + print(f"Sent File: {img_path}") + +def main(url, img1_path, img2_path, num_iters, prob_incr): + prob_img2 = 0.0 + iters = 0 + increasing = True + + while (increasing or prob_img2 > 0.0): + send_random_img(url, img1_path, img2_path, prob_img2) + + if iters % num_iters == 0 and increasing: + prob_img2 += prob_incr + elif iters % num_iters == 0: + prob_img2 -= prob_incr + iters += 1 + + if prob_img2 >= 1.0: + increasing = False + prob_img2 -= prob_incr + + time.sleep(0.25) + +if __name__ == "__main__": + args = vars(parser.parse_args()) + main(args["url"], args["img1_path"], args["img2_path"], args["num_iters"], args["prob_incr"]) \ No newline at end of file diff --git 
a/examples/logging-prometheus/client/goldfish.jpeg b/examples/logging-prometheus/client/goldfish.jpeg new file mode 100644 index 0000000000..c4536fbec0 Binary files /dev/null and b/examples/logging-prometheus/client/goldfish.jpeg differ diff --git a/examples/logging-prometheus/custom-fn.py b/examples/logging-prometheus/custom-fn.py new file mode 100644 index 0000000000..6fe1c1bcc3 --- /dev/null +++ b/examples/logging-prometheus/custom-fn.py @@ -0,0 +1,5 @@ +import numpy as np +from typing import List + +def mean_pixel_red(img: np.ndarray): + return np.mean(img[:,:,0]) \ No newline at end of file diff --git a/examples/logging-prometheus/docker/docker-compose.yaml b/examples/logging-prometheus/docker/docker-compose.yaml new file mode 100644 index 0000000000..5b48a336f0 --- /dev/null +++ b/examples/logging-prometheus/docker/docker-compose.yaml @@ -0,0 +1,20 @@ +# docker-compose.yaml + +version: "3" + +services: + prometheus: + image: prom/prometheus + extra_hosts: + - "host.docker.internal:host-gateway" # allow a direct connection from container to the local machine + ports: + - "9090:9090" # the default port used by Prometheus + volumes: + - ${PWD}/prometheus.yaml:/etc/prometheus/prometheus.yml # mount Prometheus config file + + grafana: + image: grafana/grafana:latest + depends_on: + - prometheus + ports: + - "3000:3000" # the default port used by Grafana diff --git a/examples/logging-prometheus/docker/prometheus.yaml b/examples/logging-prometheus/docker/prometheus.yaml new file mode 100644 index 0000000000..111357c783 --- /dev/null +++ b/examples/logging-prometheus/docker/prometheus.yaml @@ -0,0 +1,11 @@ +# prometheus.yaml + +global: + scrape_interval: 15s # how often to scrape from endpoint + evaluation_interval: 30s # time between each evaluation of Prometheus' alerting rules + +scrape_configs: + - job_name: deepsparse_img_classification # your project name + static_configs: + - targets: + - 'host.docker.internal:6100' # should match the port exposed by the 
PrometheusLogger in the DeepSparse Server config file \ No newline at end of file diff --git a/examples/logging-prometheus/grafana/dashboard.json b/examples/logging-prometheus/grafana/dashboard.json new file mode 100644 index 0000000000..2374815e1b --- /dev/null +++ b/examples/logging-prometheus/grafana/dashboard.json @@ -0,0 +1,360 @@ +{ + "annotations": { + "list": [ + { + "builtIn": 1, + "datasource": { + "type": "grafana", + "uid": "-- Grafana --" + }, + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations & Alerts", + "target": { + "limit": 100, + "matchAny": false, + "tags": [], + "type": "dashboard" + }, + "type": "dashboard" + } + ] + }, + "editable": true, + "fiscalYearStartMonth": 0, + "graphTooltip": 0, + "id": 1, + "links": [], + "liveNow": false, + "panels": [ + { + "datasource": { + "type": "prometheus", + "uid": "9BtxpaTVz" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + }, + { + "color": "red", + "value": 80 + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 0 + }, + "id": 6, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + 
"datasource": { + "type": "prometheus", + "uid": "9BtxpaTVz" + }, + "editorMode": "code", + "expr": "rate(image_classification__0__prediction_latency__total_inference_sum[30s])\n/\nrate(image_classification__0__prediction_latency__total_inference_count[30s])", + "legendFormat": "Total Time", + "range": true, + "refId": "Total Time" + }, + { + "datasource": { + "type": "prometheus", + "uid": "9BtxpaTVz" + }, + "editorMode": "code", + "expr": "rate(image_classification__0__prediction_latency__engine_forward_sum[30s])\n/\nrate(image_classification__0__prediction_latency__engine_forward_count[30s])", + "hide": false, + "legendFormat": "Engine Time", + "range": true, + "refId": "Engine Time" + }, + { + "datasource": { + "type": "prometheus", + "uid": "9BtxpaTVz" + }, + "editorMode": "code", + "expr": "rate(image_classification__0__prediction_latency__post_process_sum[30s])\n/\nrate(image_classification__0__prediction_latency__post_process_count[30s])", + "hide": false, + "legendFormat": "Post-Process Time", + "range": true, + "refId": "Post-Process Time" + }, + { + "datasource": { + "type": "prometheus", + "uid": "9BtxpaTVz" + }, + "editorMode": "code", + "expr": "rate(image_classification__0__prediction_latency__pre_process_sum[30s])\n/\nrate(image_classification__0__prediction_latency__pre_process_count[30s])", + "hide": false, + "legendFormat": "Pre-Process Time", + "range": true, + "refId": "Pre-Process Time" + } + ], + "title": "Latency", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "9BtxpaTVz" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + 
"scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + }, + { + "color": "red", + "value": 80 + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 8 + }, + "id": 4, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "ZFVT-gK4z" + }, + "editorMode": "code", + "expr": "rate(image_classification__0__pipeline_inputs__images__fraction_zeros_sum[30s])\n/\nrate(image_classification__0__pipeline_inputs__images__fraction_zeros_count[30s])", + "legendFormat": "__auto", + "range": true, + "refId": "A" + } + ], + "title": "Percentage of 0 Pixels", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "9BtxpaTVz" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + }, + { + "color": "red", + "value": 80 + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 9, + "w": 12, + "x": 
0, + "y": 16 + }, + "id": 2, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "ZFVT-gK4z" + }, + "editorMode": "code", + "expr": "rate(image_classification__0__pipeline_inputs__images__mean_pixel_red_sum[30s])\n/\nrate(image_classification__0__pipeline_inputs__images__mean_pixel_red_count[30s])", + "legendFormat": "__auto", + "range": true, + "refId": "A" + } + ], + "title": "Mean Pixel Value Red Channel", + "type": "timeseries" + } + ], + "schemaVersion": 37, + "style": "dark", + "tags": [], + "templating": { + "list": [] + }, + "time": { + "from": "now-5m", + "to": "now" + }, + "timepicker": {}, + "timezone": "", + "title": "New dashboard", + "uid": "KVWYTaTVz", + "version": 1, + "weekStart": "" +} \ No newline at end of file diff --git a/examples/logging-prometheus/image/grafana-dashboard.png b/examples/logging-prometheus/image/grafana-dashboard.png new file mode 100644 index 0000000000..589b4e9b6c Binary files /dev/null and b/examples/logging-prometheus/image/grafana-dashboard.png differ diff --git a/examples/logging-prometheus/image/prometheus-dashboard-latency.png b/examples/logging-prometheus/image/prometheus-dashboard-latency.png new file mode 100644 index 0000000000..073794a940 Binary files /dev/null and b/examples/logging-prometheus/image/prometheus-dashboard-latency.png differ diff --git a/examples/logging-prometheus/image/prometheus-dashboard.png b/examples/logging-prometheus/image/prometheus-dashboard.png new file mode 100644 index 0000000000..7168f63cb6 Binary files /dev/null and b/examples/logging-prometheus/image/prometheus-dashboard.png differ diff --git a/examples/logging-prometheus/server-config.yaml b/examples/logging-prometheus/server-config.yaml new file mode 100644 index 0000000000..cd11fe7cb2 --- /dev/null +++ 
b/examples/logging-prometheus/server-config.yaml @@ -0,0 +1,18 @@ +loggers: + prometheus: # logs to prometheus on port 6100 + port: 6100 + +endpoints: + - task: image_classification + route: /image_classification/predict + model: zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none + data_logging: + pipeline_inputs.images[0]: # applies to the first image (of the form stage.property[idx]) + - func: fraction_zeros # built-in function + frequency: 1 + target_loggers: + - prometheus + - func: custom-fn.py:mean_pixel_red # custom function + frequency: 1 + target_loggers: + - prometheus \ No newline at end of file
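
The PromQL expressions in the README divide a `_sum` series by its matching `_count` series. As a rough illustration of what those two series hold, the sketch below parses Prometheus text-format output and computes the same ratio over the whole lifetime of the process (not over a 30s window, as `rate(...)` does). It is a minimal sketch under assumptions: the sample text and its values are made up, the helper names `parse_samples` and `lifetime_mean` are hypothetical, and the parser assumes simple `name value` lines without labels.

```python
from typing import Dict, Optional

# Hypothetical scrape output; the values are made up for illustration.
SAMPLE_METRICS = """\
# TYPE image_classification__0__pipeline_inputs__images__fraction_zeros summary
image_classification__0__pipeline_inputs__images__fraction_zeros_count 4.0
image_classification__0__pipeline_inputs__images__fraction_zeros_sum 1.5
"""


def parse_samples(text: str) -> Dict[str, float]:
    """Map each 'name value' sample line to a float, skipping comments."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments carry no samples
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            samples[parts[0]] = float(parts[1])
        except ValueError:
            continue  # ignore lines whose value is not numeric
    return samples


def lifetime_mean(samples: Dict[str, float], base: str) -> Optional[float]:
    """Return <base>_sum / <base>_count, or None if there is no data yet."""
    total = samples.get(base + "_sum")
    count = samples.get(base + "_count")
    if total is None or not count:
        return None
    return total / count


if __name__ == "__main__":
    base = "image_classification__0__pipeline_inputs__images__fraction_zeros"
    print(lifetime_mean(parse_samples(SAMPLE_METRICS), base))
```

To try this against a running server, the sample text could be replaced with the body fetched from the metrics port (for example `requests.get("http://localhost:6100").text`), assuming the server from `server-config.yaml` is up.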