WIP: First iteration of a prometheus exporter for ara #483

dmsimard · 2023-02-22T03:35:03Z

As discussed on the issue for this topic: #177

It's not finished and still very much a WIP but I figured it might be worthwhile to iterate under a branch in a PR instead of the gist: https://gist.github.com/dmsimard/68c149eea34dbff325c9e4e9c39980a0

If prometheus_client is installed, there will be an ara prometheus command to expose prometheus metrics gathered and parsed from an ara instance:

usage: ara prometheus [-h] [--client <client>] [--server <url>] [--timeout <seconds>] [--username <username>] [--password <password>] [--ssl-cert <path/to/certificate>] [--ssl-key <path/to/key>] [--ssl-ca <path/to/cacert>] [--insecure]
                      [--playbook-limit PLAYBOOK_LIMIT] [--task-limit TASK_LIMIT] [--host-limit HOST_LIMIT] [--poll-frequency POLL_FREQUENCY] [--prometheus-port PROMETHEUS_PORT]

Exposes a prometheus exporter to provide metrics from an instance of ara

options:
  -h, --help            show this help message and exit
  --client <client>
                        API client to use, defaults to ARA_API_CLIENT or 'offline'
  --server <url>
                        API server endpoint if using http client, defaults to ARA_API_SERVER or 'http://127.0.0.1:8000'
  --timeout <seconds>
                        Timeout for requests to API server, defaults to ARA_API_TIMEOUT or 30
  --username <username>
                        API server username for authentication, defaults to ARA_API_USERNAME or None
  --password <password>
                        API server password for authentication, defaults to ARA_API_PASSWORD or None
  --ssl-cert <path/to/certificate>
                        If a client certificate is required, the path to the certificate to use, defaults to ARA_API_CERT or None
  --ssl-key <path/to/key>
                        If a client certificate is required, the path to the private key to use, defaults to ARA_API_KEY or None
  --ssl-ca <path/to/cacert>
                        Path to a certificate authority for trusting the API server certificate, defaults to ARA_API_CA or None
  --insecure            Ignore SSL certificate validation, defaults to ARA_API_INSECURE or False
  --playbook-limit PLAYBOOK_LIMIT
                        Max number of playbooks to request at once (default: 1000)
  --task-limit TASK_LIMIT
                        Max number of tasks to request at once (default: 2500)
  --host-limit HOST_LIMIT
                        Max number of hosts to request at once (default: 2500)
  --poll-frequency POLL_FREQUENCY
                        Seconds to wait until querying ara for new metrics (default: 60)
  --prometheus-port PROMETHEUS_PORT
                        Port on which the prometheus exporter will listen (default: 8001)

Heavily a work in progress and learning experience over which we will iterate a number of times. The intent is to make a prometheus exporter gather metrics from an ara instance and expose them so that prometheus can scrape them.

- Added support for querying results through pagination
- Added support for paginating through pages of results
- Query everything at boot via result limit (i.e., ?limit=1000) and pagination
- Store the latest object timestamp such that the next scrape will only pick up objects created after that, using ?created_after=<timestamp>
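A minimal sketch of the pagination and incremental polling described above, not the code from this branch. It assumes the API returns pages shaped like `{"results": [...], "next": ...}` and that objects carry a `created` timestamp; `iter_results`, `latest_timestamp` and `fake_get` are hypothetical names used only for illustration, and `fake_get` stands in for a real API client.

```python
def iter_results(get, endpoint, limit=1000, created_after=None):
    """Yield every object from a paginated endpoint, one page at a time."""
    params = {"limit": limit, "offset": 0}
    if created_after is not None:
        params["created_after"] = created_after
    while True:
        page = get(endpoint, **params)
        yield from page["results"]
        if not page.get("next"):
            break
        params["offset"] += limit


def latest_timestamp(objects, previous=None):
    """Return the most recent 'created' value seen so far.

    ISO 8601 UTC timestamps in one consistent format sort correctly as strings.
    """
    stamps = [obj["created"] for obj in objects]
    if previous is not None:
        stamps.append(previous)
    return max(stamps) if stamps else None


def fake_get(endpoint, limit, offset, created_after=None):
    """Hypothetical stand-in for an API client, serving five fake objects."""
    data = [{"id": i, "created": f"2023-06-08T02:43:{i:02d}Z"} for i in range(5)]
    if created_after:
        data = [d for d in data if d["created"] > created_after]
    chunk = data[offset:offset + limit]
    has_more = offset + limit < len(data)
    return {"results": chunk, "next": "more" if has_more else None}


# First pass at boot: walk every page, then remember where we stopped.
objects = list(iter_results(fake_get, "/api/v1/playbooks", limit=2))
checkpoint = latest_timestamp(objects)

# Later polls only ask for objects created after the checkpoint.
new_objects = list(
    iter_results(fake_get, "/api/v1/playbooks", limit=2, created_after=checkpoint)
)
```

Keeping the checkpoint outside the fetch function means a poll that returns nothing leaves the previous timestamp untouched.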

- Move it under our existing ara CLI so it can re-use all the boilerplate about instantiating an API client with all the settings
- Add args for limits, poll frequency and port for the exporter to listen on

softwarefactory-project-zuul · 2023-02-22T03:48:51Z

Build failed.
https://ansible.softwarefactory-project.io/zuul/buildset/f9d8f487b49d447d8f37dc2007613d34

✔️ ara-tox-py3 SUCCESS in 4m 09s
❌ ara-tox-linters FAILURE in 3m 32s
✔️ ara-basic-ansible-core-devel SUCCESS in 5m 33s (non-voting)
✔️ ara-basic-ansible-6 SUCCESS in 5m 09s
✔️ ara-basic-ansible-core-2.14 SUCCESS in 5m 35s
✔️ ara-basic-ansible-core-2.13 SUCCESS in 5m 03s
✔️ ara-basic-ansible-core-2.12 SUCCESS in 5m 04s
✔️ ara-basic-ansible-core-2.11 SUCCESS in 5m 20s
✔️ ara-basic-ansible-2.9 SUCCESS in 5m 08s
✔️ ara-container-images SUCCESS in 11m 19s

- Added --max-days to limit backfill at boot
- Added a bit of verbosity
- Adjust hosts to be scanned before tasks (there are way, way more tasks than hosts in terms of volume)

- First try at a playbook histogram containing the timestamp and duration

dmsimard · 2023-02-24T04:20:22Z

I've added a bit more context in the issue (#177 (comment)) and got two quick iterations in:

Added --max-days to limit backfill at boot
Added a bit of verbosity
Adjust hosts to be scanned before tasks (there are way, way more tasks than hosts in terms of volume)
First try at a playbook histogram containing the timestamp and duration

Edit: I've put up an example /metrics response from a single playbook's metric as a histogram in the gist: https://gist.github.com/dmsimard/68c149eea34dbff325c9e4e9c39980a0#file-playbooks_as_histogram-txt

It wants to group metrics based on their label uniqueness. I suppose in our case we want each playbook to be represented individually, so we should include their id? More on that later.
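To illustrate the grouping behaviour, here is a toy model in plain Python rather than prometheus_client itself: observations that share the same label values collapse into one series, while adding an `id` label keeps each playbook separate. The function name `group_by_labels` and the playbook data (including the durations) are made up for the example.

```python
from collections import defaultdict


def group_by_labels(observations, label_names):
    """Sum observed durations per unique combination of the chosen labels."""
    series = defaultdict(float)
    for obs in observations:
        key = tuple((name, obs[name]) for name in label_names)
        series[key] += obs["duration"]
    return dict(series)


# Hypothetical playbooks: two runs of the same file with the same status.
playbooks = [
    {"id": "29", "path": "failed.yaml", "status": "failed", "duration": 1.5},
    {"id": "30", "path": "smoke.yaml", "status": "completed", "duration": 12.0},
    {"id": "31", "path": "smoke.yaml", "status": "completed", "duration": 8.0},
]

# Without the id, both smoke.yaml runs land in the same series.
without_id = group_by_labels(playbooks, ["path", "status"])

# With the id, every playbook gets its own series.
with_id = group_by_labels(playbooks, ["id", "path", "status"])
```

The trade-off is cardinality: one series per playbook keeps runs distinguishable but grows with every recorded playbook.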

softwarefactory-project-zuul · 2023-02-24T04:31:19Z

Build failed.
https://ansible.softwarefactory-project.io/zuul/buildset/d069974d12c14515aded43c6df617003

✔️ ara-tox-py3 SUCCESS in 3m 24s
❌ ara-tox-linters FAILURE in 3m 15s
✔️ ara-basic-ansible-core-devel SUCCESS in 5m 50s (non-voting)
✔️ ara-basic-ansible-6 SUCCESS in 5m 09s
✔️ ara-basic-ansible-core-2.14 SUCCESS in 5m 26s
✔️ ara-basic-ansible-core-2.13 SUCCESS in 5m 15s
✔️ ara-basic-ansible-core-2.12 SUCCESS in 5m 16s
✔️ ara-basic-ansible-core-2.11 SUCCESS in 6m 29s
✔️ ara-basic-ansible-2.9 SUCCESS in 5m 28s
✔️ ara-container-images SUCCESS in 11m 56s

Still heavily a work in progress but getting a better understanding of how things work. Hosts and tasks now have gauges by status. Disable playbook metrics temporarily until we revisit them with newfound knowledge.

dmsimard · 2023-06-19T00:18:19Z

I think my brain is starting to understand what is happening.

I've temporarily commented out the current iteration of the playbook metrics until I revisit it with newfound knowledge.

This latest iteration re-works the host and tasks metrics to have gauges per status such that we are able to do graphs like this, for example:

Prometheus task results in grafana

Prometheus host results in grafana

A snippet of what this looks like when querying the prometheus exporter:

# HELP ara_tasks_total Number of tasks recorded by ara in prometheus
# TYPE ara_tasks_total gauge
ara_tasks_total 403.0
# HELP ara_tasks_range Limit metric collection to the N most recent tasks
# TYPE ara_tasks_range gauge
ara_tasks_range 2500.0
# HELP ara_tasks_completed Completed Ansible tasks
# TYPE ara_tasks_completed gauge
ara_tasks_completed{action="command",duration="00:00:00.294820",name="Echo the �abc binary string",path="/home/dmsimard/dev/git/ansible-community/ara/tests/integration/smoke.yaml",playbook="30",results="1",status="completed",updated="2023-06-08T02:43:29.665787Z"} 1.0
ara_tasks_completed{action="debug",duration="00:00:00.155210",name="Task with non-ascii characters - ä, ö, ü",path="/home/dmsimard/dev/git/ansible-community/ara/tests/integration/smoke.yaml",playbook="30",results="1",status="completed",updated="2023-06-08T02:43:29.317583Z"} 1.0
ara_tasks_completed{action="gather_facts",duration="00:00:01.035601",name="Gathering Facts",path="/home/dmsimard/dev/git/ansible-community/ara/tests/integration/smoke.yaml",playbook="30",results="1",status="completed",updated="2023-06-08T02:43:29.098823Z"} 1.0
# HELP ara_tasks_failed Failed Ansible tasks
# TYPE ara_tasks_failed gauge
ara_tasks_failed{action="command",duration="00:00:00.455411",name="smoke-tests : Return false",path="/home/dmsimard/dev/git/ansible-community/ara/tests/integration/roles/smoke-tests/tasks/test-ops.yaml",playbook="30",results="1",status="failed",updated="2023-06-08T02:43:25.190901Z"} 1.0
ara_tasks_failed{action="fail",duration="00:00:00.210469",name="fail",path="/home/dmsimard/dev/git/ansible-community/ara/tests/integration/failed.yaml",playbook="29",results="1",status="failed",updated="2023-06-08T02:43:07.648379Z"} 1.0
ara_tasks_failed{action="fail",duration="00:00:00.219566",name="Generate a failure that will be rescued",path="/home/dmsimard/dev/git/ansible-community/ara/tests/integration/lookups.yaml",playbook="26",results="1",status="failed",updated="2023-06-08T02:32:51.180755Z"} 1.0
# ...

# HELP ara_hosts_total Hosts recorded by ara
# TYPE ara_hosts_total gauge
ara_hosts_total 43.0
# HELP ara_hosts_range Limit metric collection to the N most recent hosts
# TYPE ara_hosts_range gauge
ara_hosts_range 2500.0
# HELP ara_hosts_changed Number of changes on a host
# TYPE ara_hosts_changed gauge
ara_hosts_changed{name="localhost",playbook="30",updated="2023-06-08T02:43:29.848077Z"} 10.0
ara_hosts_changed{name="localhost",playbook="28",updated="2023-06-08T02:33:20.625359Z"} 1.0
ara_hosts_changed{name="localhost",playbook="26",updated="2023-06-08T02:32:54.179356Z"} 1.0
# HELP ara_hosts_failed Number of failures on a host
# TYPE ara_hosts_failed gauge
ara_hosts_failed{name="localhost",playbook="29",updated="2023-06-08T02:43:07.767992Z"} 1.0
ara_hosts_failed{name="localhost",playbook="24",updated="2023-06-08T02:32:18.773096Z"} 1.0
ara_hosts_failed{name="localhost",playbook="23",updated="2023-06-08T02:04:04.810142Z"} 1.0
# ...
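When checking output like the snippet above by hand or in a test, a small parser can help. This is a simplified sketch using only the standard library, not a full implementation of the exposition format: it ignores comment lines, optional trailing timestamps and escape sequences inside label values. `parse_sample` is a hypothetical helper name.

```python
import re

# metric_name{label="value",...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one sample line into (metric name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


name, labels, value = parse_sample(
    'ara_hosts_changed{name="localhost",playbook="30",'
    'updated="2023-06-08T02:43:29.848077Z"} 10.0'
)
```

Lines starting with `#` (HELP and TYPE) raise `ValueError` here, so callers would filter them out first.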

softwarefactory-project-zuul · 2023-06-19T00:32:17Z

Build failed.
https://ansible.softwarefactory-project.io/zuul/buildset/75ed0374bc6e4344af27503fe6350e60

✔️ ara-tox-py3 SUCCESS in 9m 57s
❌ ara-tox-linters FAILURE in 9m 48s
✔️ ara-basic-ansible-core-devel SUCCESS in 4m 59s (non-voting)
✔️ ara-basic-ansible-6 SUCCESS in 6m 11s
✔️ ara-basic-ansible-core-2.14 SUCCESS in 6m 01s
✔️ ara-basic-ansible-core-2.13 SUCCESS in 10m 57s
✔️ ara-basic-ansible-core-2.12 SUCCESS in 10m 38s
✔️ ara-basic-ansible-core-2.11 SUCCESS in 10m 51s
✔️ ara-basic-ansible-2.9 SUCCESS in 10m 50s
✔️ ara-container-images SUCCESS in 17m 13s

- Add a summary metric for tracking the duration of tasks. This is what was intended when trying to do the playbook histogram so we'll come back to that later.

softwarefactory-project-zuul · 2023-06-19T15:30:56Z

Build failed.
https://ansible.softwarefactory-project.io/zuul/buildset/4c6c9dea87f14d93aa1ec28b71ebc083

✔️ ara-tox-py3 SUCCESS in 4m 14s
❌ ara-tox-linters FAILURE in 3m 12s
✔️ ara-basic-ansible-core-devel SUCCESS in 6m 20s (non-voting)
✔️ ara-basic-ansible-6 SUCCESS in 7m 07s
✔️ ara-basic-ansible-core-2.14 SUCCESS in 8m 02s
✔️ ara-basic-ansible-core-2.13 SUCCESS in 6m 20s
✔️ ara-basic-ansible-core-2.12 SUCCESS in 5m 32s
✔️ ara-basic-ansible-core-2.11 SUCCESS in 6m 17s
✔️ ara-basic-ansible-2.9 SUCCESS in 5m 40s
✔️ ara-container-images SUCCESS in 11m 13s

softwarefactory-project-zuul · 2023-06-20T04:53:04Z

Build succeeded.
https://ansible.softwarefactory-project.io/zuul/buildset/59731f5a132942749960db45ae05a18a

✔️ ara-tox-py3 SUCCESS in 4m 15s
✔️ ara-tox-linters SUCCESS in 3m 57s
✔️ ara-basic-ansible-core-devel SUCCESS in 7m 09s (non-voting)
✔️ ara-basic-ansible-6 SUCCESS in 6m 09s
✔️ ara-basic-ansible-core-2.14 SUCCESS in 6m 24s
✔️ ara-basic-ansible-core-2.13 SUCCESS in 6m 01s
✔️ ara-basic-ansible-core-2.12 SUCCESS in 6m 30s
✔️ ara-basic-ansible-core-2.11 SUCCESS in 6m 08s
✔️ ara-basic-ansible-2.9 SUCCESS in 6m 31s
✔️ ara-container-images SUCCESS in 11m 36s

- Substantial cleanup and cut on code duplication
- Fix linting and style
- Metric labels moved to default constants, leave the door open for the possibility of customizing them
- Retrofit what we learned back to the playbook metrics
- Re-enable playbook metrics

dmsimard · 2023-06-20T05:20:57Z

Lots of cleanup in this last iteration and I've done some tweaking on the grafana dashboard.

It looks like this now:

softwarefactory-project-zuul · 2023-06-20T05:31:22Z

Build failed.
https://ansible.softwarefactory-project.io/zuul/buildset/0eed3702b4444312b85e762bc95e51dc

✔️ ara-tox-py3 SUCCESS in 3m 12s
❌ ara-tox-linters FAILURE in 3m 12s
✔️ ara-basic-ansible-core-devel SUCCESS in 6m 16s (non-voting)
✔️ ara-basic-ansible-6 SUCCESS in 5m 58s
✔️ ara-basic-ansible-core-2.14 SUCCESS in 5m 20s
✔️ ara-basic-ansible-core-2.13 SUCCESS in 6m 54s
✔️ ara-basic-ansible-core-2.12 SUCCESS in 4m 51s
✔️ ara-basic-ansible-core-2.11 SUCCESS in 6m 03s
✔️ ara-basic-ansible-2.9 SUCCESS in 5m 08s
✔️ ara-container-images SUCCESS in 11m 33s

- More cleanup
- Removed Gauges for each status of playbooks and tasks: in hindsight they were not useful once we understood how to use Summaries, and they generated a lot of needless metrics
- Added a package extra for [prometheus]
- First iteration of docs
- Add first iteration of grafana dashboard

softwarefactory-project-zuul · 2023-06-21T04:05:50Z

Build failed.
https://ansible.softwarefactory-project.io/zuul/buildset/fe23cb058a504bc48f68b007b1d4de91

✔️ ara-tox-py3 SUCCESS in 3m 15s
❌ ara-tox-linters FAILURE in 3m 07s
✔️ ara-tox-docs SUCCESS in 7m 57s
✔️ ara-basic-ansible-core-devel SUCCESS in 5m 09s (non-voting)
✔️ ara-basic-ansible-6 SUCCESS in 5m 03s
✔️ ara-basic-ansible-core-2.14 SUCCESS in 11m 10s
✔️ ara-basic-ansible-core-2.13 SUCCESS in 5m 06s
✔️ ara-basic-ansible-core-2.12 SUCCESS in 5m 06s
✔️ ara-basic-ansible-core-2.11 SUCCESS in 4m 45s
✔️ ara-basic-ansible-2.9 SUCCESS in 5m 08s
✔️ ara-container-images SUCCESS in 10m 57s

dmsimard · 2023-06-21T13:32:27Z

I feel this is ready for a first look to a wider audience so I've asked around for testing and feedback:

The final implementation may change before landing (for example if I screwed up the metric types) but this will be useful to make sure we made the right decisions and can make the necessary changes before merging.

I am narrowing the scope of this first PR to playbooks, tasks and hosts for now. Results and plays can come in a later patch as necessary.

softwarefactory-project-zuul · 2023-07-21T00:51:29Z

Build failed.
https://ansible.softwarefactory-project.io/zuul/buildset/5332cbba06be4ca09a29ccbfe24bb719

✔️ ara-tox-py3 SUCCESS in 3m 50s
❌ ara-tox-linters FAILURE in 3m 56s
✔️ ara-tox-docs SUCCESS in 3m 58s
✔️ ara-basic-ansible-core-devel SUCCESS in 6m 17s (non-voting)
✔️ ara-basic-ansible-8 SUCCESS in 6m 03s
✔️ ara-basic-ansible-core-2.15 SUCCESS in 6m 53s
✔️ ara-basic-ansible-core-2.14 SUCCESS in 5m 23s
✔️ ara-basic-ansible-2.9 SUCCESS in 6m 06s
✔️ ara-container-images SUCCESS in 12m 00s

softwarefactory-project-zuul · 2023-07-21T03:26:33Z

Build failed.
https://ansible.softwarefactory-project.io/zuul/buildset/51c4f4164d66409bbf48568389543706

✔️ ara-tox-py3 SUCCESS in 3m 49s
❌ ara-tox-linters FAILURE in 3m 53s
✔️ ara-tox-docs SUCCESS in 3m 11s
✔️ ara-basic-ansible-core-devel SUCCESS in 6m 03s (non-voting)
✔️ ara-basic-ansible-8 SUCCESS in 6m 01s
✔️ ara-basic-ansible-core-2.15 SUCCESS in 7m 29s
✔️ ara-basic-ansible-core-2.14 SUCCESS in 7m 20s
✔️ ara-basic-ansible-2.9 SUCCESS in 5m 55s
✔️ ara-container-images SUCCESS in 11m 19s

dmsimard · 2023-10-23T00:14:44Z

Nothing special pushed, just rebased on top of latest master.

softwarefactory-project-zuul · 2023-10-23T00:30:03Z

Build failed.
https://ansible.softwarefactory-project.io/zuul/buildset/7f750024dd7b42b2987983a14fc3a884

✔️ ara-tox-py3 SUCCESS in 4m 05s
❌ ara-tox-linters FAILURE in 3m 50s
✔️ ara-tox-docs SUCCESS in 3m 15s
✔️ ara-basic-ansible-core-devel SUCCESS in 6m 55s (non-voting)
✔️ ara-basic-ansible-8 SUCCESS in 7m 00s
✔️ ara-basic-ansible-core-2.15 SUCCESS in 6m 58s
✔️ ara-basic-ansible-core-2.14 SUCCESS in 6m 21s
✔️ ara-container-images SUCCESS in 13m 52s

dmsimard · 2023-10-26T22:34:01Z

I will eventually include it in the docs but in the meantime, I've come up with the following graph that explains how one might use the exporter:

                                         ┌──────────────────┐
       ┌────────────┐ promql ┌─────────┐ │ ansible-playbook │
       │ Prometheus │◄───────┤ Grafana │ │    (with ara)    │
       └──────┬─────┘        └─────────┘ └───────┬──────────┘
              │                                  │
              │ scrapes /metrics                 │ collects data
              │ & stores results                 │ & sends it
              │                                  │
   ┌──────────▼──────────┐               ┌───────▼────────┐
   │ Prometheus Exporter ├──────────────►│ ara API server │
   │ (prometheus_client) │ query metrics │    (django) ┌──┴─────────┐
   └─────────────────────┘               └─────────────┤ recorded   │
                                                       │  playbooks │
                                                       └────────────┘

dmsimard · 2023-10-26T22:40:36Z

doc/source/prometheus.rst

+
+ara doesn't provide monitoring or alerting out of the box (they are out of scope) but it records a number of granular metrics about Ansible playbooks, tasks and hosts, amongst other things.
+
+Starting with version 1.6.2, ara provides an integration of `prometheus_client <https://github.com/prometheus/client_python>`_ that queries the ara API and then exposes these metrics for prometheus to scrape.


1.6.2 didn't pan out, we went straight to 1.7.0. It can be included in a release as soon as it's ready.

dmsimard · 2023-10-29T18:10:42Z

ara/cli/prometheus.py

+            help='Maximum number of days to backfill metrics for (default: 90)',
+            default=90,
+            type=int
+        )


I think it could be interesting for the exporter to be able to filter queries like the general CLI commands work, for example ara playbook list (docs) has:

--ansible_version <ansible_version>  List playbooks that ran with the specified Ansible version (full or partial)
--client_version <client_version>  List playbooks that were recorded with the specified ara client version (full or partial)
--server_version <server_version>  List playbooks that were recorded with the specified ara server version (full or partial)
--python_version <python_version>  List playbooks that were recorded with the specified python version (full or partial)
--user <user>  List playbooks that were run by the specified user (full or partial)
--controller <controller>  List playbooks that ran from the provided controller (full or partial)
--name <name>  List playbooks matching the provided name (full or partial)
--path <path>  List playbooks matching the provided path (full or partial)
--status <status>  List playbooks matching a specific status ('completed', 'running', 'failed')
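As a rough sketch of how such filters could combine with --max-days, the function below turns a backfill window and optional filters into a query string. It assumes the API accepts `limit` and `created_after` (both mentioned earlier in this PR) and that filter names such as `status` or `name` map directly to query parameters, which is an assumption here. `build_query` is a hypothetical helper, not part of ara.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def build_query(max_days=90, limit=1000, now=None, **filters):
    """Build a query string limited to the last `max_days` days.

    Filters set to None are dropped so that unset CLI options are not sent.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_days)
    params = {
        "limit": limit,
        "created_after": cutoff.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    params.update({k: v for k, v in filters.items() if v is not None})
    return urlencode(params)


# Fixed "now" so the example is reproducible.
query = build_query(
    max_days=90,
    now=datetime(2023, 10, 29, tzinfo=timezone.utc),
    status="failed",
    name=None,
)
```

Passing `now` explicitly keeps the function easy to test; in the exporter it would default to the current time.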

voileux · 2023-11-17T17:52:10Z

Hi,
I was at the Ansible meetup in the OVH building in Montreal; your presentation was really good.
In Prometheus, it's a problem when the value of a label (tag) changes between polls for the same metric; it's better to turn that label into a metric of its own.

I think you could transform, for example, this metric:
ara_tasks_completed{
action="command",
duration="00:00:00.294820",
name="Echo the �abc binary string",
path="/home/dmsimard/dev/git/ansible-community/ara/tests/integration/smoke.yaml",
playbook="30",
results="1",
status="completed",
updated="2023-06-08T02:43:29.665787Z"} 1.0

into several metrics:
ara_tasks_status { action="command", name="Echo the abc binary string", path="/home/.......", playbook="30" } 1 (you can map an integer value to each status name: 1 for 'completed', 2 for 'running', 3 for 'failed')

ara_tasks_duration { action="command", name="Echo the abc binary string", path="/home/.......", playbook="30" } number of seconds (or microseconds if needed)

ara_tasks_results { action="command", name="Echo the abc binary string", path="/home/.......", playbook="30" } 1

We can work together to define the correct metrics, then write the corresponding Python for the exporter.
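A minimal sketch of that split in plain Python, assuming task data shaped like the labels in the sample above. The status mapping follows the suggestion (1 completed, 2 running, 3 failed); `duration_to_seconds` and `split_task` are hypothetical helpers, and durations of a day or longer are not handled.

```python
STATUS_CODES = {"completed": 1, "running": 2, "failed": 3}


def duration_to_seconds(duration):
    """Convert an 'HH:MM:SS.ffffff' string into float seconds."""
    hours, minutes, seconds = duration.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def split_task(task):
    """Return three metrics sharing one stable label set.

    Values that change over time (status, duration, results) become the
    metric values instead of labels.
    """
    labels = {key: task[key] for key in ("action", "name", "path", "playbook")}
    return {
        "ara_tasks_status": (labels, STATUS_CODES[task["status"]]),
        "ara_tasks_duration": (labels, duration_to_seconds(task["duration"])),
        "ara_tasks_results": (labels, int(task["results"])),
    }


metrics = split_task({
    "action": "gather_facts",
    "name": "Gathering Facts",
    "path": "tests/integration/smoke.yaml",
    "playbook": "30",
    "status": "completed",
    "duration": "00:00:01.035601",
    "results": "1",
})
```

With this shape, a task moving from running to completed updates the value of one existing series rather than creating a new one.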

dmsimard · 2023-11-18T14:19:04Z

Hi @voileux and thanks for reaching out!

What you suggest makes sense to me and it's worth looking into.

I don't have bandwidth to look into this /right now/ but I will revisit this in the near future.

copolycube · 2023-11-23T13:56:08Z

Hello,

Depending on your goal here, it might be easier to limit the "exporter part" to what you want to monitor live (i.e. what you want to trigger alerts on).

And for the visualization aspects, connect grafana directly to your database with the matching grafana datasource.

Something like:

flowchart TD
    G[Grafana] -->|promql <br/> visualize <b>alerts</b><br/> and correlate current metrics| P(Prometheus )
    G -->|db datasource <br/> visualize <b>metrics</b> <br/>current and historical| D
    W(alertmanager) -->|promql<br/>trigger alerts| P
    P-->|scrapes /metrics<br/> stores data| E(Prometheus Exporter<br/>prometheus_client)
    E --> |query metrics| D(ara API server <br/> django <br/>fa:fa-database recorded playbooks)
    A(ansible playbook) -->|collects data<br/>& sends it| D

instead of (from your previous schema here)

flowchart TD
    G[Grafana] -->|promql| P(Prometheus)
    P-->|scrapes /metrics<br/> stores data| E(Prometheus Exporter<br/>prometheus_client)
    E --> |query metrics| D(ara API server <br/> django <br/>fa:fa-database recorded playbooks)
    A(ansible playbook) -->|collects data<br/>& sends it| D

(edit: I forgot to put the mermaid keyword, and took this opportunity to add alertmanager & clarify the schema equivalent to the one you presented before)

This does require you to rewrite your grafana panels to use the proper SQL, and you will need to open a connection between grafana and your DB.

It also avoids transforming the whole content of the DB into the opentelemetry format and scraping it each time, which will scale better :-D

dmsimard · 2023-12-20T03:14:41Z

Hi, I haven't revisited this in a little while but I wanted to say it was still on my radar and I plan to work on this some more in the near future.

xlr-8 · 2024-10-02T06:41:19Z

Hello @dmsimard,

Thank you for the great project! Really nice to see / use!

I'm interested in taking over the topic, if that's alright with you. I'm also willing to build the grafana dashboard based on the metrics gathered. I'm no expert, but I've used them a bit.

I've created a branch on my repo and tried to take into account your suggestions and @voileux's. However, I'm currently stuck on the testing phase.

I've read your documentation and code, but I can't get the prometheus action exposed via the CLI.
So far I've already:

Built the container with buildah (why use buildah by the way?)
Exported it to docker to run it
Started the container
Re-installed ara (using: pip uninstall ara && pip install -e '.[server,prometheus]')

The project runs locally, I still have access to everything as before, but no way to get access to prometheus through the CLI:

# Can see the host for example but nothing for prom
 ara help | grep -E '(host|prometheus)'
  host delete    Deletes the specified host and associated resources
  host list      Returns a list of hosts based on search queries
  host metrics   Provides metrics about hosts
  host show      Returns a detailed view of a specified host

Logs of the previous steps:

> buildah images | grep ara-177
localhost/ara-api        ara-177   fcb72fa860d1   23 hours ago   295 MB
> docker images | grep ara-177
localhost/ara-api                               ara-177                        fcb72fa860d1   23 hours ago    280MB
> docker ps
CONTAINER ID   IMAGE                           COMMAND                  CREATED        STATUS        PORTS                                       NAMES
32dd5a2de5dc   localhost/ara-api:ara-177   "bash -c '/usr/local…"   21 hours ago   Up 21 hours   0.0.0.0:8000->8000/tcp, :::8000->8000/tcp   ara

I feel like this part of the documentation is a bit thin, and having to use and understand buildah/docker/tox (is it needed?) or the overall parser is difficult for me. My feeling is that either there's some cache I haven't cleaned and it still uses an old version (ara 0.0.1.dev991), or some import of the Prometheus module is missing (in setup.cfg? or somewhere else) and it is therefore never called or reachable.

I'm also willing to help with the docs on those parts so other people can participate, but so far it is still too blurry for me to write anything clear.

If you can fill in the blanks it would be amazing!

Thanks!
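Since the PR description says the `ara prometheus` command only exists when prometheus_client is installed, one quick check is whether the module is importable from the same interpreter that runs ara. This is a small standard-library sketch; `dependency_available` is a hypothetical helper name and it says nothing about how ara itself registers the command.

```python
import importlib.util
import sys


def dependency_available(module="prometheus_client"):
    """Return True if `module` can be imported by this interpreter."""
    return importlib.util.find_spec(module) is not None


# Printing the interpreter path helps spot a mismatch between the
# environment where the package was installed and the one running ara.
print(sys.executable)
print("prometheus_client importable:", dependency_available())
```

Running it inside the container or virtualenv in question shows whether the dependency landed in the environment that actually provides the `ara` command.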

dmsimard · 2024-10-02T19:28:42Z

Hi @xlr-8, thanks for your interest and for looking into this.

I haven't yet revisited this topic but I did talk about it at configuration management camp last year.
It wasn't recorded unfortunately but slides are available here: https://ara.recordsansible.org/presentations/cfg-mgmt-2024/ansible-metrics-in-prometheus.pdf (other presentations: https://ara.recordsansible.org/presentations/)

I am still interested in making this work :)

In the backup slides for last year's presentation there's a condensed how-to for testing this:

Demo: Trying out the exporter

# Install and run a prometheus exporter with metrics from ara
# https://github.com/ansible-community/ara/issues/177
# https://github.com/ansible-community/ara/pull/483
git clone https://github.com/dmsimard/ara
cd ara
git checkout prometheus_exporter

# Set up a virtualenv with ansible, ara and prometheus-client
tox -e ansible-integration --notest
source .tox/ansible-integration/bin/activate
pip install prometheus-client

# Metrics from localhost without needing to run a server
ara prometheus --max-days 1

# Metrics from a remote server running somewhere
ara prometheus --client http --server http://127.0.0.1:8000 --max-days 1

This should help you get started without needing to re-build container images after every change.
In terms of workflow, you basically:

install ara from source (from git branch) including prometheus-client
populate the ara database with data (real playbooks or you can use something like ansible-playbook tests/integration/hosts.yml with ANSIBLE_CALLBACK_PLUGINS=$(python3 -m ara.setup.callback_plugins))
run the exporter
have a prometheus scrape it

You can make changes to the exporter code and re-run it with the ara prometheus command for it to be effective.

The prometheus config supplied in the backup slides:
prometheus.yml:

global:
  scrape_interval: 15s
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']

Start a Prometheus container:

{podman,docker} run -d --name prometheus \
  -p 9090:9090 \
  -v "$(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml" \
  quay.io/prom/prometheus

It's probably worthwhile for the branch to be rebased on top of the latest master by now. I don't think there have been changes that would impact the prometheus implementation, but there have been things like django updates and such.

I can take care of that if you'd like.

Otherwise:

Build the container with buildah (why use buildah by the way?)

Personal preference :)

I can be reached over matrix (or the slack bridge) and maybe IRC for discussion.

xlr-8 · 2024-10-03T10:09:33Z

Awesome! Thank you so much for the detailed answer!

Personal preference :)

Alright, I figured perhaps there was some better integration with RedHat / RedHat like distros, as I could see you were using Fedora/CentOS.

No worries for the rebase, I'll take care of it.

I should take a look at it within the next few days ❤️

dmsimard · 2024-12-15T02:59:43Z

@xlr-8 did you end up spending some cycles on this? It is coming back onto my radar in the not-too-distant future.

dmsimard added 3 commits February 21, 2023 20:14

WIP: contrib: A prometheus exporter for ara

98cb485

Heavily a work in progress and learning experience over which we will iterate a number of times. The intent is to make a prometheus exporter gather metrics from an ara instance and expose them so that prometheus can scrape them.

WIP: prometheus_exporter iteration 3

86dfdf8

- Move it under our existing ara CLI so it can re-use all the boilerplate about instantiating an API client with all the settings
- Add args for limits, poll frequency and port for the exporter to listen on

dmsimard force-pushed the prometheus_exporter branch from 405c187 to 86dfdf8 Compare February 22, 2023 03:36

dmsimard mentioned this pull request Feb 22, 2023

A prometheus exporter for playbook metrics ? #177

Open

dmsimard added 2 commits February 23, 2023 22:05

WIP: prometheus_exporter iteration 4

7c01a0b

- Added --max-days to limit backfill at boot
- Added a bit of verbosity
- Adjust hosts to be scanned before tasks (there are way, way more tasks than hosts in terms of volume)

WIP: prometheus_exporter iteration 5

479ce45

- First try at a playbook histogram containing the timestamp and duration

WIP: prometheus_exporter iteration 6

0355d43

Still heavily a work in progress but getting a better understanding of how things work. Hosts and tasks now have gauges by status. Disable playbook metrics temporarily until we revisit them with newfound knowledge.

WIP: prometheus_exporter iteration 7

2d7cd30

- Add a summary metric for tracking the duration of tasks. This is what was intended when trying to do the playbook histogram so we'll come back to that later.

dmsimard force-pushed the prometheus_exporter branch from feadacf to 7558a6f Compare June 20, 2023 05:18

dmsimard force-pushed the prometheus_exporter branch from b82da8c to 6283872 Compare June 21, 2023 03:41

dmsimard force-pushed the prometheus_exporter branch from 0ce3cf1 to c92b29b Compare July 21, 2023 03:12

dmsimard marked this pull request as draft September 9, 2023 15:49

dmsimard force-pushed the prometheus_exporter branch from c92b29b to 6283872 Compare October 23, 2023 00:14
