Monitoring/metrics/instrumentation #230

dmacvicar · 2018-03-23T14:03:23Z

Right now there are three use cases where we need some kind of monitoring and metric tracking:

Knowing that the application is up at all.
Having the right data and evidence to draw conclusions when there is a problem, e.g. the current memory consumption or OBS being unresponsive.
Keeping some statistics that were previously thrown into the database (download counter, etc.).

Therefore, I suggest we look into enabling the application to be scraped by Prometheus, which is a popular solution nowadays and integrates easily with Grafana or other dashboards.

This means enabling a /metrics endpoint in the application. Initially we could use one of our internal prometheus installations.

https://prometheus.io/docs/prometheus/latest/getting_started/
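To make the idea of a /metrics endpoint concrete, here is a minimal, dependency-free sketch. It relies only on the fact that a Rack application is any object responding to `call(env)`. The class name `MetricsApp` and the metric name `downloads_total` are hypothetical, chosen for illustration; a real setup would more likely use a Prometheus client library than hand-rolled rendering.

```ruby
# Hypothetical, dependency-free sketch of a /metrics endpoint.
# A Rack application is any object that responds to #call(env) and returns
# [status, headers, body], so no gem is needed to define one.
class MetricsApp
  def initialize
    @counters = Hash.new(0) # metric name => current value
    @lock = Mutex.new       # requests may arrive on several threads
  end

  # Increment a named counter (for example, a download counter).
  def increment(name, by = 1)
    @lock.synchronize { @counters[name] += by }
  end

  # Render all counters in the Prometheus text exposition format.
  def render
    @lock.synchronize do
      @counters.map do |name, value|
        "# TYPE #{name} counter\n#{name} #{value}\n"
      end.join
    end
  end

  def call(env)
    if env['PATH_INFO'] == '/metrics'
      [200, { 'content-type' => 'text/plain; version=0.0.4' }, [render]]
    else
      [404, { 'content-type' => 'text/plain' }, ["Not Found\n"]]
    end
  end
end

app = MetricsApp.new
app.increment('downloads_total')
app.increment('downloads_total', 2)
status, _headers, body = app.call('PATH_INFO' => '/metrics')
puts status
puts body.join
```

Prometheus would then be configured to scrape this path periodically. The counter lives in process memory, so it resets whenever the application restarts.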

For Rails apps, enabling it could be as simple as using the Rack middleware:

# This file is used by Rack-based servers to start the application.
require ::File.expand_path('../config/environment', __FILE__)
require 'rack'
require 'prometheus/middleware/collector'
require 'prometheus/middleware/exporter'

use Rack::Deflater, if: ->(_, _, _, body) { body.any? && body[0].length > 512 }
use Prometheus::Middleware::Collector
use Prometheus::Middleware::Exporter
run SoftwareOO::Application

However, there are some showstoppers when using Puma or other multi-process servers that need to be investigated: not all client implementations store metrics correctly in these situations, and there may be alternative solutions for these cases.
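The multi-process problem can be sketched without any Prometheus library. Each forked worker gets its own copy of in-memory counters, so a scrape answered by one process misses what the others counted. The helper names below (`record_requests`, `aggregate_requests`, `simulate_workers`) are hypothetical, and the per-PID file approach is only one possible workaround, not necessarily what any particular client implements. The sketch assumes a platform where `fork` is available.

```ruby
require 'tmpdir'

# Hypothetical helper: a worker persists its counter in a file named after
# its PID inside a directory shared by all workers.
def record_requests(dir, count)
  File.write(File.join(dir, "requests_#{Process.pid}.count"), count.to_s)
end

# What a scrape would have to do: sum the values written by every worker.
def aggregate_requests(dir)
  Dir.glob(File.join(dir, 'requests_*.count')).sum { |path| File.read(path).to_i }
end

# Fork one worker per entry; each worker "handles" that many requests.
# Returns [counter as seen in the parent's memory, sum over all workers].
def simulate_workers(handled_per_worker)
  Dir.mktmpdir do |dir|
    in_memory = 0
    pids = handled_per_worker.map do |handled|
      fork do
        in_memory += handled          # changes only the child's copy
        record_requests(dir, handled) # visible to every process
      end
    end
    pids.each { |pid| Process.wait(pid) }
    [in_memory, aggregate_requests(dir)]
  end
end

in_memory, aggregated = simulate_workers([3, 5])
puts "in-memory counter seen by parent: #{in_memory}"
puts "aggregated across workers: #{aggregated}"
```

The parent's counter stays at 0 while the aggregated value reflects both workers, which is why naive in-memory metrics under a forking server give misleading numbers.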

hennevogel · 2018-03-26T10:05:39Z

What about https://metrics.opensuse.org/ ? :-)

dmacvicar · 2018-03-26T12:56:26Z

That would be perfect. We would still need a prometheus instance to gather the metrics. We can use metrics.opensuse.org to display them.

hennevogel · 2018-03-26T13:42:50Z

> We would still need a prometheus instance to gather the metrics.

If we only want Rails middleware stats, there is also influxdb-rails.

For the other data we can send things out to rabbit.opensuse.org, consume them with telegraf, and write them to influxdb (making it possible for others to use this data from scripts or whatnot). Or we could send things out directly with influxdb-ruby.
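For comparison with the pull model, this is roughly what a pushed data point looks like in InfluxDB's line protocol (`measurement,tags fields timestamp`). The sketch below only formats the string and does not use the influxdb-ruby API. The function names, the `downloads` measurement, and the tag and field names are all hypothetical.

```ruby
# Escape commas, equals signs and spaces in tag keys, tag values and field keys.
def escape_tag(value)
  value.to_s.gsub(/([,= ])/) { "\\#{$1}" }
end

# Measurement names only need commas and spaces escaped.
def escape_measurement(name)
  name.to_s.gsub(/([, ])/) { "\\#{$1}" }
end

# Integers carry an "i" suffix; strings are double-quoted with " and \ escaped.
def format_field(value)
  case value
  when Integer then "#{value}i"
  when Float, true, false then value.to_s
  else
    escaped = value.to_s.gsub(/(["\\])/) { "\\#{$1}" }
    %("#{escaped}")
  end
end

# Build one line: measurement[,tag=value...] field=value[,field=value...] timestamp
def line_protocol(measurement, tags, fields, timestamp_ns)
  tag_part = tags.sort_by { |key, _| key.to_s }
                 .map { |key, value| ",#{escape_tag(key)}=#{escape_tag(value)}" }
                 .join
  field_part = fields.map { |key, value| "#{escape_tag(key)}=#{format_field(value)}" }
                     .join(',')
  "#{escape_measurement(measurement)}#{tag_part} #{field_part} #{timestamp_ns}"
end

puts line_protocol('downloads', { app: 'software-o-o' }, { count: 1 },
                   1_522_072_970_000_000_000)
```

A line like this could be published to a message queue or written to InfluxDB over HTTP, whereas with Prometheus the application only exposes current values and the server fetches them.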

hustodemon · 2018-04-06T07:43:04Z

Short status update: I did some experiments with the Prometheus Exporter. I was able to export basic Ruby metrics and visualize them with Grafana. I still want to explore the influx options suggested by @hennevogel.

dmacvicar · 2018-04-10T11:10:45Z

@hennevogel Do we have an openSUSE instance of InfluxDB already? Or do you mean running one on the same machine? (In that case it would make no difference compared to using Prometheus.)

hennevogel · 2018-04-11T13:06:45Z

@dmacvicar rabbitmq runs on rabbit.o.o. and metrics.o.o runs telegraf, influx and grafana already.

dmacvicar · 2018-04-12T07:50:07Z

@hustodemon we could ask @jberry-suse if we can use the InfluxDB in metrics.opensuse.org, or whether we can run prometheus there.
https://bitworking.org/news/2017/03/prometheus

hennevogel · 2018-04-12T09:00:24Z

I'm sure you can. We (OBS team) will also start to use it soon :-)

dmacvicar · 2018-04-12T09:05:20Z

I would really prefer to go the Prometheus way (pull), and also because of the internal knowledge we have inside of SUSE (used for SUSE Manager, Storage, Containers)

jberry-suse · 2018-04-12T13:34:56Z

Prometheus does not bother me. As far as pull, that's how influxdb is getting the data right now.

Presumably we are not talking about major resource usage, as any increases will need to be requested. The plan is to manage the machine via salt, but nothing ever came of previous meetings to achieve that. If folks have interest in converting the configuration, that would be great. Otherwise, if you provide the necessary config I can install it on the machine or potentially grant someone access, but that can get messy with too many cooks in the kitchen.

jberry-suse · 2018-04-12T13:38:03Z

The tooling for pulling data and the grafana dashboard and data source definitions are provided via a package on OBS, which would be ideal for software-o-o to do as well. That way, all that is outside of proper versioning (somewhere) is the firewall config, the grafana/influxdb config, and the list of packages installed on the machine.

jberry-suse · 2018-04-12T13:39:07Z

For an example of the package layout that uses grafana provisioning see https://github.com/openSUSE/openSUSE-release-tools/blob/820d1030e54f5c9bbfe9aeb69ca5b3b44a838aaa/dist/package/openSUSE-release-tools.spec#L445-L461.

hustodemon · 2018-05-16T07:04:30Z

Hi @jberry-suse, I wrote a simple salt state that installs Prometheus on a machine and makes sure it's running. I don't have much experience with packaging, but IIUC you'd prefer creating some kind of pseudo-package which contains some Prometheus config and which makes sure Prometheus is installed (via Requires). Is that right?

jberry-suse · 2018-05-16T19:30:28Z

Salt is fine. The packaging is for configs/scripts coming out of this repo, but if the change is very minor, like pointing at your endpoint, it could also be done via salt.

hustodemon · 2018-05-31T09:17:16Z

status update: I pinged the openSUSE Heroes about creating a new machine for us, let's see how this turns out.

jberry-suse · 2018-05-31T15:05:41Z

Different from metrics.o.o?

jberry-suse · 2018-05-31T15:06:54Z

Based on the emails, I thought the plan was to add it to the openSUSE salt master and thus have it installed on metrics.o.o.

jberry-suse · 2018-05-31T15:15:24Z

If you need ssh access to debug and get things working, I can provide it if you let me know what pubkey to use.

hustodemon · 2018-06-01T07:16:19Z

I see. We don't really care where Prometheus is going to be installed; metrics.o.o would also be fine. I'll update my ticket, then.

dmacvicar assigned dmacvicar and hustodemon Mar 23, 2018

hustodemon mentioned this issue Jul 2, 2018: Enable Prometheus exporter #355 (Merged, 3 tasks)
