Monitoring/metrics/instrumentation #230

Open
dmacvicar opened this issue Mar 23, 2018 · 19 comments

@dmacvicar
Member

Right now there are three use cases where we need some kind of monitoring and metric tracking:

  • That the application is up at all
  • That if we have a problem (e.g. high memory consumption, OBS being unresponsive), we have the right data and evidence to draw conclusions about it.
  • Some statistics that were previously stored in the database (download counter, etc.).

Therefore, I suggest we look into enabling the application to be scraped by Prometheus, which is a popular solution nowadays and integrates easily with Grafana or other dashboards.

This means enabling a /metrics endpoint in the application. Initially we could use one of our internal Prometheus installations.

https://prometheus.io/docs/prometheus/latest/getting_started/
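
To make the /metrics idea concrete, here is a minimal sketch of the kind of plain-text body such an endpoint would return for a download counter. It uses only the Ruby standard library; `SketchCounter`, the metric name `software_downloads_total` and the `distro` label are hypothetical names chosen for illustration, and this is not the API of the prometheus-client gem.

```ruby
# Hypothetical in-memory counter, for illustration only.
# This is NOT the prometheus-client gem API.
class SketchCounter
  attr_reader :name, :help

  def initialize(name, help)
    @name = name
    @help = help
    @values = Hash.new(0) # label set => current count
    @lock = Mutex.new
  end

  # Add `by` to the series identified by `labels`.
  def increment(labels = {}, by = 1)
    @lock.synchronize { @values[labels] += by }
  end

  # Render the counter in the Prometheus text exposition format.
  def to_text
    lines = ["# HELP #{name} #{help}", "# TYPE #{name} counter"]
    @lock.synchronize do
      @values.each do |labels, value|
        lines << "#{name}#{format_labels(labels)} #{value}"
      end
    end
    lines.join("\n") + "\n"
  end

  private

  def format_labels(labels)
    return '' if labels.empty?

    pairs = labels.map { |key, value| %(#{key}="#{escape(value)}") }
    "{#{pairs.join(',')}}"
  end

  # Label values escape backslash, double quote and newline.
  def escape(value)
    value.to_s.gsub('\\') { '\\\\' }.gsub('"') { '\\"' }.gsub("\n") { '\\n' }
  end
end

downloads = SketchCounter.new('software_downloads_total', 'Number of package downloads.')
downloads.increment({ distro: 'tumbleweed' })
downloads.increment({ distro: 'tumbleweed' })
downloads.increment({ distro: 'leap' })

body = downloads.to_text
puts body
```

A Prometheus server would fetch this body periodically over HTTP. In a real deployment the client library would maintain the counters and serve the endpoint, so nothing like this class would need to be written by hand.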

For Rails apps, enabling it could be as simple as using the Rack middleware:

# This file is used by Rack-based servers to start the application.
require ::File.expand_path('../config/environment', __FILE__)
require 'rack'
require 'prometheus/middleware/collector'
require 'prometheus/middleware/exporter'

use Rack::Deflater, if: ->(_, _, _, body) { body.any? && body[0].length > 512 }
use Prometheus::Middleware::Collector
use Prometheus::Middleware::Exporter
run SoftwareOO::Application

However, there are some showstoppers when using puma or other multi-process servers that need to be investigated: not all client implementations store the metrics correctly in these situations, and there may be alternative solutions for these cases.
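
The multi-process concern can be reproduced without any metrics library. The sketch below assumes a Unix-like system where `fork` is available and uses a plain local variable as a stand-in for an in-memory metric; the worker and request counts are arbitrary. Each forked worker increments its own copy, so the parent never sees the updates and no single process holds the true total.

```ruby
# Stand-in for a metric kept in process memory.
requests_total = 0

# Fork two workers; each reports its final count back through a pipe.
readers = Array.new(2) do
  reader, writer = IO.pipe
  fork do
    reader.close
    3.times { requests_total += 1 } # simulate handling three requests
    writer.puts(requests_total)
    writer.close
    exit!(0) # leave the child without running at_exit handlers
  end
  writer.close
  reader
end

worker_counts = readers.map do |reader|
  count = reader.read.to_i
  reader.close
  count
end
Process.waitall

puts "parent sees: #{requests_total}"
puts "workers saw: #{worker_counts.inspect}"
puts "true total:  #{worker_counts.sum}"
```

With a pre-forking server, a scrape is answered by whichever worker picks up the request, so it would report only that worker's share unless the client library aggregates across processes, for example through shared files or a separate collector process.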

@hennevogel
Member

hennevogel commented Mar 26, 2018

What about https://metrics.opensuse.org/ ? :-)

@dmacvicar
Member Author

That would be perfect. We would still need a prometheus instance to gather the metrics. We can use metrics.opensuse.org to display them.

@hennevogel
Member

hennevogel commented Mar 26, 2018

> We would still need a prometheus instance to gather the metrics.

If we only want Rails middleware stats there is also influxdb-rails.

For the other data we can send things out to rabbit.opensuse.org, consume them with telegraf, and write them to influxdb (which makes it possible for others to use this data from scripts or whatnot). Or we can send things out directly with influxdb-ruby.
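
For reference, whichever route is taken, a data point ends up in InfluxDB as one line of line protocol. The sketch below builds such a line with the standard library only; it does not use the influxdb-ruby API, and the measurement `downloads`, the `distro` tag and the `count` field are made-up names for illustration.

```ruby
# InfluxDB line protocol, one point per line:
#   measurement,tag_key=tag_value field_key=field_value timestamp

# Tag keys, tag values and field keys escape commas, equals signs and spaces.
def escape_key(value)
  value.to_s.gsub(/([,= ])/) { "\\#{$1}" }
end

# Integers carry an "i" suffix; strings are double-quoted.
def format_field(value)
  case value
  when Integer then "#{value}i"
  when Float, true, false then value.to_s
  else
    escaped = value.to_s.gsub(/(["\\])/) { "\\#{$1}" }
    "\"#{escaped}\""
  end
end

def line_protocol(measurement, tags, fields, timestamp_ns)
  raise ArgumentError, 'at least one field is required' if fields.empty?

  # Measurement names escape commas and spaces.
  name = measurement.to_s.gsub(/([, ])/) { "\\#{$1}" }
  tag_part = tags.sort_by { |key, _| key.to_s }
                 .map { |key, value| ",#{escape_key(key)}=#{escape_key(value)}" }
                 .join
  field_part = fields.map { |key, value| "#{escape_key(key)}=#{format_field(value)}" }
                     .join(',')
  "#{name}#{tag_part} #{field_part} #{timestamp_ns}"
end

line = line_protocol('downloads',
                     { distro: 'openSUSE Leap' },
                     { count: 1 },
                     1_522_000_000_000_000_000)
puts line
```

In the RabbitMQ route the application would publish an event and telegraf would produce lines like this; with influxdb-ruby the client library would do the formatting and the HTTP write.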

@hustodemon
Contributor

Short status update: I did some experiments with the Prometheus Exporter. I was able to export basic Ruby metrics and visualize them with Grafana. I still want to explore the influx options suggested by @hennevogel.

@dmacvicar
Member Author

@hennevogel Do we have an openSUSE instance of InfluxDB already? Or do you mean running one on the same machine? (In that case it would not make a difference to use Prometheus.)

@hennevogel
Member

@dmacvicar rabbitmq runs on rabbit.o.o, and metrics.o.o already runs telegraf, influx and grafana.

@dmacvicar
Member Author

@hustodemon we could ask @jberry-suse if we can use the InfluxDB in metrics.opensuse.org, or whether we can run prometheus there.
https://bitworking.org/news/2017/03/prometheus

@hennevogel
Member

I'm sure you can. We (OBS team) will also start to use it soon :-)

@dmacvicar
Member Author

I would really prefer to go the Prometheus way, both for its pull model and because of the internal knowledge we have inside SUSE (it is used for SUSE Manager, Storage, Containers).

@jberry-suse

jberry-suse commented Apr 12, 2018

Prometheus does not bother me. As for pull, that's how influxdb is getting the data right now.

Presumably we are not talking about major resource usage, as any increases will need to be requested. The plan is to manage the machine via salt, but nothing ever came of previous meetings to achieve that. If folks have interest in converting the configuration, that would be great. Otherwise, if you provide the necessary config I can install it on the machine or potentially grant someone access, but that can get messy with too many cooks in the kitchen.

@jberry-suse

The tooling for pulling data and the grafana dashboard and data source definitions are provided via a package on OBS, which would be ideal for software-o-o to do as well. That way, all that is outside of proper versioning (somewhere) is the firewall config, the grafana/influxdb config, and the list of packages installed on the machine.

@jberry-suse

@hustodemon
Contributor

Hi @jberry-suse, I wrote a simple salt state that installs Prometheus on a machine and makes sure it's running. I don't have much experience with packaging, but IIUC you'd prefer creating some kind of pseudo-package which contains some Prometheus config and which makes sure Prometheus is installed (via Requires). Is that right?

@jberry-suse

Salt is fine. The packaging is for configs/scripts coming out of this repo, but if the change is very minor, like pointing at your endpoint, it could also be done via salt.

@hustodemon
Contributor

Status update: I pinged the openSUSE Heroes about creating a new machine for us. Let's see how this turns out.

@jberry-suse

Different from metrics.o.o?

@jberry-suse

Based on emails, I thought the plan was to add it to the openSUSE salt master and thus have it installed on metrics.o.o.

@jberry-suse

If you need ssh access to debug and get things working, I can provide it if you let me know which pubkey to use.

@hustodemon
Contributor

I see. We don't really care where Prometheus is going to be installed; metrics.o.o would also be fine. I'll update my ticket, then.
