Merge pull request #397 from pirat013/infra-monitoring
Update SAP infrastructure monitoring paper
Showing 38 changed files with 1,570 additions and 529 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
// Alertmanager adoc file | ||
// Please use the following line to implement each tagged content to the main document: | ||
// include::SLES4SAP-sap-infra-monitoring-alertmanager.adoc[tag=alert-XXXXX] | ||
|
||
// Alertmanager general | ||
# tag::alert-general[] | ||
===== Alertmanager | ||
|
||
The https://prometheus.io/docs/alerting/latest/alertmanager/[Alertmanager] handles alerts sent by client applications such as the Prometheus or Loki server. | ||
It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email or PagerDuty. It also takes care of | ||
silencing and inhibition of alerts. | ||
# end::alert-general[] | ||
|
||
|
||
// Alertmanager Implementing | ||
# tag::alert-impl[] | ||
=== Alertmanager | ||
The Alertmanager package can be found in the PackageHub repository. | ||
The repository needs to be activated via the `SUSEConnect` command first, unless you have already activated it in the previous steps. | ||
|
||
|
||
[source] | ||
---- | ||
SUSEConnect --product PackageHub/15.3/x86_64 | ||
---- | ||
|
||
Alertmanager can then be installed via the `zypper` command: | ||
[subs="attributes,specialchars,verbatim,quotes"] | ||
---- | ||
zypper in golang-github-prometheus-alertmanager | ||
---- | ||
|
||
|
||
Notifications can be sent to different receivers. A receiver can be an email address, a chat system, a webhook, and more. | ||
For a complete list, see the https://prometheus.io/docs/alerting/latest/configuration/#receiver[Alertmanager documentation]. + | ||
|
||
|
||
The example configuration below uses email as the receiver. + | ||
|
||
|
||
Edit the Alertmanager configuration file `/etc/alertmanager/config.yml` as follows: + | ||
|
||
[subs="attributes,specialchars,verbatim,quotes"] | ||
---- | ||
global: | ||
  resolve_timeout: 5m | ||
  smtp_smarthost: '<mailserver>' | ||
  smtp_from: '<mail-address>' | ||
  smtp_auth_username: '<username>' | ||
  smtp_auth_password: '<passwd>' | ||
  smtp_require_tls: true | ||
route: | ||
  group_by: ['...'] | ||
  group_wait: 10s | ||
  group_interval: 5m | ||
  repeat_interval: 4h | ||
  receiver: 'email' | ||
receivers: | ||
- name: 'email' | ||
  email_configs: | ||
  - send_resolved: true | ||
    to: '<target mail-address>' | ||
    from: '<mail-address>' | ||
    headers: | ||
      From: <mail-address> | ||
      Subject: '{{ template "email.default.subject" . }}' | ||
    html: '{{ template "email.default.html" . }}' | ||
---- | ||
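Before starting the service, it can help to confirm that no `<...>` placeholder from the example is left in the file. The following is a minimal sketch and not part of the original setup: the function name `find_placeholders` is hypothetical, and the default path is the configuration file named above.

```shell
#!/bin/sh
# Sketch: list lines of an Alertmanager configuration that still contain
# <placeholder> values copied from the example.
# find_placeholders is a hypothetical helper name used for illustration.

find_placeholders() {
    # $1: path to the configuration file
    # Prints "line-number:content" for every line with a <...> token.
    # Returns non-zero if no such line exists.
    grep -n -E '<[^<>]+>' "$1"
}

# Assumed default: the configuration file edited in the step above.
config="${1:-/etc/alertmanager/config.yml}"
if [ -r "$config" ]; then
    if find_placeholders "$config"; then
        echo "Replace the placeholders listed above before starting Alertmanager." >&2
    else
        echo "No placeholders left in $config."
    fi
fi
```

This only catches leftover placeholders. If the `amtool` utility is available on your system, `amtool check-config /etc/alertmanager/config.yml` performs a full validation of the file.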
|
||
|
||
Start and enable the Alertmanager service: | ||
[subs="attributes,specialchars,verbatim,quotes"] | ||
---- | ||
systemctl enable --now prometheus-alertmanager.service | ||
---- | ||
|
||
# end::alert-impl[] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,106 @@ | ||
// Collectd adoc file | ||
// Please use the following line to implement each tagged content to the main document: | ||
// include::SLES4SAP-sap-infra-monitoring-collectd.adoc[tag=collectd-XXXXX] | ||
|
||
// Collectd general | ||
# tag::collectd-general[] | ||
|
||
===== `collectd` - System information collection daemon | ||
https://collectd.org/[`collectd`] is a small daemon which collects system information periodically and provides mechanisms to store and monitor the values in a variety of ways. | ||
|
||
# end::collectd-general[] | ||
|
||
|
||
// Collectd implementing | ||
# tag::collectd-impl[] | ||
|
||
=== `collectd` | ||
|
||
The `collectd` packages can be installed from the SUSE repositories as well. For the example at hand, we have used a newer version from the openSUSE repository. | ||
|
||
Create a file `/etc/zypp/repos.d/server_monitoring.repo` and add the following content to it: | ||
[subs="attributes,specialchars,verbatim,quotes"] | ||
.Content for /etc/zypp/repos.d/server_monitoring.repo | ||
---- | ||
[server_monitoring] | ||
name=Server Monitoring Software (SLE_15_SP3) | ||
type=rpm-md | ||
baseurl=https://download.opensuse.org/repositories/server:/monitoring/SLE_15_SP3/ | ||
gpgcheck=1 | ||
gpgkey=https://download.opensuse.org/repositories/server:/monitoring/SLE_15_SP3/repodata/repomd.xml.key | ||
enabled=1 | ||
---- | ||
|
||
Afterward, refresh the repository metadata and install `collectd` and its plugins: | ||
|
||
[subs="attributes,specialchars,verbatim,quotes"] | ||
---- | ||
# zypper ref | ||
# zypper in collectd collectd-plugins-all | ||
---- | ||
|
||
Now `collectd` must be configured to collect the information you want and to export it in the format you need. | ||
For example, to measure network latency, use the ping plugin and expose the data in Prometheus format. | ||
|
||
[subs="attributes,specialchars,verbatim,quotes"] | ||
.Configuration of collectd in /etc/collectd.conf (excerpts) | ||
---- | ||
... | ||
LoadPlugin ping | ||
... | ||
<Plugin ping> | ||
  Host "10.162.63.254" | ||
  Interval 1.0 | ||
  Timeout 0.9 | ||
  TTL 255 | ||
  # SourceAddress "1.2.3.4" | ||
  # AddressFamily "any" | ||
  Device "eth0" | ||
  MaxMissed -1 | ||
</Plugin> | ||
... | ||
LoadPlugin write_prometheus | ||
... | ||
<Plugin write_prometheus> | ||
  Port "9103" | ||
</Plugin> | ||
... | ||
---- | ||
|
||
Uncomment the `LoadPlugin` lines for both plugins and check the corresponding `<Plugin ...>` sections in the file. | ||
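A small check can confirm that the plugins are actually enabled and not still commented out. This is a sketch under assumptions: the helper name `plugin_enabled` is made up for illustration, and it only recognizes the plain `LoadPlugin name` form, not `<LoadPlugin name>` blocks.

```shell
#!/bin/sh
# Sketch: check whether a plugin is loaded by an uncommented LoadPlugin line.
# plugin_enabled is a hypothetical helper name used for illustration.

plugin_enabled() {
    # $1: path to collectd.conf, $2: plugin name
    # Matches 'LoadPlugin name' and 'LoadPlugin "name"', ignoring commented lines.
    grep -E -q "^[[:space:]]*LoadPlugin[[:space:]]+\"?$2\"?[[:space:]]*\$" "$1"
}

# Assumed default: the configuration file edited above.
conf="${1:-/etc/collectd.conf}"
if [ -r "$conf" ]; then
    for plugin in ping write_prometheus; do
        if plugin_enabled "$conf" "$plugin"; then
            echo "$plugin: enabled"
        else
            echo "$plugin: not enabled"
        fi
    done
fi
```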
|
||
Modify the `systemd` unit so that `collectd` works as expected. First, create a copy of the system-provided service file. | ||
[subs="attributes,specialchars,verbatim,quotes"] | ||
---- | ||
# cp /usr/lib/systemd/system/collectd.service /etc/systemd/system/collectd.service | ||
---- | ||
|
||
Second, adapt this local copy by adding the required `CapabilityBoundingSet` parameter to `/etc/systemd/system/collectd.service`. | ||
The ping plugin needs `CAP_NET_RAW`. | ||
[subs="attributes,specialchars,verbatim,quotes"] | ||
---- | ||
... | ||
# Here's a (incomplete) list of the plugins known capability requirements: | ||
# ping CAP_NET_RAW | ||
CapabilityBoundingSet=CAP_NET_RAW | ||
... | ||
---- | ||
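To verify that the local unit file really grants the capability, and that the line is not only present as a comment, a check along these lines can be used. It is a sketch only; `unit_has_capability` is a hypothetical helper name, and the default path is the local copy created above.

```shell
#!/bin/sh
# Sketch: test whether a unit file grants a capability through an
# uncommented CapabilityBoundingSet= line.
# unit_has_capability is a hypothetical helper name used for illustration.

unit_has_capability() {
    # $1: path to the unit file, $2: capability name, for example CAP_NET_RAW
    # The capability may stand alone or be part of a space-separated list.
    grep -E -q "^CapabilityBoundingSet=(.*[[:space:]])?$2([[:space:]]|\$)" "$1"
}

# Assumed default: the local copy of the service file.
unit="${1:-/etc/systemd/system/collectd.service}"
if [ -r "$unit" ]; then
    if unit_has_capability "$unit" CAP_NET_RAW; then
        echo "CAP_NET_RAW is granted in $unit"
    else
        echo "CAP_NET_RAW is missing in $unit" >&2
    fi
fi
```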
|
||
Activate the changes and start the `collectd` service. | ||
[subs="attributes,specialchars,verbatim,quotes"] | ||
---- | ||
# systemctl daemon-reload | ||
# systemctl enable --now collectd | ||
---- | ||
|
||
All `collectd` metrics are accessible on port 9103. | ||
|
||
A quick test shows whether the metrics can be scraped: | ||
[subs="attributes,specialchars,verbatim,quotes"] | ||
---- | ||
# curl localhost:9103/metrics | ||
---- | ||
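The full output can be long. To look only at the samples of one plugin, the scrape output can be filtered by metric name prefix. The following is a minimal sketch: `filter_metrics` is a hypothetical helper name, and the sample text is made up for illustration, so real metric names and labels depend on your `collectd` configuration.

```shell
#!/bin/sh
# Sketch: reduce Prometheus text exposition output to the samples whose
# metric name starts with a given prefix.
# filter_metrics is a hypothetical helper name used for illustration.

filter_metrics() {
    # $1: metric name prefix, for example "collectd_ping"
    # Reads exposition text on stdin and skips "# HELP" / "# TYPE" lines.
    awk -v p="$1" '!/^#/ && index($0, p) == 1'
}

# Illustrative sample only, not captured from a real system.
sample='# HELP collectd_ping example help text
# TYPE collectd_ping gauge
collectd_ping{ping="10.162.63.254",instance="host1"} 0.25
collectd_uptime{instance="host1"} 4711'

printf '%s\n' "$sample" | filter_metrics collectd_ping
```

Against a running instance, the same filter can be applied as `curl -s localhost:9103/metrics | filter_metrics collectd_ping`.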
// The official project on GitHub: https://github.com/collectd/collectd/ | ||
|
||
|
||
# end::collectd-impl[] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
// Grafana adoc file | ||
// Please use the following line to implement each tagged content to the main document: | ||
// include::SLES4SAP-sap-infra-monitoring-grafana.adoc[tag=grafana-XXXXX] | ||
|
||
// Grafana general | ||
# tag::grafana-general[] | ||
|
||
===== Grafana | ||
|
||
https://grafana.com/oss/grafana/[Grafana] is an open source visualization and analytics platform. | ||
Grafana's plug-in architecture allows interaction with a variety of data sources without creating data copies. | ||
Its graphical browser-based user interface visualizes the data through highly customizable views, providing an interactive diagnostic workspace. | ||
|
||
Grafana can display metrics data from Prometheus and log data from Loki side-by-side, correlating events from log files with metrics. | ||
This can provide helpful insights when trying to identify the cause for an issue. | ||
Also, Grafana can trigger alerts based on metrics or log entries, and thus help identify potential issues early. | ||
|
||
# end::grafana-general[] | ||
|
||
|
||
// Grafana implementing | ||
# tag::grafana-impl[] | ||
|
||
=== Grafana | ||
|
||
The Grafana RPM packages can be found in the PackageHub repository. | ||
The repository has to be activated via the `SUSEConnect` command first, unless you have activated it in the previous steps already. | ||
---- | ||
# SUSEConnect --product PackageHub/15.3/x86_64 | ||
---- | ||
|
||
Grafana can then be installed via the `zypper` command: | ||
---- | ||
# zypper in grafana | ||
---- | ||
|
||
|
||
Start and enable the Grafana server service: | ||
---- | ||
# systemctl enable --now grafana-server.service | ||
---- | ||
|
||
|
||
Now connect from a browser to your Grafana instance and log in: | ||
|
||
image::sap-infra-monitoring-grafana-login.png[Grafana Login page,scaledwidth=80%,title="Grafana welcome page"] | ||
|
||
==== Grafana data sources | ||
After logging in, add the data sources. The gear icon on the right-hand side opens the menu where a new data source can be added. | ||
|
||
image::sap-infra-monitoring-grafana-datasource-add.png[Grafana add a new data source,scaledwidth=80%,title="Adding a new Grafana data source"] | ||
|
||
Add a data source for the Prometheus service. | ||
|
||
.Prometheus example | ||
image::sap-infra-monitoring-grafana-data-prometheus.png[Prometheus data source,scaledwidth=80%,title="Grafana data source for Prometheus DB"] | ||
|
||
Also add a data source for Loki. | ||
|
||
.Loki example | ||
image::sap-infra-monitoring-grafana-data-loki.png[Loki data source,scaledwidth=80%,title="Grafana data source for Loki DB"] | ||
|
||
Now Grafana can access both the metrics stored in Prometheus and the log data collected by Loki, to visualize them. | ||
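As an alternative to adding the data sources in the browser, Grafana can also read them from a provisioning file at startup. The sketch below prints such a file. It rests on assumptions that are not part of this guide: the helper name `datasource_yaml` is hypothetical, the URLs use the default ports of Prometheus (9090) and Loki (3100) on the local host, and the target directory `/etc/grafana/provisioning/datasources/` is the usual packaging default and may differ on your system.

```shell
#!/bin/sh
# Sketch: print a Grafana data source provisioning file for Prometheus and Loki.
# datasource_yaml is a hypothetical helper name used for illustration.

datasource_yaml() {
    # $1: Prometheus URL, $2: Loki URL
    cat <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $1
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: $2
EOF
}

# Assumed URLs: both services run on the Grafana host with default ports.
datasource_yaml "http://localhost:9090" "http://localhost:3100"
```

Redirect the output to a `.yaml` file in the provisioning directory and restart `grafana-server.service` for the data sources to appear.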
|
||
==== Grafana dashboards | ||
|
||
Dashboards are how Grafana presents information to the user. | ||
Prepared dashboards can be downloaded from https://grafana.com/dashboards, or imported directly using their Grafana dashboard ID. | ||
|
||
.Grafana dashboard import | ||
image::sap-infra-monitoring-grafana-dashboards.png[Dashboard overview,scaledwidth=80%,title="Grafana dashboard import option"] | ||
|
||
Dashboards can also be created from scratch. Information from all data sources can be merged into one dashboard. | ||
|
||
image::sap-infra-monitoring-grafana-dashboard-new.png[Dashboard create a new dashboard,scaledwidth=80%,title="Build your own dashboard"] | ||
|
||
==== Putting it all together | ||
The picture below shows a dashboard displaying detailed information about the SAP HANA cluster, orchestrated by *pacemaker*. | ||
|
||
.Dashboard example for SAP HANA | ||
image::sap-infra-monitoring-grafana-hana-cluster.png[SUSE HANA cluster dashboard example,scaledwidth=80%,title="SUSE cluster exporter dashboard"] | ||
|
||
|
||
# end::grafana-impl[] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,105 @@ | ||
// IPMI adoc file | ||
// Please use the following line to implement each tagged content to the main document: | ||
// include::SLES4SAP-sap-infra-monitoring-ipmi.adoc[tag=ipmi-XXXXX] | ||
|
||
// IPMI general | ||
# tag::ipmi-general[] | ||
|
||
===== Prometheus IPMI Exporter | ||
The https://github.com/prometheus-community/ipmi_exporter[Prometheus IPMI Exporter] supports both | ||
|
||
* the regular /metrics endpoint for Prometheus, exposing metrics from the host that the exporter is running on, | ||
* and an /ipmi endpoint that supports IPMI over RMCP. | ||
|
||
One exporter instance running on one host can be used to monitor a large number of IPMI interfaces by passing the target parameter to a scrape. | ||
|
||
# end::ipmi-general[] | ||
|
||
|
||
// IPMI implementing | ||
# tag::ipmi-impl[] | ||
|
||
|
||
=== Prometheus IPMI Exporter | ||
|
||
The IPMI exporter can be used to scrape hardware information such as temperature, power supply, and fan data. | ||
|
||
Create a directory, download and extract the IPMI exporter. | ||
[subs="attributes,specialchars,verbatim,quotes"] | ||
---- | ||
# mkdir ipmi_exporter | ||
# cd ipmi_exporter | ||
# curl -OL https://github.com/prometheus-community/ipmi_exporter/releases/download/v1.4.0/ipmi_exporter-1.4.0.linux-amd64.tar.gz | ||
# tar xzvf ipmi_exporter-1.4.0.linux-amd64.tar.gz | ||
---- | ||
|
||
NOTE: This example uses version 1.4.0 of the IPMI exporter. For a different release, the URL used in the `curl` command above needs to be adapted. | ||
Current releases can be found at the https://github.com/prometheus-community/ipmi_exporter[IPMI exporter GitHub repository]. | ||
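The release URL follows a regular pattern, so it can be derived from the version number. The following sketch shows this for the URL used above; `ipmi_exporter_url` is a hypothetical helper name, and the pattern is only inferred from the 1.4.0 URL, so verify it against the release page before relying on it for other versions.

```shell
#!/bin/sh
# Sketch: build the download URL of an IPMI exporter release tarball.
# ipmi_exporter_url is a hypothetical helper name used for illustration.

ipmi_exporter_url() {
    # $1: release version without the leading "v", for example 1.4.0
    # $2: platform suffix, defaults to linux-amd64 as used in this guide
    version="$1"
    platform="${2:-linux-amd64}"
    printf 'https://github.com/prometheus-community/ipmi_exporter/releases/download/v%s/ipmi_exporter-%s.%s.tar.gz\n' \
        "$version" "$version" "$platform"
}

# Prints the URL used in the curl command above.
ipmi_exporter_url 1.4.0
```

The result can be passed to the download step, for example `curl -OL "$(ipmi_exporter_url 1.4.0)"`.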
|
||
|
||
Some additional packages are required and need to be installed. | ||
[subs="attributes,specialchars,verbatim,quotes"] | ||
---- | ||
# zypper in freeipmi libipmimonitoring6 monitoring-plugins-ipmi-sensor1 | ||
---- | ||
|
||
To start the IPMI exporter on the observed host, first start a new `screen` session, and then start the exporter.footnote:[Starting the IPMI exporter should really be done by creating a systemd unit.] | ||
// TODO: replace use of screen by a systemd unit for the IPMI exporter | ||
[subs="attributes,specialchars,verbatim,quotes"] | ||
.Starting IPMI | ||
---- | ||
# screen -S ipmi | ||
# cd ipmi_exporter-1.4.0.linux-amd64 | ||
# ./ipmi_exporter | ||
---- | ||
The IPMI exporter binary `ipmi_exporter` has been started in a screen session which can be detached (type `Ctrl+a d`). | ||
This lets the exporter continue running in the background. | ||
|
||
==== IPMI Exporter Systemd Service File | ||
|
||
A more convenient and secure way to start the IPMI exporter is to use a systemd service. | ||
To do so, copy the exporter binary to `/usr/local/bin/` and create a service unit file under `/etc/systemd/system/`: | ||
|
||
[subs="attributes,specialchars,verbatim,quotes"] | ||
.Copy IPMI binary | ||
---- | ||
# cp ipmi_exporter-1.4.0.linux-amd64/ipmi_exporter /usr/local/bin/ipmi_exporter-1.4.0.linux-amd64 | ||
---- | ||
|
||
[source] | ||
---- | ||
# cat /etc/systemd/system/ipmi-exporter.service | ||
[Unit] | ||
Description=IPMI exporter | ||
Documentation= | ||
[Service] | ||
Type=simple | ||
Restart=no | ||
ExecStart=/usr/local/bin/ipmi_exporter-1.4.0.linux-amd64 | ||
[Install] | ||
WantedBy=multi-user.target | ||
---- | ||
|
||
`systemd` needs to be informed about the new unit: | ||
|
||
.Reload the systemd daemon | ||
[source] | ||
---- | ||
# systemctl daemon-reload | ||
---- | ||
|
||
Finally, enable and start the service: | ||
|
||
.Start the IPMI exporter | ||
[source] | ||
---- | ||
# systemctl enable --now ipmi-exporter.service | ||
---- | ||
|
||
|
||
The metrics of the `ipmi_exporter` are accessible on port 9290. | ||
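For remote IPMI interfaces, the exporter is queried through its `/ipmi` endpoint with a `target` parameter. The sketch below assembles such a URL. The helper name `ipmi_scrape_url` and the host names are hypothetical, and the module name `remote` is assumed to be defined in your exporter configuration.

```shell
#!/bin/sh
# Sketch: build the URL for scraping a remote IPMI target through the exporter.
# ipmi_scrape_url is a hypothetical helper name used for illustration.

ipmi_scrape_url() {
    # $1: host running the exporter
    # $2: IPMI target (for example the BMC host name)
    # $3: module name from the exporter configuration, assumed to be "remote"
    printf 'http://%s:9290/ipmi?target=%s&module=%s\n' "$1" "$2" "${3:-remote}"
}

# Placeholder host names, replace them with your own.
ipmi_scrape_url exporter-host bmc-host
```

The resulting URL can be tested with `curl` in the same way as the local `/metrics` endpoint.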
|
||
//accessing the remote configured ipmi metrics: http://ls3331:9290/ipmi?target=ls3316r&module=remote | ||
|
||
|
||
# end::ipmi-impl[] |