At times one needs to change multiple alerts or numerous queries in several dashboards at once. A metric rename is a common cause of such a change: for example, kube-state-metrics, the metric generator in Kubernetes, occasionally changes a metric's name. After such a change, the platform owner needs to update all alerts and dashboards that use that metric. Searching for the metric in the Wavefront GUI and updating the queries manually is labor intensive, error prone and time consuming. wavectl can help automate such a global change.
For the sake of an example, let's say that because of an upstream change, all metrics that started with proc. have been renamed to start with host.proc. Once this change gets deployed, numerous alerts and dashboards will break: they will query the old metric name and will not show any data. In order to quickly fix this problem via wavectl, we first pull all alerts and dashboards that match the proc\. regular expression. The --match option narrows down the returned set via a regular expression search.
$ wavectl pull /tmp/RepetitiveEditing/alerts alert --match "proc\."
$ wavectl pull /tmp/RepetitiveEditing/dashboards dashboard --match "proc\."
See the pulled alerts and dashboards:
$ find /tmp/RepetitiveEditing -type f
/tmp/RepetitiveEditing/alerts/1530723441304.alert
/tmp/RepetitiveEditing/alerts/1530723441442.alert
/tmp/RepetitiveEditing/alerts/1530723441589.alert
/tmp/RepetitiveEditing/alerts/1530723443146.alert
/tmp/RepetitiveEditing/dashboards/metadata-perfpod.dashboard
/tmp/RepetitiveEditing/dashboards/octoproxy.dashboard
See the usage of the metrics starting with proc. in the pulled alerts and dashboards:
$ find /tmp/RepetitiveEditing -type f | xargs grep "proc\."
/tmp/RepetitiveEditing/alerts/1530723441304.alert: "condition": "ts(proc.net.percent,server_type=\"compute-*\" and env=\"live\") > 80",
/tmp/RepetitiveEditing/alerts/1530723441304.alert: "displayExpression": "ts(proc.net.percent,server_type=\"compute-*\" and env=\"live\")",
/tmp/RepetitiveEditing/alerts/1530723441442.alert: "condition": "ts(proc.stat.cpu.percentage_used,server_type=\"compute-*\" and env=\"live\") > 80",
/tmp/RepetitiveEditing/alerts/1530723441442.alert: "displayExpression": "ts(proc.stat.cpu.percentage_used,server_type=\"compute-*\" and env=\"dev\")",
/tmp/RepetitiveEditing/alerts/1530723441589.alert: "condition": "ts(proc.meminfo.percentage_swapused,server_type=\"compute-*\" and env=\"live\") > 10",
/tmp/RepetitiveEditing/alerts/1530723441589.alert: "displayExpression": "ts(proc.meminfo.percentage_swapused,server_type=\"compute-*\" and env=\"live\")",
/tmp/RepetitiveEditing/alerts/1530723443146.alert: "condition": "max(((sum(rate(ts(proc.stat.cpu, namespace=\"collections-service-dev\" and type=used)), pod_name) /100) / (sum(taggify(ts(kube.metrics.pod_container_resource_requests_cpu_cores, namespace=\"collections-service-dev\"), tagk, pod, pod_name, \"\", \"\"), pod_name)) * 100)) > 70",
/tmp/RepetitiveEditing/alerts/1530723443146.alert: "displayExpression": "(sum(rate(ts(proc.stat.cpu, namespace=\"collections-service-dev\" and type=used)), pod_name) /100) / (sum(taggify(ts(kube.metrics.pod_container_resource_requests_cpu_cores, namespace=\"collections-service-dev\" ), tagk, pod, pod_name, \"\", \"\"), pod_name)) * 100",
/tmp/RepetitiveEditing/dashboards/metadata-perfpod.dashboard: "query": "ts(\"proc.kernel.entropy_avail\", host=${metadata_server})",
/tmp/RepetitiveEditing/dashboards/metadata-perfpod.dashboard: "query": "ts(proc.stat.cpu.percentage_used, ${PerfPod} and host=${metadata_server})",
/tmp/RepetitiveEditing/dashboards/metadata-perfpod.dashboard: "query": "ts(proc.stat.cpu.percentage_used, ${PerfPod} and host=${metadata_db02})",
/tmp/RepetitiveEditing/dashboards/metadata-perfpod.dashboard: "query": "ts(proc.stat.cpu.percentage_used, ${PerfPod} and host=${metadata_database})",
/tmp/RepetitiveEditing/dashboards/metadata-perfpod.dashboard: "query": "ts(proc.stat.cpu.percentage_used, ${PerfPod} and host=${eng01})",
/tmp/RepetitiveEditing/dashboards/metadata-perfpod.dashboard: "query": "ts(proc.stat.cpu.percentage_used, ${PerfPod} and host=${content01})",
/tmp/RepetitiveEditing/dashboards/metadata-perfpod.dashboard: "query": "ts(proc.stat.cpu.percentage_iowait, ${PerfPod} and host=${metadata_server})",
...
Then use sed to replace all occurrences of proc. with host.proc. in the pulled files:
$ find /tmp/RepetitiveEditing -type f | xargs sed -i -e 's/proc\./host.proc./g'
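If you want to preview the substitution before modifying any files, you can drop the -i flag and inspect the rewritten lines on stdout first. Note also that on macOS/BSD, sed requires an explicit (possibly empty) backup suffix after -i; a variant along these lines may help:
$ find /tmp/RepetitiveEditing -type f | xargs sed -e 's/proc\./host.proc./g' | grep "host\.proc\."
$ find /tmp/RepetitiveEditing -type f | xargs sed -i '' -e 's/proc\./host.proc./g'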
Check the changes you have made:
$ find /tmp/RepetitiveEditing -type f | xargs grep "host\.proc\."
/tmp/RepetitiveEditing/alerts/1530723441304.alert: "condition": "ts(host.proc.net.percent,server_type=\"compute-*\" and env=\"live\") > 80",
/tmp/RepetitiveEditing/alerts/1530723441304.alert: "displayExpression": "ts(host.proc.net.percent,server_type=\"compute-*\" and env=\"live\")",
/tmp/RepetitiveEditing/alerts/1530723441442.alert: "condition": "ts(host.proc.stat.cpu.percentage_used,server_type=\"compute-*\" and env=\"live\") > 80",
/tmp/RepetitiveEditing/alerts/1530723441442.alert: "displayExpression": "ts(host.proc.stat.cpu.percentage_used,server_type=\"compute-*\" and env=\"dev\")",
/tmp/RepetitiveEditing/alerts/1530723441589.alert: "condition": "ts(host.proc.meminfo.percentage_swapused,server_type=\"compute-*\" and env=\"live\") > 10",
/tmp/RepetitiveEditing/alerts/1530723441589.alert: "displayExpression": "ts(host.proc.meminfo.percentage_swapused,server_type=\"compute-*\" and env=\"live\")",
/tmp/RepetitiveEditing/alerts/1530723443146.alert: "condition": "max(((sum(rate(ts(host.proc.stat.cpu, namespace=\"collections-service-dev\" and type=used)), pod_name) /100) / (sum(taggify(ts(kube.metrics.pod_container_resource_requests_cpu_cores, namespace=\"collections-service-dev\"), tagk, pod, pod_name, \"\", \"\"), pod_name)) * 100)) > 70",
/tmp/RepetitiveEditing/alerts/1530723443146.alert: "displayExpression": "(sum(rate(ts(host.proc.stat.cpu, namespace=\"collections-service-dev\" and type=used)), pod_name) /100) / (sum(taggify(ts(kube.metrics.pod_container_resource_requests_cpu_cores, namespace=\"collections-service-dev\" ), tagk, pod, pod_name, \"\", \"\"), pod_name)) * 100",
/tmp/RepetitiveEditing/dashboards/metadata-perfpod.dashboard: "query": "ts(\"host.proc.kernel.entropy_avail\", host=${metadata_server})",
/tmp/RepetitiveEditing/dashboards/metadata-perfpod.dashboard: "query": "ts(host.proc.stat.cpu.percentage_used, ${PerfPod} and host=${metadata_server})",
/tmp/RepetitiveEditing/dashboards/metadata-perfpod.dashboard: "query": "ts(host.proc.stat.cpu.percentage_used, ${PerfPod} and host=${metadata_db02})",
/tmp/RepetitiveEditing/dashboards/metadata-perfpod.dashboard: "query": "ts(host.proc.stat.cpu.percentage_used, ${PerfPod} and host=${metadata_database})",
/tmp/RepetitiveEditing/dashboards/metadata-perfpod.dashboard: "query": "ts(host.proc.stat.cpu.percentage_used, ${PerfPod} and host=${eng01})",
/tmp/RepetitiveEditing/dashboards/metadata-perfpod.dashboard: "query": "ts(host.proc.stat.cpu.percentage_used, ${PerfPod} and host=${content01})",
/tmp/RepetitiveEditing/dashboards/metadata-perfpod.dashboard: "query": "ts(host.proc.stat.cpu.percentage_iowait, ${PerfPod} and host=${metadata_server})",
...
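Before pushing anything back, it is worth verifying that no occurrence of the old prefix slipped through. One way to sketch such a check (note that it filters out whole lines containing the new host.proc. prefix, so a line mixing old and new names would be missed):
$ find /tmp/RepetitiveEditing -type f | xargs grep "proc\." | grep -v "host\.proc\."
If this command prints nothing, every occurrence was rewritten.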
Finally, replace the Wavefront alerts and dashboards using wavectl push:
$ wavectl push /tmp/RepetitiveEditing/alerts alert
Replaced alert(s):
ID NAME STATUS SEVERITY
1530723441304 Kubernetes - Node Network Utilization - HIGH (Prod) WARN
1530723441442 Kubernetes - Node Cpu Utilization - HIGH (Prod) WARN
1530723441589 Kubernetes - Node Memory Swap Utilization - HIGH (Prod) WARN
1530723443146 Collections Dev High CPU WARN
$ wavectl push /tmp/RepetitiveEditing/dashboards dashboard
Replaced dashboard(s):
ID NAME DESCRIPTION
metadata-perfpod Metadata PerfPod Monitors for testing Metadata in the PerfPods
octoproxy Skynet Octoproxy One look summary about the load balancer
After these steps, all your alerts and dashboards in Wavefront will use the new metric names.
NOTE: Making local modifications with sed-like commands and writing the resulting files to Wavefront can be risky: unintended changes may be written to Wavefront by mistake. If you want to make safer local modifications, where you have a better handle on the resulting diff, take a look at the git integration with the push command section.
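As a minimal sketch of that safer workflow, assuming git is installed, you could snapshot the pulled files before editing and review the resulting diff before pushing anything back:
$ cd /tmp/RepetitiveEditing
$ git init && git add -A && git commit -m "State before the proc. rename"
$ find . -type f | xargs sed -i -e 's/proc\./host.proc./g'
$ git diff
Only run wavectl push once git diff contains nothing unexpected.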