This repository has been archived by the owner on Jul 11, 2023. It is now read-only.

tests/framework: additions to framework to query from Prometheus/Grafana #2138

Merged: 5 commits, Dec 4, 2020

@eduser25 eduser25 commented Dec 3, 2020

This commit adds a few high-level helpers for querying various resources
on both Prometheus and Grafana. The intention is to use them in our test infrastructure
to get a holistic view of data and metrics on processes, namely (but not limited to)
control plane ones.

The current set of helpers is sufficient to query and track
CPU and memory trends, as well as to graph the overall control plane namespace
CPU and memory consumption through Grafana remote rendering, all at any point
during a test.

Graphing requires Grafana remote rendering and specific dashboards that already
target the data to be graphed.
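
For context, Grafana's remote rendering is reached through its `/render/d-solo/<uid>/<slug>` endpoint, which returns a PNG for a single panel when an image renderer is installed. The sketch below only builds such a URL; it is not the code from this PR, and the `renderURL` name, the dashboard UID, the slug and the base address are all hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// renderURL builds a Grafana single-panel render URL covering the last
// `minutes` minutes. dashboardUID and slug identify the dashboard.
func renderURL(base, dashboardUID, slug string, panelID, minutes, width, height int) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = fmt.Sprintf("/render/d-solo/%s/%s", dashboardUID, slug)

	q := url.Values{}
	q.Set("panelId", strconv.Itoa(panelID))
	q.Set("from", fmt.Sprintf("now-%dm", minutes)) // relative time range
	q.Set("to", "now")
	q.Set("width", strconv.Itoa(width))
	q.Set("height", strconv.Itoa(height))
	u.RawQuery = q.Encode() // Encode sorts parameters by key

	return u.String(), nil
}

func main() {
	// Hypothetical port-forwarded Grafana address and dashboard identifiers.
	s, err := renderURL("http://localhost:3000", "abc123", "osm-control-plane", 2, 30, 1000, 500)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Fetching that URL with an authenticated HTTP GET and writing the response body to disk would give a snapshot comparable to what `PanelPNGSnapshot` is described as storing in `saveFilepath`.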

This is part of an effort to better instrument our tests and potentially unlock minimal scale
testing using the very same tools we've written our tests with.

Relevant APIs:

common_metrics.go:
  struct Prometheus {}
  (*Prometheus) VectorQuery(string, time) (float, err)                               // raw query call; since it's a non-range query, expects only one sample in vector form
  (*Prometheus) GetNumEnvoysInMesh() (float, err)                                    // returns the number of Envoys currently in the mesh as seen by Prom, e.g. '20'
  (*Prometheus) GetMemRSSforContainer(ns, pod, container) (float, err)               // returns RSS footprint of container in bytes, e.g. '52562'
  (*Prometheus) GetCPULoadAvgforContainer(ns, pod, container, avgTime) (float, err)  // returns CPU load avg for container over a given time range, e.g. '0.62'
  (*Prometheus) GetCPULoadAvgforContainer(ns, pod, container) (float, float, float, err) // returns 1m, 5m and 15m CPU load averages for a container

  struct Grafana {}
  (*Grafana) PanelPNGSnapshot(dashboard, panelId, timeMinutes, saveFilepath)        // Triggers a dashboard/panel render and stores it in 'saveFilepath'

common_apps.go:
  GetPrometheusPodHandle(ns, pod, port) (*Prometheus, err)  // port-forwards and gets handle generically for any Prometheus pod
  GetGrafanaPodHandle(ns, pod, port) (*Grafana, err)        // port-forwards and gets handle generically for any Grafana pod
  GetOSMPrometheusHandle() (*Prometheus, err)               // GetPrometheusPodHandle wrapper for OSM-Prometheus
  GetOSMGrafanaHandle() (*Grafana, err)                     // GetGrafanaPodHandle wrapper for OSM-Grafana
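
To make the "expects only one sample in vector form" contract of `VectorQuery` concrete, here is a minimal sketch that decodes the JSON returned by the Prometheus HTTP API for an instant query (`/api/v1/query`) and rejects anything other than a single-sample vector. It uses only the standard library; the PR's actual implementation is not shown here and may well use a Prometheus client library instead. The `queryResponse` type and `singleSample` function are hypothetical names.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strconv"
)

// queryResponse mirrors the subset of the instant-query response used here.
// Each sample's value is a [timestamp, "value-as-string"] pair.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"`
		} `json:"result"`
	} `json:"data"`
}

// singleSample returns the value of the only sample in a vector result,
// and an error for failed queries, non-vector results, or any other sample count.
func singleSample(body []byte) (float64, error) {
	var r queryResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return 0, err
	}
	if r.Status != "success" {
		return 0, fmt.Errorf("query status %q", r.Status)
	}
	if r.Data.ResultType != "vector" {
		return 0, fmt.Errorf("expected vector result, got %q", r.Data.ResultType)
	}
	if len(r.Data.Result) != 1 {
		return 0, fmt.Errorf("expected exactly one sample, got %d", len(r.Data.Result))
	}
	s, ok := r.Data.Result[0].Value[1].(string)
	if !ok {
		return 0, errors.New("sample value is not a string")
	}
	return strconv.ParseFloat(s, 64)
}

func main() {
	// Hand-written example body, not captured from a real cluster.
	body := []byte(`{"status":"success","data":{"resultType":"vector",` +
		`"result":[{"metric":{},"value":[1607000000,"20"]}]}}`)
	v, err := singleSample(body)
	fmt.Println(v, err)
}
```
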
  • Metrics [X]
  • Tests [X]
  • Performance [X]

Please answer the following questions with yes/no.

  • Does this change contain code from or inspired by another project? If so, did you notify the maintainers and provide attribution?
    No

This commit adds a few open helpers to allow querying for various resources
on both Prometheus and Grafana.

Particularly, the current helpers added are sufficient to match and record
CPU and Memory trends, as well as graphing the overall control plane namespace
CPU and memory consumption through Grafana remote rendering.
@eduser25 eduser25 requested a review from a team as a code owner December 3, 2020 02:44
Signed-off-by: Eduard Serra <eduser25@gmail.com>
@eduser25 eduser25 marked this pull request as draft December 3, 2020 02:51
@eduser25 eduser25 requested a review from nojnhuh December 3, 2020 02:51

codecov-io commented Dec 3, 2020

Codecov Report

Merging #2138 (2683f60) into main (d12f43b) will decrease coverage by 0.81%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main    #2138      +/-   ##
==========================================
- Coverage   58.67%   57.86%   -0.82%     
==========================================
  Files         144      144              
  Lines        5936     5995      +59     
==========================================
- Hits         3483     3469      -14     
- Misses       2450     2523      +73     
  Partials        3        3              
Impacted Files                          Coverage Δ
cmd/cli/dashboard.go                    0.00% <0.00%> (ø)
cmd/cli/proxy_configdump.go             4.54% <0.00%> (ø)
pkg/kubernetes/portforward.go           0.00% <ø> (ø)
pkg/endpoint/providers/kube/fake.go     0.00% <0.00%> (-70.84%) ⬇️
pkg/catalog/service.go                  91.66% <0.00%> (-8.34%) ⬇️
pkg/endpoint/providers/kube/client.go   82.85% <0.00%> (-1.83%) ⬇️
pkg/catalog/routes.go                   81.86% <0.00%> (-0.68%) ⬇️
pkg/endpoint/types.go                   0.00% <0.00%> (ø)
pkg/catalog/mock_catalog.go             0.00% <0.00%> (ø)
pkg/trafficpolicy/trafficpolicy.go
... and 2 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fc87610...2683f60.

nojnhuh previously approved these changes Dec 3, 2020

@nojnhuh nojnhuh left a comment

Left a couple nits, but LGTM. Will definitely use this for custom metrics tests.

tests/framework/common_apps.go (outdated, resolved)
tests/framework/common_apps.go (outdated, resolved)
@eduser25 eduser25 marked this pull request as ready for review December 3, 2020 19:58
@eduser25 eduser25 merged commit e2b5254 into openservicemesh:main Dec 4, 2020
@eduser25 eduser25 deleted the metrics branch February 8, 2021 22:41