vanus-labs/observability


Observability

The Observability infra for Vanus.

Requires Vanus version > v0.6.

How to use

  1. Install
kubectl apply -f https://raw.githubusercontent.com/linkall-labs/observability/main/deploy/all-in-one.yaml
  2. Make Grafana reachable from your machine
kubectl port-forward service/grafana 3000:3000 -n vanus

Open localhost:3000 in your browser (the default username and password are both admin).

dashboard

How to add more metrics

Step by Step

Set up a local development environment

docker compose -f deploy/local/docker-compose.yml up

After that, you can visit localhost:3000 for Grafana and localhost:9090 for Prometheus.

Add new metrics

If you want to add new metrics to Vanus, refer to Vanus metrics to learn how metrics work inside Vanus.

For instance, suppose we want to add send_event_request_latency_summary to Vanus, a metric that records a summary of the latency of send-event requests in the Gateway component.

First, create the metric with prometheus.NewSummaryVec:

GatewayEventWriteLatencySummaryVec = prometheus.NewSummaryVec(prometheus.SummaryOpts{
	Namespace: namespace,
	Subsystem: moduleOfGateway,
	Name:      "send_event_request_latency_summary",
	// Objectives maps each tracked quantile to its allowed absolute error.
	Objectives: map[float64]float64{
		0.25:   0.1,
		0.50:   0.1,
		0.75:   0.1,
		0.80:   0.05,
		0.85:   0.05,
		0.9:    0.05,
		0.95:   0.01,
		0.96:   0.01,
		0.97:   0.01,
		0.98:   0.01,
		0.99:   0.001,
		0.999:  0.0001,
		0.9999: 0.00001,
	},
}, []string{LabelEventbus, LabelProtocol, LabelBatchSize}) // label names attached to every observation

You can find this code snippet in metrics/gateway.go. Note: see Label Consideration for guidance on configuring labels.
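To make the Objectives map easier to read, here is a small sketch of what each entry means, based on how the Prometheus Go client documents Summary objectives: the key is the target quantile and the value is the allowed absolute error in rank. The helper name rankBounds is hypothetical and exists only for this illustration; it is not part of Vanus or of the Prometheus client.

```go
package main

import (
	"fmt"
	"math"
)

// rankBounds returns the range of true quantile ranks that a Summary may
// report for target quantile q when its allowed absolute error is eps.
// The result is clamped to [0, 1].
func rankBounds(q, eps float64) (lo, hi float64) {
	return math.Max(0, q-eps), math.Min(1, q+eps)
}

func main() {
	// Two of the objectives from the snippet above: the median with a loose
	// error bound and the 99th percentile with a tight one.
	for _, o := range [][2]float64{{0.50, 0.1}, {0.99, 0.001}} {
		lo, hi := rankBounds(o[0], o[1])
		fmt.Printf("q=%.2f -> rank in [%.3f, %.3f]\n", o[0], lo, hi)
	}
}
```

Tighter error bounds on the high quantiles give more precise tail latencies, at the cost of more memory and CPU inside the client.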

Second, add the metric to GetGatewayMetrics() so that it is registered when the Gateway starts.

func GetGatewayMetrics() []prometheus.Collector {
	coll := []prometheus.Collector{
		GatewayEventReceivedCountVec,
		GatewayEventWriteLatencySummaryVec,
	}
	return append(coll, getGoRuntimeMetrics()...)
}

You can find this function in metrics/metrics.go.

If you add metrics for other components, remember to add them to the corresponding functions as well.

Third, record the value in the right place. If you use JetBrains GoLand, you can find the call sites via Right Click -> Go to -> Declaration or Usages, or with Command+B.

usages

function-call.png

Other IDEs provide similar features.
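As a rough illustration of the third step, the sketch below times an operation and hands the elapsed time to an observer. The names observer, fakeObserver and recordLatency are hypothetical and keep the example free of external dependencies. In Vanus you would instead call Observe on the value returned by GatewayEventWriteLatencySummaryVec.WithLabelValues(...). The millisecond unit is an assumption of this sketch, not something taken from the Vanus code.

```go
package main

import (
	"fmt"
	"time"
)

// observer mirrors the single method of prometheus.Observer so that this
// sketch compiles without the Prometheus client library.
type observer interface {
	Observe(float64)
}

// fakeObserver is a hypothetical stand-in that remembers what it was given.
type fakeObserver struct {
	values []float64
}

func (f *fakeObserver) Observe(v float64) { f.values = append(f.values, v) }

// recordLatency runs fn and reports how long it took, in milliseconds
// (the unit is an assumption made for this sketch).
func recordLatency(obs observer, fn func()) {
	start := time.Now()
	fn()
	obs.Observe(float64(time.Since(start)) / float64(time.Millisecond))
}

func main() {
	obs := &fakeObserver{}
	// The sleep stands in for the real work of writing events.
	recordLatency(obs, func() { time.Sleep(5 * time.Millisecond) })
	fmt.Printf("observed %d value(s)\n", len(obs.values))
}
```

Wrapping the timed work in a function keeps the start and end of the measurement in one place, so an early return cannot skip the observation.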

Validate metrics in Prometheus

After you finish adding metrics to Vanus, restart the component you added them to, then open localhost:9090 to check that Prometheus has received the new metric.

validate-metrics

The prefix vanus_gateway_ is added automatically; it is derived from the Namespace and Subsystem fields set when the metric is declared.
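The naming rule can be sketched in a few lines, together with a way to check the metric without the browser. The helpers fqName and instantQueryURL are hypothetical names for this example. fqName approximates how the Prometheus Go client joins Namespace, Subsystem and Name with underscores, and the values "vanus" and "gateway" are inferred from the vanus_gateway_ prefix rather than read from the Vanus source. instantQueryURL builds a URL for the /api/v1/query endpoint of the Prometheus HTTP API.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// fqName joins the non-empty parts with underscores, which is roughly how
// the Namespace, Subsystem and Name fields end up as one metric name.
func fqName(namespace, subsystem, name string) string {
	parts := []string{}
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

// instantQueryURL builds a URL for the Prometheus instant-query endpoint,
// escaping the expression as a query parameter.
func instantQueryURL(base, expr string) string {
	q := url.Values{}
	q.Set("query", expr)
	return base + "/api/v1/query?" + q.Encode()
}

func main() {
	// "vanus" and "gateway" are assumptions inferred from the prefix.
	metric := fqName("vanus", "gateway", "send_event_request_latency_summary")
	fmt.Println(metric)
	// A Summary also exposes a <name>_count series counting observations.
	fmt.Println(instantQueryURL("http://localhost:9090", metric+"_count"))
}
```

Fetching the printed URL (for example with curl) while the local environment is running should return a JSON result if Prometheus has scraped the metric.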

Configure visual panel in Grafana

Next, we need to create a panel in Grafana to visualize the metric. Open localhost:3000 and select the dashboard Vanus -> Cluster. After entering this dashboard, create a new panel in Overview.

create-panel

Editing a panel is not trivial. To visualize send_event_request_latency_summary, we did the following:

  1. Chose Time series (the default panel type). series.png

  2. Created a single query (in PromQL): rate(vanus_gateway_send_event_request_latency_summary_count[1m]). Builder mode is helpful for composing the query. Once the query is created, curves appear in the panel. Next, set Option -> Legend, which controls the name displayed for each series in the panel; {{quantile}} is one of the metric's labels. query.png

You can learn more about writing queries in the official docs. Sometimes a query needs variables; the Grafana official documentation explains how to use them.

  3. Set the other options. We changed Tooltip, Legend and Axis for a better visual experience.
  • Tooltip: before the change, the defaults were Tooltip mode=Single and Values sort order=None.
  • Legend: before the change, the defaults were Mode=List, Placement=Bottom and Values=None.
  • Axis: before the change, the default was Scale=Linear; we set Log base=2. options.png

Result

After finishing these steps, we have a new panel in the Cluster dashboard. latency-panel.png

Save configuration

Share -> Export -> View JSON -> Copy to clipboard save.png

Then replace the contents of cluster.json with the JSON you copied.
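Before committing, it can help to confirm that the pasted JSON parses and actually contains the new panel. The sketch below assumes the export follows Grafana's usual dashboard model, with a top-level panels array whose row entries may nest further panels. The type and function names, the sample JSON and the panel title "Send Event Latency" are all made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// panel captures only the fields this sketch needs; row panels may nest
// their children in a "panels" array of their own.
type panel struct {
	Title  string  `json:"title"`
	Panels []panel `json:"panels"`
}

// dashboard is a minimal view of an exported dashboard.
type dashboard struct {
	Panels []panel `json:"panels"`
}

// hasPanel reports whether a panel with the given title exists anywhere in ps.
func hasPanel(ps []panel, title string) bool {
	for _, p := range ps {
		if p.Title == title || hasPanel(p.Panels, title) {
			return true
		}
	}
	return false
}

// checkDashboard parses exported dashboard JSON and looks for the panel title.
func checkDashboard(data []byte, title string) (bool, error) {
	var d dashboard
	if err := json.Unmarshal(data, &d); err != nil {
		return false, err
	}
	return hasPanel(d.Panels, title), nil
}

func main() {
	// A made-up miniature export; a real cluster.json is much larger.
	sample := []byte(`{"panels":[{"title":"Overview","panels":[{"title":"Send Event Latency"}]}]}`)
	ok, err := checkDashboard(sample, "Send Event Latency")
	fmt.Println(ok, err)
}
```

To check the real file, read grafana/dashboard/cluster.json with os.ReadFile and pass its bytes to checkDashboard along with the title you gave the panel.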

Submit to upstream

  1. Commit and push to GitHub
git add grafana/dashboard/cluster.json

git commit -sm 'feat: add send_event_request_latency_summary into Cluster dashboard'

git push origin <your branch>
  2. Create a PR in linkall-labs/observability

Label Consideration

Coming soon.

How Vanus observability works in Kubernetes