Skip to content
This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Improve and extend Prometheus metrics #2557

Open
brb opened this issue Oct 21, 2016 · 4 comments
Open

Improve and extend Prometheus metrics #2557

brb opened this issue Oct 21, 2016 · 4 comments
Labels
Milestone

Comments

@brb
Copy link
Contributor

brb commented Oct 21, 2016

#2547 has introduced Prometheus metrics for weave-net. Some metrics can be improved though. Suggestions include in no particular order:

  1. weave_{packet,bytes}_total: remove the "flow" label.
  2. weave_{packet,bytes}_total: add the "overlay" label.
  3. weave_{packet,bytes}_total{overlay="fastdp"}: restore and change calculation to flushedFlowsTotal + sum(activeFlows) as flows get flushed periodically . flushedFlowsTotal should be updated before flushing flows.
  4. Include residual metrics Expose metrics #2535 (comment) as appropriate
  5. Write tests for metrics (?)
  6. Distribute the metrics code (prog/weaver/metrics.go) across parts in the code which metrics are instrumenting (same as logging). Keep only those parts from prog/weave/metrics.go which are necessary for actual serving of metrics to Prometheus. The change helps with debugging. Keep in mind, that it changes the internal metrics collection model from pull to push.
  7. Initialize labeled metrics if all label values are known in advance. That prevents from subtle bugs, e.g. dividing two metrics when one value is missing.
  8. When exposing a metric with bunch of labels, create a static counter with all labels and collect the metric with only a reference label. When querying, you can join two metrics in order to access the labels.
  9. Split up metrics containing a "total" label into two separate metrics, as it simplifies querying and prevents from matching mistakes. E.g. weave_dns_entries{state="local"} / ignoring(state) weave_dns_entries{state="total"} vs weave_local_dns_entries / weave_dns_entries.
@brb brb added the feature label Oct 21, 2016
@brb brb added this to the overflow milestone Oct 21, 2016
@awh awh mentioned this issue Oct 24, 2016
@juliusv
Copy link

juliusv commented Nov 1, 2016

Additionally, weave_connection_termination_count should be weave_connection_terminations_total.

@brb
Copy link
Contributor Author

brb commented Nov 1, 2016

Additionally, weave_connection_termination_count should be weave_connection_terminations_total.

This is addressed by #2568

@frittentheke
Copy link

@brb @bboreham I am unsure whether I am supposed to open a new issue. This one is quite old and seems to be "resolved" mostly?

Anyways: At KubeCon 2018 in Kopenhagen we spoke briefly about the automatic cleanup (rmpeer) of peers when running inside Kubernetes on nodes that are part of an AWS ASG. It would be very nice if you could add a metric / count of the number of peers which were removed due to this mechanism to actively monitor this. @bboreham you suggested to simply raise an issue to request this feature.

@bboreham
Copy link
Contributor

@frittentheke generally it's best to open a new issue; it's easier to deal with accidental duplicates than the other way round.

Since this particular issue is a random set of "stuff" I think that's an even better reason to open a new one.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

4 participants