Optimize PrometheusCollector scrape efficiency. #2974
Conversation
Hi @bwplotka. Thanks for your PR. I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
Results so far:
NOTE: This is still missing some validity checks and single flight on the HTTP handler for the full experience, but it already shows how fast/efficient we can get this. I also have an idea for a slightly better abstraction that could be similarly efficient, if not better, while being more readable.
This is for 20 containers, 5k metrics.
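To make the "single flight on the HTTP handler" idea above concrete, here is a minimal, hedged sketch using golang.org/x/sync/singleflight: concurrent scrapes share one underlying collection instead of each triggering their own. The gather() helper and the fixed "scrape" key are illustrative placeholders, not cadvisor's real code.

```go
package main

import (
	"net/http"

	"golang.org/x/sync/singleflight"
)

var group singleflight.Group

// gather stands in for the expensive metric collection work.
func gather() ([]byte, error) {
	return []byte("# scrape payload\n"), nil
}

func metricsHandler(w http.ResponseWriter, r *http.Request) {
	// All concurrent requests sharing the same key piggyback on one gather() call.
	v, err, _ := group.Do("scrape", func() (interface{}, error) {
		return gather()
	})
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Write(v.([]byte))
}

func main() {
	http.HandleFunc("/metrics", metricsHandler)
	http.ListenAndServe(":8080", nil)
}
```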
Very nice. There are definitely a lot of very large collectors out there that transform from one representation to another, where series changes are a small fraction of the total set each scrape (the apiserver in kube, haproxy/other-proxy -> source format); encouraging to see the discussion moving towards a pattern there.
```go
for _, cont := range containers {
	values := make([]string, 0, len(rawLabels))
	labels := make([]string, 0, len(rawLabels))
	values := values[:0]
```
Thanks to the fact that I talked about this (https://twitter.com/bwplotka/status/1452971992324390918?t=Ufm9UR3YGf-ud8JimO-tZw&s=19), Ivan from the audience found that this should reuse the variable, not create another one! ((:
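For readers skimming the thread, here is a small, self-contained illustration of the allocation pattern being discussed: allocate the slice once outside the loop and truncate it with s = s[:0] on each iteration to reuse the backing array. The variable names are placeholders, not the PR's actual code.

```go
package main

import "fmt"

func main() {
	rawLabels := []string{"id", "name", "image"}
	containers := []string{"a", "b", "c"}

	// Allocated once; the capacity is reused across iterations.
	values := make([]string, 0, len(rawLabels))
	for _, cont := range containers {
		values = values[:0] // truncate instead of re-allocating each time
		for _, l := range rawLabels {
			values = append(values, cont+"/"+l)
		}
		fmt.Println(values)
	}
}
```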
@bwplotka this is very promising to see the delta!
metrics/prometheus.go
Outdated
```go
// * returns cached results if until time expires.
//
// It implements prometheus.rawCollector.
func (c *PrometheusCollector) Collect() []*dto.MetricFamily {
```
This causes a compilation failure against Kubernetes:
```
+++ [1028 13:00:49] Building go targets for linux/amd64:
    ./pkg/kubelet
# k8s.io/kubernetes/pkg/kubelet/server
pkg/kubelet/server/server.go:383:50: cannot use "k8s.io/kubernetes/vendor/github.com/google/cadvisor/metrics".NewPrometheusCollector(prometheusHostAdapter{...}, containerPrometheusLabelsFunc(s.host), includedMetrics, "k8s.io/kubernetes/vendor/k8s.io/utils/clock".RealClock{}) (type *"k8s.io/kubernetes/vendor/github.com/google/cadvisor/metrics".PrometheusCollector) as type prometheus.Collector in argument to r.RawMustRegister:
	*"k8s.io/kubernetes/vendor/github.com/google/cadvisor/metrics".PrometheusCollector does not implement prometheus.Collector (wrong type for Collect method)
		have Collect() []*io_prometheus_client.MetricFamily
		want Collect(chan<- prometheus.Metric)
```
Right. This is because we need to use this collector in the new promhttp.TransactionalHandler, similar to how I implemented it in cadvisor. Looks like this has to be changed in the kubelet too.
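To spell out the mismatch from the compiler error above, here is a sketch of the two collector shapes involved: the standard prometheus.Collector streams metrics over a channel, while the PR's raw collector returns whole MetricFamily slices. The rawCollector name is taken from the code comment quoted earlier in this thread; the exact signature in the PR may differ.

```go
package sketch

import (
	"github.com/prometheus/client_golang/prometheus"
	dto "github.com/prometheus/client_model/go"
)

// standardCollector mirrors prometheus.Collector, which is what
// RawMustRegister ultimately expects ("want Collect(chan<- prometheus.Metric)").
type standardCollector interface {
	Describe(ch chan<- *prometheus.Desc)
	Collect(ch chan<- prometheus.Metric)
}

// rawCollector is the shape PrometheusCollector.Collect has after this PR,
// per the error above ("have Collect() []*io_prometheus_client.MetricFamily"),
// which is why the kubelet registration needs the transactional promhttp handler.
type rawCollector interface {
	Collect() []*dto.MetricFamily
}
```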
I did a bunch of digging, and the fix won't be quite as trivial or straightforward as it was in this PR. This is because, in k8s, interaction with the Prometheus client libraries goes through the component-base metrics libraries, and it's not clear to me what the best way forward is. I pinged @logicalhan to get some feedback.
xref:
- https://github.com/kubernetes/kubernetes/blob/70e7876bef5815284b43bb1fe20b55453899d721/pkg/kubelet/server/server.go#L360
- https://github.com/kubernetes/kubernetes/blob/47c63a39ed6866c1dc45eb9ed6c4a78fec8474c6/staging/src/k8s.io/component-base/metrics/registry.go#L114-L116
Annoyingly, while it would be nice to stop using k8s' deprecated RawMustRegister, it looks like that is still used in a couple of other places, so we can't just get rid of it and define something better for this use case:
https://cs.k8s.io/?q=RawMustRegister&i=nope&files=&excludeFiles=&repos=kubernetes/kubernetes
@bwplotka I wonder if we could still make some performance improvements here by using a notification-based approach for updating the cache. As far as I understand, your current approach is a session-based one that regenerates all the metrics during each collection, but more efficiently than before because of the cache. However, we can expect that most of the container stats will not change between two collections, so we could save a bit of CPU by only updating the cache whenever a change in a container occurs. I am not knowledgeable about the cadvisor codebase, so I don't know whether that would be feasible, but as far as I can tell, container watchers seem to exist in the codebase: Lines 267 to 268 in 0549d48
This is the current approach that kube-state-metrics takes: it watches events on Kubernetes objects and updates its metric cache only when an object is created/updated/deleted. So my suggestion would be to have two different kinds of cache collectors in client_golang. The first one would be session-based, as you already proposed in prometheus/client_golang#929, which is very suitable for projects that don't have the possibility to watch for events. If node_exporter isn't using inotify watches on sysfs/procfs, it might be a potential user of this kind of collector. The other type would be even more performant, but would require the existence of a notification system, and would update the cache directly whenever a new event requiring a metrics update is detected. This approach would have direct access to the cache and would lock it during collection, so it wouldn't have to maintain two different stores, as the session-based approach currently does. Let me know what your thoughts are on that; these are just some ideas I am throwing in.
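To make the notification-based variant concrete, here is a minimal sketch of an event-driven cached collector, assuming hypothetical container-watcher hooks (OnContainerUpdate/OnContainerDelete). This is not cadvisor's or client_golang's actual API; it only illustrates locking the cache during collection instead of rebuilding it on every scrape.

```go
package main

import (
	"fmt"
	"sync"

	"github.com/prometheus/client_golang/prometheus"
)

var cpuDesc = prometheus.NewDesc(
	"container_cpu_usage_seconds_total",
	"Cumulative CPU time consumed.",
	[]string{"container"}, nil,
)

// eventCachedCollector keeps pre-built metrics per container and only
// rebuilds an entry when a container event arrives, so Collect is just a
// locked snapshot of the cache.
type eventCachedCollector struct {
	mu    sync.RWMutex
	cache map[string][]prometheus.Metric // keyed by container name
}

func newEventCachedCollector() *eventCachedCollector {
	return &eventCachedCollector{cache: map[string][]prometheus.Metric{}}
}

// OnContainerUpdate is a hypothetical hook a container watcher would call;
// it rebuilds the cached metrics for a single container.
func (c *eventCachedCollector) OnContainerUpdate(name string, cpuSeconds float64) {
	m := prometheus.MustNewConstMetric(cpuDesc, prometheus.CounterValue, cpuSeconds, name)
	c.mu.Lock()
	c.cache[name] = []prometheus.Metric{m}
	c.mu.Unlock()
}

// OnContainerDelete drops a deleted container's series from the cache.
func (c *eventCachedCollector) OnContainerDelete(name string) {
	c.mu.Lock()
	delete(c.cache, name)
	c.mu.Unlock()
}

func (c *eventCachedCollector) Describe(ch chan<- *prometheus.Desc) { ch <- cpuDesc }

func (c *eventCachedCollector) Collect(ch chan<- prometheus.Metric) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	for _, metrics := range c.cache {
		for _, m := range metrics {
			ch <- m
		}
	}
}

func main() {
	reg := prometheus.NewRegistry()
	col := newEventCachedCollector()
	reg.MustRegister(col)

	col.OnContainerUpdate("nginx", 12.5) // simulated watcher event
	mfs, err := reg.Gather()
	fmt.Println(len(mfs), err)
}
```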
metrics/prometheus.go
Outdated
```go
	errorsGauge := 0
	if err := c.collectVersionInfo(session); err != nil {
		errorsGauge = 1
		klog.Warningf("Couldn't get version info: %s", err)
	}
	if err := c.collectContainersInfo(session); err != nil {
		errorsGauge = 1
		klog.Warningf("Couldn't get containers: %s", err)
	}
```

```go
// Describe describes all the metrics ever exported by cadvisor. It
// implements prometheus.PrometheusCollector.
func (c *PrometheusCollector) Describe(ch chan<- *prometheus.Desc) {
	c.errors.Describe(ch)
	for _, cm := range c.containerMetrics {
		ch <- cm.desc([]string{})
	session.MustAddMetric(
		"container_scrape_error", "1 if there was an error while getting container metrics, 0 otherwise",
		nil, nil, prometheus.GaugeValue, float64(errorsGauge), nil,
```
Spent some time today trying to get this working properly in k8s.
I am finding that I keep running into errors... If I don't set IdType: cadvisorv2.TypeName in the cadvisor options, I get:

```
ehashman@fedora:~/src/k8s$ curl http://127.0.0.1:8001/api/v1/nodes/127.0.0.1/proxy/metrics/cadvisor
# HELP cadvisor_version_info A metric with a constant '1' value labeled by kernel version, OS version, docker version, cadvisor version & cadvisor revision.
# TYPE cadvisor_version_info gauge
cadvisor_version_info{cadvisorRevision="",cadvisorVersion="",dockerVersion="",kernelVersion="5.10.0-9-amd64",osVersion="Debian GNU/Linux 11 (bullseye)"} 1
# HELP container_scrape_error 1 if there was an error while getting container metrics, 0 otherwise
# TYPE container_scrape_error gauge
container_scrape_error 1
```
and an error in the logs:

```
W1124 16:45:01.711801 1279370 prometheus.go:1815] Couldn't get containers: invalid request type ""
```
If I do set it, I get:

```
ehashman@fedora:~/src/k8s$ curl http://127.0.0.1:8001/api/v1/nodes/127.0.0.1/proxy/metrics/cadvisor
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "error trying to reach service: EOF",
  "reason": "ServiceUnavailable",
  "code": 503
}
```
Not sure what I'm doing wrong.
k8s WIP PR is kubernetes/kubernetes#106334
And the patch I'm applying on top of the PR to get to that second error case is:

```diff
diff --git a/pkg/kubelet/server/server.go b/pkg/kubelet/server/server.go
index ce3fc942e49..be9cb5e2603 100644
--- a/pkg/kubelet/server/server.go
+++ b/pkg/kubelet/server/server.go
@@ -378,7 +378,16 @@ func (s *Server) InstallDefaultHandlers() {
 		includedMetrics[cadvisormetrics.AcceleratorUsageMetrics] = struct{}{}
 	}
-	br.MustRegisterRaw(metrics.NewPrometheusCollector(prometheusHostAdapter{s.host}, containerPrometheusLabelsFunc(s.host), includedMetrics, clock.RealClock{}))
+	cadvisorOpts := cadvisorv2.RequestOptions{
+		IdType:    cadvisorv2.TypeName,
+		Count:     1,
+		Recursive: true,
+	}
+
+	containersCollector := metrics.NewPrometheusCollector(prometheusHostAdapter{s.host}, containerPrometheusLabelsFunc(s.host), includedMetrics, clock.RealClock{})
+	containersCollector.SetOpts(cadvisorOpts)
+	br.MustRegisterRaw(containersCollector)
+
 	br.MustRegister(metrics.NewPrometheusMachineCollector(prometheusHostAdapter{s.host}, includedMetrics))
 	s.restfulCont.Handle(cadvisorMetricsPath,
 		promhttp.HandlerForTransactional(br, promhttp.HandlerOpts{ErrorHandling: promhttp.ContinueOnError}),
```
Updated for the newest client_golang changes (prometheus/client_golang#929). Should be ready for review. There are some observations for further improvements, but otherwise all races should be fixed and we should see a significant improvement in allocations and CPU usage. TODO still:
FYI: I have to take two weeks off for post-knee-surgery recovery, but anyone is welcome to continue this!
Of course, as @dgrisonnet said, moving to a pure watcher-based approach would be extremely beneficial, but it needs huge changes, so I'm keeping that off for now.
I have updated my k/k PR https://github.com/kubernetes/kubernetes/pull/107960/files#diff-c3e7f724d1b0dcee40df80716ed57d90d2649710150fa92bf9822bdad35e0429, which integrates this change plus the required changes to rebase on top of runc 1.1.0. I still am not getting any metrics... any time I try to hit the kubelet endpoint, I get an empty response and a corresponding log line.
Ack @ehashman, taking a look today/tomorrow (:
```
/tmp/GoLand/___BenchmarkPrometheusCollector_Collect_in_github_com_google_cadvisor_metrics.test -test.v -test.paniconexit0 -test.bench ^\QBenchmarkPrometheusCollector_Collect\E$ -test.run ^$ -test.benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/google/cadvisor/metrics
cpu: Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz
BenchmarkPrometheusCollector_Collect
BenchmarkPrometheusCollector_Collect-12    748    14929127 ns/op    8253796 B/op    207706 allocs/op
PASS

Process finished with the exit code 0
```

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
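For context on how numbers like the ns/op, B/op and allocs/op above are produced, here is a generic, hedged benchmark sketch against a synthetic registry (roughly the "20 containers, 5k metrics" shape). It does not use cadvisor's real PrometheusCollector, so absolute results are not comparable to the ones quoted above.

```go
package metrics_test

import (
	"fmt"
	"testing"

	"github.com/prometheus/client_golang/prometheus"
)

func BenchmarkSyntheticGather(b *testing.B) {
	reg := prometheus.NewRegistry()
	gv := prometheus.NewGaugeVec(
		prometheus.GaugeOpts{Name: "container_fake_metric", Help: "Synthetic metric."},
		[]string{"container", "idx"},
	)
	reg.MustRegister(gv)

	// Roughly 20 containers x 250 series = 5k series.
	for c := 0; c < 20; c++ {
		for i := 0; i < 250; i++ {
			gv.WithLabelValues(fmt.Sprintf("container-%d", c), fmt.Sprintf("%d", i)).Set(1)
		}
	}

	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, err := reg.Gather(); err != nil {
			b.Fatal(err)
		}
	}
}
```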
This PR is ready to go IMO. All benchmarked. With the kubelet we see significantly fewer allocations and lower CPU usage (kubernetes/kubernetes#108194).
Not sure what the pull-cadvisor-e2e job failure means: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/google_cadvisor/2974/pull-cadvisor-e2e/1496837320517619712/
TODO: Clean up the commits. |
metrics/cache/cache_test.go
Outdated
dto "github.com/prometheus/client_model/go" | ||
) | ||
|
||
// |
uncomment
Done in a587512.
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
@bwplotka hi, is this PR still in progress? We have encountered the problem as well.
Wow, sorry for the lag. This work was paused and got stale. I changed jobs since then, so I'm closing this for now. Efficiency improvements are still very possible here, but ideally we would redesign cadvisor metric tracking, and that's a bigger piece of work. Something to discuss with the cadvisor maintainers. cc @cwangVT @SergeyKanzhelev
Fixes kubernetes/kubernetes#104459
This might end up as a reusable framework for CRI and others, but for now I'm attempting to optimize cadvisor in a bespoke fashion.
CPU profile: https://share.polarsignals.com/e46845c/
Memory profile: https://share.polarsignals.com/9cb2d48/
Related to prometheus/client_golang#917
Signed-off-by: Bartlomiej Plotka bwplotka@gmail.com