
Leaking Prometheus metrics in kubernetes_logs source #12897

Closed
aikoven opened this issue May 30, 2022 · 2 comments
Labels
type: bug A code related bug.

Comments


aikoven commented May 30, 2022

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

Some of the Prometheus metrics generated by kubernetes_logs include the file name in the labels, resulting in a new label set created for each new pod. These metrics do not seem to be cleaned up, which leads to increasing CPU usage with time.

Below is the Vector component utilization (the vector_utilization metric):

[image: graph of vector_utilization showing utilization climbing over time]

Examples of leaking metrics are vector_files_resumed_total and vector_checksum_errors_total, but there are probably more.
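To illustrate the leak: the sketch below (stdlib-only, not Vector's actual code; the paths are hypothetical) models a metrics registry keyed by label set. Because the `file` label is unbounded, every new pod log file creates a new series, and nothing ever removes series for pods that no longer exist:

```python
# Minimal model of a Prometheus-style registry: one entry per unique
# (metric name, label set) pair.
registry = {}  # (metric_name, sorted_labels) -> counter value

def inc(metric, **labels):
    """Increment the series identified by this exact label set."""
    key = (metric, tuple(sorted(labels.items())))
    registry[key] = registry.get(key, 0) + 1

# Each new pod writes to a distinct log path, so each pod produces a
# brand-new label set in the registry.
for i in range(100):
    path = f"/var/log/pods/example-pod-{i}/app/0.log"  # hypothetical paths
    inc("vector_files_resumed_total",
        component_id="kubernetes_logs", file=path)

# Even after the pods (and their log files) are gone, the series remain:
# cardinality only ever grows.
print(len(registry))  # prints 100
```

Scraping and serializing an ever-growing set of series is what drives the CPU usage shown above.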

Configuration

No response

Version

vector 0.21.2 (x86_64-unknown-linux-gnu 1f01009 2022-05-05)

Debug Output

No response

Example Data

# HELP vector_files_resumed_total files_resumed_total
# TYPE vector_files_resumed_total counter
vector_files_resumed_total{component_id="kubernetes_logs",component_kind="source",component_name="kubernetes_logs",component_type="kubernetes_logs",file="/var/log/pods/monitoring-system_vmagent-vmagent-0-5688fd5dc8-qj9mx_3f07db30-5eb9-4671-bc01-e67477229baa/vmagent/91.log"} 1 1653891289324
vector_files_resumed_total{component_id="kubernetes_logs",component_kind="source",component_name="kubernetes_logs",component_type="kubernetes_logs",file="/var/log/pods/kube-system_node-local-dns-vq77s_6f051da5-f3f0-4e09-af11-6d852b0de2dd/node-cache/0.log"} 1 1653891289324
vector_files_resumed_total{component_id="kubernetes_logs",component_kind="source",component_name="kubernetes_logs",component_type="kubernetes_logs",file="/var/log/pods/monitoring-system_exporters-prometheus-node-exporter-zllwk_c10f7066-2a5c-4616-a62e-b97f21b6df42/node-exporter/0.log"} 1 1653891289324
vector_files_resumed_total{component_id="kubernetes_logs",component_kind="source",component_name="kubernetes_logs",component_type="kubernetes_logs",file="/var/log/pods/kube-system_cilium-x6hz9_d0608025-18ff-461e-bd21-db280a72924c/cilium-agent/0.log"} 1 1653891289324
vector_files_resumed_total{component_id="kubernetes_logs",component_kind="source",component_name="kubernetes_logs",component_type="kubernetes_logs",file="/var/log/pods/kubenurse_kubenurse-fkqlv_a84eae7e-5ffe-469c-8885-08f9a2d0c6bd/kubenurse/0.log"} 1 1653891289324
vector_files_resumed_total{component_id="kubernetes_logs",component_kind="source",component_name="kubernetes_logs",component_type="kubernetes_logs",file="/var/log/pods/consul-cluster_consul-consul-client-n68kl_5db13081-384b-4512-b032-1116d36192a3/consul/0.log.20220513-145636"} 1 1653891289324
vector_files_resumed_total{component_id="kubernetes_logs",component_kind="source",component_name="kubernetes_logs",component_type="kubernetes_logs",file="/var/log/pods/monitoring-system_vmagent-vmagent-0-5688fd5dc8-qj9mx_3f07db30-5eb9-4671-bc01-e67477229baa/config-reloader/0.log"} 1 1653891289324
vector_files_resumed_total{component_id="kubernetes_logs",component_kind="source",component_name="kubernetes_logs",component_type="kubernetes_logs",file="/var/log/pods/kube-system_cilium-x6hz9_d0608025-18ff-461e-bd21-db280a72924c/mount-cgroup/0.log"} 1 1653891289324
vector_files_resumed_total{component_id="kubernetes_logs",component_kind="source",component_name="kubernetes_logs",component_type="kubernetes_logs",file="/var/log/pods/dragonfly-system_dragonfly-dfdaemon-vqbts_8ec85ee6-5287-4a82-8e76-8f0cbd506ecf/dfdaemon/0.log"} 1 1653891289324
vector_files_resumed_total{component_id="kubernetes_logs",component_kind="source",component_name="kubernetes_logs",component_type="kubernetes_logs",file="/var/log/pods/kube-system_csi-cinder-nodeplugin-9jbxm_808218fc-9a16-4a28-9b84-f693f75906cb/liveness-probe/0.log"} 1 1653891289324
vector_files_resumed_total{component_id="kubernetes_logs",component_kind="source",component_name="kubernetes_logs",component_type="kubernetes_logs",file="/var/log/pods/kube-system_csi-cinder-nodeplugin-9jbxm_808218fc-9a16-4a28-9b84-f693f75906cb/node-driver-registrar/1.log"} 1 1653891289324
vector_files_resumed_total{component_id="kubernetes_logs",component_kind="source",component_name="kubernetes_logs",component_type="kubernetes_logs",file="/var/log/pods/monitoring-system_vmagent-vmagent-0-5688fd5dc8-qj9mx_3f07db30-5eb9-4671-bc01-e67477229baa/vmagent/92.log"} 1 1653891289324
vector_files_resumed_total{component_id="kubernetes_logs",component_kind="source",component_name="kubernetes_logs",component_type="kubernetes_logs",file="/var/log/pods/consul-cluster_consul-consul-client-n68kl_5db13081-384b-4512-b032-1116d36192a3/consul/0.log"} 1 1653891289324
vector_files_resumed_total{component_id="kubernetes_logs",component_kind="source",component_name="kubernetes_logs",component_type="kubernetes_logs",file="/var/log/pods/kube-system_csi-cinder-nodeplugin-9jbxm_808218fc-9a16-4a28-9b84-f693f75906cb/cinder-csi-plugin/0.log"} 1 1653891289324
vector_files_resumed_total{component_id="kubernetes_logs",component_kind="source",component_name="kubernetes_logs",component_type="kubernetes_logs",file="/var/log/pods/dragonfly-system_dragonfly-dfdaemon-vqbts_8ec85ee6-5287-4a82-8e76-8f0cbd506ecf/mount-netns/1.log"} 1 1653891289324
vector_files_resumed_total{component_id="kubernetes_logs",component_kind="source",component_name="kubernetes_logs",component_type="kubernetes_logs",file="/var/log/pods/kube-system_csi-cinder-nodeplugin-9jbxm_808218fc-9a16-4a28-9b84-f693f75906cb/liveness-probe/1.log"} 1 1653891289324
vector_files_resumed_total{component_id="kubernetes_logs",component_kind="source",component_name="kubernetes_logs",component_type="kubernetes_logs",file="/var/log/pods/kube-system_csi-cinder-nodeplugin-9jbxm_808218fc-9a16-4a28-9b84-f693f75906cb/cinder-csi-plugin/1.log"} 1 1653891289324
vector_files_resumed_total{component_id="kubernetes_logs",component_kind="source",component_name="kubernetes_logs",component_type="kubernetes_logs",file="/var/log/pods/consul-cluster_consul-consul-client-n68kl_5db13081-384b-4512-b032-1116d36192a3/consul/1.log"} 1 1653891289324
vector_files_resumed_total{component_id="kubernetes_logs",component_kind="source",component_name="kubernetes_logs",component_type="kubernetes_logs",file="/var/log/pods/kube-system_cilium-x6hz9_d0608025-18ff-461e-bd21-db280a72924c/cilium-agent/0.log.20220529-215401"} 1 1653891289324
vector_files_resumed_total{component_id="kubernetes_logs",component_kind="source",component_name="kubernetes_logs",component_type="kubernetes_logs",file="/var/log/pods/kube-system_csi-cinder-nodeplugin-9jbxm_808218fc-9a16-4a28-9b84-f693f75906cb/node-driver-registrar/0.log"} 1 1653891289324

Additional Context

No response

References

No response

aikoven added the type: bug (A code related bug.) label on May 30, 2022
@nabokihms
Contributor

Hello, @aikoven. Thank you for the report!

There is a more general issue, #11995, about adding an expiration property to the metrics registry so that Vector can detect and remove stale metrics.
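To sketch the expiration idea referenced above (the names and mechanism here are illustrative, not Vector's implementation): each series records the time it was last updated, and a periodic sweep evicts series that have been idle longer than a configured TTL.

```python
class ExpiringRegistry:
    """Toy counter registry whose series expire after `expire_after` seconds
    of inactivity. Timestamps are passed in explicitly for determinism."""

    def __init__(self, expire_after):
        self.expire_after = expire_after
        self.values = {}     # series key -> counter value
        self.last_seen = {}  # series key -> last update timestamp

    def inc(self, key, now):
        self.values[key] = self.values.get(key, 0) + 1
        self.last_seen[key] = now

    def sweep(self, now):
        """Remove every series idle longer than the TTL; return evicted keys."""
        stale = [k for k, t in self.last_seen.items()
                 if now - t > self.expire_after]
        for k in stale:
            del self.values[k]
            del self.last_seen[k]
        return stale

reg = ExpiringRegistry(expire_after=300)
reg.inc(("vector_files_resumed_total", "pod-a/0.log"), now=0)
reg.inc(("vector_files_resumed_total", "pod-b/0.log"), now=400)
reg.sweep(now=401)  # pod-a's series has been idle for >300s and is evicted
print(len(reg.values))  # prints 1
```

With a scheme like this, series for deleted pods stop accumulating instead of living in the registry forever.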

A fix is on the way 🚀

@spencergilbert
Contributor

Thanks @nabokihms - I'm going to close this out as a duplicate of the linked #11995. With @nabokihms's help, we're hoping to have the fix soon!
