Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vault telemetry double counting route metrics #12060

Closed
mikegreen opened this issue Jul 13, 2021 · 5 comments · Fixed by #12763
Closed

Vault telemetry double counting route metrics #12060

mikegreen opened this issue Jul 13, 2021 · 5 comments · Fixed by #12763

Comments

@mikegreen
Copy link
Contributor

Describe the bug
A single read/write or get/put to a Vault secret results in the telemetry metric count incrementing by 2, instead of 1.

To Reproduce
Steps to reproduce the behavior:

  1. Enable telemetry, in my case Prometheus but I feel this affects all telemetry methods (https://www.vaultproject.io/docs/configuration/telemetry#prometheus)
  2. Get current count/baseline of metric - view metric in Prometheus vault_route_update_transit__count , ie, it is 2
  3. Perform a secrets operation, ie a transit encrypt: vault write -tls-skip-verify transit/encrypt/key01 plaintext=$(base64 <<< "encrypt stuff")
  4. View metrics in Prometheus vault_route_update_transit__count, now it is 4

Expected behavior
A single operation, such as a transit encrypt operation, would be counted as 1

Environment:

  • Vault Server Version (retrieve with vault status): 1.7.3+ent, but also seen in 1.6.x
  • Vault CLI Version (retrieve with vault version): Vault v1.7.3 (5d517c8)
  • Server Operating System/Architecture: ubuntu 18, but appears to be on all

Vault server configuration file(s):

ui=true
disable_mlock = true

storage "raft" {
 path = "/opt/vault/"
 node_id = "raft_01"
}

api_addr     = "https://192.168.1.215:8200"
cluster_addr = "https://192.168.1.215:8201"

listener "tcp" {
  address       = "192.168.1.215:8200"
  tls_cert_file = "/opt/vault/tls.crt"
  tls_key_file  = "/opt/vault/tls.key"
  tls_disable_client_certs = true
}

seal "awskms" {
redacted
}

telemetry {
  prometheus_retention_time = "24h"
  disable_hostname = true
  usage_gauge_period = "30s"
}  

default_lease_ttl = "72h"

enable_response_header_hostname = true
enable_response_header_raft_node_id = true

Additional context
Add any other context about the problem here.

@ncabatoff
Copy link
Collaborator

My hypothesis is that this only affects writes and is due to the ExistenceCheck (used to classify writes as "create" or "update") is triggering the extra counter increment since it leverages the routeCommon func that modifies the route metric. An easy fix would be to only increment the counter when !existenceCheck.

@heatherezell
Copy link
Contributor

Hi folks! Any issue with a good-first-issue label is one where we particularly encourage community contributions, so please feel free to take a stab at implementing this if you wish.

@vinayak03
Copy link
Contributor

@hsimon-hashicorp I would like to signup for this issue.

@heatherezell
Copy link
Contributor

@vinayak03 Please feel free to create a pull request! We're here for help with questions on implementation or things you're stuck with, too. Thanks! :)

@vinayak03
Copy link
Contributor

@hsimon-hashicorp I have created a pull request, I have tested the issue on my local environment, however I was not able to add unit test for this. Let me know if there is any other way I can add test for this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants