feat(metrics): provide metrics for tenant quotas #1094

Merged


@lukasboettcher (Contributor) commented May 24, 2024

Description

This PR adds two custom metrics for Capsule tenants:

  • capsule_tenant_resource_limit{resource="<resource>",resourcequotaindex="<index>",tenant="<tenant>"}
  • capsule_tenant_resource_usage{resource="<resource>",resourcequotaindex="<index>",tenant="<tenant>"}

Use case

When ResourceQuotas are configured through Capsule at the Tenant scope, capacity planning based on Prometheus metrics from, e.g., kube-state-metrics is difficult, because the sum of the per-namespace ResourceQuotas is not what is actually enforced. Instead, these metrics expose the aggregated resource limits and usage for each tenant.

Example metrics

Tenant Resource
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: test
spec:
  owners:
  - name: alice
    kind: User
  namespaceOptions:
    quota: 10
  resourceQuotas:
    scope: Tenant
    items:
    - hard:
        pods: 100
    - hard:
        limits.memory: 4Gi
        requests.memory: 4Gi
    - hard:
        requests.memory: 6Gi
```
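The `hard` values in this Tenant are Kubernetes resource quantities, whereas the metrics report plain numbers (bytes for memory), so `4Gi` appears as `4.294967296e+09`, that is 4 × 2^30. The Python sketch below illustrates that conversion with a hypothetical, simplified `parse_quantity` helper. It is not Capsule's code (in Go, Kubernetes models these values with the `resource.Quantity` type), and it does not validate its input.

```python
from decimal import Decimal

# Hypothetical, simplified handling of Kubernetes quantity suffixes.
# Binary suffixes are powers of two, decimal suffixes are powers of ten.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
                   "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
DECIMAL_SUFFIXES = {"m": Decimal("0.001"), "k": 10**3, "M": 10**6, "G": 10**9,
                    "T": 10**12, "P": 10**15, "E": 10**18}


def parse_quantity(quantity: str) -> float:
    """Convert a quantity such as "4Gi", "400m" or "100" to a plain number."""
    quantity = quantity.strip()
    for suffixes in (BINARY_SUFFIXES, DECIMAL_SUFFIXES):
        for suffix, factor in suffixes.items():
            if quantity.endswith(suffix):
                # Decimal keeps the multiplication exact; convert to float last.
                return float(Decimal(quantity[: -len(suffix)]) * factor)
    # No suffix: a plain number such as "100".
    return float(Decimal(quantity))


if __name__ == "__main__":
    # hard values from the example Tenant
    for raw in ("100", "4Gi", "6Gi"):
        print(raw, parse_quantity(raw))
```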
Metrics
```
# HELP capsule_tenant_resource_limit Current resource limit for a given resource in a tenant
# TYPE capsule_tenant_resource_limit gauge
capsule_tenant_resource_limit{resource="limits.memory",resourcequotaindex="1",tenant="test"} 4.294967296e+09
capsule_tenant_resource_limit{resource="namespaces",resourcequotaindex="",tenant="test"} 10
capsule_tenant_resource_limit{resource="pods",resourcequotaindex="0",tenant="test"} 100
capsule_tenant_resource_limit{resource="requests.memory",resourcequotaindex="1",tenant="test"} 4.294967296e+09
capsule_tenant_resource_limit{resource="requests.memory",resourcequotaindex="2",tenant="test"} 6.442450944e+09
# HELP capsule_tenant_resource_usage Current resource usage for a given resource in a tenant
# TYPE capsule_tenant_resource_usage gauge
capsule_tenant_resource_usage{resource="limits.memory",resourcequotaindex="1",tenant="test"} 2.68435456e+09
capsule_tenant_resource_usage{resource="namespaces",resourcequotaindex="",tenant="test"} 4
capsule_tenant_resource_usage{resource="pods",resourcequotaindex="0",tenant="test"} 20
capsule_tenant_resource_usage{resource="requests.memory",resourcequotaindex="1",tenant="test"} 2.68435456e+09
capsule_tenant_resource_usage{resource="requests.memory",resourcequotaindex="2",tenant="test"} 2.68435456e+09
```
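To see which quota item will be hit first, a consumer can pair each `capsule_tenant_resource_limit` series with the `capsule_tenant_resource_usage` series that carries the same labels and compare the remaining headroom. The Python sketch below assumes the exposition text above (HELP lines omitted) and uses a deliberately simplified line parser. `parse_samples` and `binding_items` are hypothetical names; in practice this comparison would normally be done in Prometheus itself rather than by parsing text.

```python
import re

# Exposition text from the example output above (HELP lines omitted).
EXPOSITION = """\
# TYPE capsule_tenant_resource_limit gauge
capsule_tenant_resource_limit{resource="limits.memory",resourcequotaindex="1",tenant="test"} 4.294967296e+09
capsule_tenant_resource_limit{resource="namespaces",resourcequotaindex="",tenant="test"} 10
capsule_tenant_resource_limit{resource="pods",resourcequotaindex="0",tenant="test"} 100
capsule_tenant_resource_limit{resource="requests.memory",resourcequotaindex="1",tenant="test"} 4.294967296e+09
capsule_tenant_resource_limit{resource="requests.memory",resourcequotaindex="2",tenant="test"} 6.442450944e+09
# TYPE capsule_tenant_resource_usage gauge
capsule_tenant_resource_usage{resource="limits.memory",resourcequotaindex="1",tenant="test"} 2.68435456e+09
capsule_tenant_resource_usage{resource="namespaces",resourcequotaindex="",tenant="test"} 4
capsule_tenant_resource_usage{resource="pods",resourcequotaindex="0",tenant="test"} 20
capsule_tenant_resource_usage{resource="requests.memory",resourcequotaindex="1",tenant="test"} 2.68435456e+09
capsule_tenant_resource_usage{resource="requests.memory",resourcequotaindex="2",tenant="test"} 2.68435456e+09
"""

# Simplified parser: one sample per line, no escaped quotes, no timestamps.
SAMPLE_RE = re.compile(r"^(\w+)\{([^}]*)\}\s+(\S+)$")
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')
LIMIT = "capsule_tenant_resource_limit"
USAGE = "capsule_tenant_resource_usage"


def parse_samples(text):
    """Map (metric, tenant, resource, resourcequotaindex) -> value."""
    samples = {}
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match is None:  # skips "# HELP"/"# TYPE" comments and blank lines
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels))
        key = (name, labels["tenant"], labels["resource"], labels["resourcequotaindex"])
        samples[key] = float(value)
    return samples


def binding_items(samples):
    """For each (tenant, resource), pick the quota item with the least headroom."""
    binding = {}
    for (name, tenant, resource, index), limit in samples.items():
        if name != LIMIT:
            continue
        used = samples.get((USAGE, tenant, resource, index), 0.0)
        headroom = limit - used
        best = binding.get((tenant, resource))
        if best is None or headroom < best[1]:
            binding[(tenant, resource)] = (index, headroom)
    return binding


if __name__ == "__main__":
    result = binding_items(parse_samples(EXPOSITION))
    for (tenant, resource), (index, headroom) in sorted(result.items()):
        print(f"{tenant} {resource}: item {index or '-'} leaves {headroom:g}")
```

For `requests.memory` this selects item `1`, which has 1.5Gi (1610612736 bytes) left, because the 4Gi item is exhausted before the 6Gi one.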


netlify bot commented May 24, 2024

Deploy Preview for capsule-documentation canceled.

🔨 Latest commit: 3243828
🔍 Latest deploy log: https://app.netlify.com/sites/capsule-documentation/deploys/6650d95142c5530008d7cd3f

Signed-off-by: Lukas Boettcher <1340215+lukasboettcher@users.noreply.github.com>

@oliverbaehler (Collaborator)

@lukasboettcher Thanks! I'm wondering: did you use resourcequotaindex instead of, e.g., namespace to reduce cardinality?

@lukasboettcher (Contributor, Author)

Since multiple quota items can set a limit for the same resource, I ran into a problem where the metrics were being overwritten. In the example above, the third item (requests.memory: 6Gi) would overwrite the second item (requests.memory: 4Gi) in the metrics if we didn't account for the index of the quota. Kubernetes enforces every quota, so the lowest limit is always the binding one, which means we need to keep metrics for all tnt.spec.resourceQuotas.items.*. I did not use namespace as a label because these metrics are tenant-scoped and should be independent of the namespaces.
Metrics for the individual ResourceQuotas are already computed by, e.g., kube-state-metrics.
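The overwrite problem described above can be reproduced with plain dictionaries standing in for metric label sets. This Python sketch uses the quota items of the example Tenant; the helper names are hypothetical, and `effective_limit` assumes every item counts the same usage (no differing quota scopes).

```python
# Quota items of the example Tenant (spec.resourceQuotas.items), in order.
GI = 2**30
ITEMS = [
    {"pods": 100},
    {"limits.memory": 4 * GI, "requests.memory": 4 * GI},
    {"requests.memory": 6 * GI},
]


def series_by_resource(tenant, items):
    """Label set without the index: later items silently overwrite earlier ones."""
    series = {}
    for hard in items:
        for resource, limit in hard.items():
            series[(tenant, resource)] = limit
    return series


def series_by_resource_and_index(tenant, items):
    """Label set including the item index: every quota item keeps its own series."""
    return {
        (tenant, resource, str(index)): limit
        for index, hard in enumerate(items)
        for resource, limit in hard.items()
    }


def effective_limit(series, resource):
    """With identical usage across items, the lowest limit is the binding one."""
    return min(limit for key, limit in series.items() if key[1] == resource)


if __name__ == "__main__":
    without_index = series_by_resource("test", ITEMS)
    with_index = series_by_resource_and_index("test", ITEMS)
    print(without_index[("test", "requests.memory")] // GI)      # 6: the 4Gi item is lost
    print(effective_limit(with_index, "requests.memory") // GI)  # 4
```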

@oliverbaehler merged commit 5efb4fb into projectcapsule:main on May 27, 2024 (25 checks passed)
@oliverbaehler (Collaborator)

Thanks! Just FYI, we are also working on improving the observability of the tenant resource quota and on some mechanism to avoid race conditions. One measure is to expose the usage in the Tenant status:

```yaml
status:
  namespaces:
  - green-prod
  - green-test
  quota:
    hard:
      limits.cpu: "2"
      limits.memory: 2Gi
      pods: "6"
      requests.cpu: "1"
      requests.memory: 1Gi
    used:
      limits.cpu: 400m
      limits.memory: 1Gi
      pods: "2"
      requests.cpu: 200m
      requests.memory: 256Mi
```
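Assuming the status field ends up shaped like the snippet above (it is described here as work in progress, so the final schema may differ), per-resource utilization is simply `used / hard`. The Python sketch below copies that snippet into a plain dictionary and uses a hypothetical `to_number` helper that only understands the suffixes appearing in this example (`m`, `Mi`, `Gi`).

```python
from fractions import Fraction

# Proposed Tenant status quota from the comment above, as plain Python data.
STATUS_QUOTA = {
    "hard": {"limits.cpu": "2", "limits.memory": "2Gi", "pods": "6",
             "requests.cpu": "1", "requests.memory": "1Gi"},
    "used": {"limits.cpu": "400m", "limits.memory": "1Gi", "pods": "2",
             "requests.cpu": "200m", "requests.memory": "256Mi"},
}

# Only the suffixes used in this example; not a full quantity parser.
SUFFIXES = {"m": Fraction(1, 1000), "Mi": 2**20, "Gi": 2**30}


def to_number(quantity):
    """Convert "400m", "256Mi", "2Gi" or "6" to an exact Fraction."""
    for suffix, factor in SUFFIXES.items():
        if quantity.endswith(suffix):
            return Fraction(quantity[: -len(suffix)]) * factor
    return Fraction(quantity)


def utilization(quota):
    """Return used / hard per resource, skipping resources whose hard limit is 0."""
    ratios = {}
    for resource, hard in quota["hard"].items():
        limit = to_number(hard)
        if limit == 0:
            continue
        used = to_number(quota["used"].get(resource, "0"))
        ratios[resource] = float(used / limit)
    return ratios


if __name__ == "__main__":
    for resource, ratio in sorted(utilization(STATUS_QUOTA).items()):
        print(f"{resource}: {ratio:.0%}")
```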

@lukasboettcher deleted the feature/capsule-metrics branch on May 27, 2024 17:43