fix: podmonitor mTLS mutations #566

Merged
merged 7 commits on Jul 12, 2024
Changes from 5 commits
36 changes: 25 additions & 11 deletions docs/configuration/uds-monitoring-metrics.md
@@ -6,15 +6,15 @@ weight: 1

UDS Core leverages Pepr to handle setup of Prometheus metrics scraping endpoints, with the particular configuration necessary to work in a STRICT mTLS (Istio) environment. We handle this via a default `scrapeClass` in Prometheus that adds the Istio certs. When a monitor needs to be exempt from that `tlsConfig`, a mutation switches it to a plain scrape class without the Istio certs.
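To make the scrape class mechanics concrete, here is a minimal sketch of what the two classes could look like on the Prometheus custom resource (field layout per the prometheus-operator `ScrapeClass` API linked below; the resource name and namespace are assumptions rather than the chart's literal values, and the cert paths match the `tlsConfig` used elsewhere in this PR):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: kube-prometheus-stack # assumed name, not the literal chart value
  namespace: monitoring
spec:
  scrapeClasses:
    # Default class: every monitor gets the Istio certs unless it opts out
    - name: istio-certs
      default: true
      tlsConfig:
        caFile: /etc/prom-certs/root-cert.pem
        certFile: /etc/prom-certs/cert-chain.pem
        keyFile: /etc/prom-certs/key.pem
        insecureSkipVerify: true # Istio workload certs are not tied to scrape hostnames
    # Plain class: applied by the Pepr mutation (or set explicitly) to skip the certs
    - name: exempt
```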

-## Mutations
+## TLS Configuration Setup

-Note: The below implementation has been deprecated in favor of a default `scrapeClass` with the file-based `tlsConfig` required for istio mTLS in prometheus automatically, supplemented with a mutation of `scrapeClass: exempt` that exempts monitors from the `tlsConfig` required for istio if the destination namespace is not istio injected (e.g. kube-system), unless the `uds/skip-sm-mutate` annotation is specified. The mutation behavior stated in the paragraph immediately below this section will be removed in a later release.
+Generally it is beneficial to use service and pod monitor resources from existing Helm charts where possible, as these may have more advanced configuration and options. The UDS monitoring setup ensures that all monitoring resources use a default [`scrapeClass`](https://github.com/prometheus-operator/prometheus-operator/blob/v0.75.1/Documentation/api.md#monitoring.coreos.com/v1.ScrapeClass) configured in Prometheus to handle the necessary `tlsConfig` setup for metrics to work in STRICT Istio mTLS environments (the `scheme` is also mutated to `https` on individual monitor endpoints; see [this doc](https://istio.io/latest/docs/ops/integrations/prometheus/#tls-settings) for details). This setup is the default configuration, but individual monitors can opt out of it in three different ways:

-All service monitors are mutated to set the scrape scheme to HTTPS and set the TLS Config to what is required for Istio mTLS scraping (see [this doc](https://istio.io/latest/docs/ops/integrations/prometheus/#tls-settings) for details). Beyond this, no other fields are mutated. Supporting existing service monitors is useful since some charts include service monitors by default with more advanced configurations, and it is in our best interest to enable those and use them where possible.
+1. If the service or pod monitor targets namespaces that are not Istio injected (e.g. `kube-system`), Pepr will detect this and mutate these monitors to use an `exempt` scrape class that does not have the Istio certs. Assumptions are made about STRICT mTLS here for simplicity, based on the `istio-injection` namespace label. Without making these assumptions we would need to query `PeerAuthentication` resources or another resource to determine the exact workload mTLS posture.
+1. Individual monitors can explicitly set the `exempt` scrape class to opt out of the Istio certificate configuration. This should typically only be done if your service exposes metrics on a PERMISSIVE mTLS port.
+1. If setting a `scrapeClass` is not an option due to a lack of configuration options in a Helm chart, or for other reasons, monitors can use the `uds/skip-mutate` annotation (with any value) to have Pepr mutate the `exempt` scrape class onto the monitor (both opt-out forms are sketched in the example below).

-Assumptions are made about STRICT mTLS here for simplicity, based on the `istio-injection` namespace label. Without making these assumptions we would need to query `PeerAuthentication` resources or another resource to determine the exact workload mTLS posture.

-Note: This mutation is the default behavior for all service monitors but can be skipped using the annotation key `uds/skip-sm-mutate` (with any value). Skipping this mutation should only be done if your service exposes metrics on a PERMISSIVE mTLS port.
+Note: There is deprecated functionality in Pepr that will mutate `tlsConfig` onto individual service monitors rather than using the scrape class approach. This has been kept in the current code temporarily to prevent any metrics downtime during the switch to `scrapeClass`. In a future release this behavior will be removed to reduce the complexity of the setup and required mutations.
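For example, the `envoy-stats-monitor` later in this PR opts out with the annotation, and the commented line shows the equivalent explicit scrape class (both forms are sketched here for illustration):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: envoy-stats-monitor
  namespace: istio-system
  annotations:
    uds/skip-mutate: "true" # Pepr will mutate scrapeClass: exempt onto this monitor
spec:
  # scrapeClass: exempt # alternative: set the exempt class explicitly instead
  selector:
    matchExpressions:
      - {key: istio-prometheus-ignore, operator: DoesNotExist}
```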

## Package CR `monitor` field

@@ -24,6 +24,7 @@ UDS Core also supports generating `ServiceMonitors` and/or `PodMonitors` from the
...
spec:
  monitor:
+    # Example Service Monitor
    - selector: # Selector for the service to monitor
        app: foobar
      portName: metrics # Name of the port to monitor
@@ -40,17 +41,30 @@ spec:
name: "example-secret"
optional: false
type: "Bearer"
# Example Pod Monitor
- portName: metrics # Name of the port on the pod to monitor
targetPort: 1234 # Corresponding target port on the pod/container (for network policy)
selector: # Selector for pod(s) to monitor; note: pod monitors support `podSelector` as well, both options behave the same
app: barfoo
kind: PodMonitor
# Optional properties depending on your application
description: "Metrics" # Add to customize the pod monitor name
path: "/mymetrics" # Add if metrics are exposed on a different path than "/metrics"
authorization: # Add if authorization is required for the metrics endpoint
credentials:
key: "example-key"
name: "example-secret"
optional: false
type: "Bearer"
```

-This config is used to generate service monitors and corresponding network policies to setup scraping for your applications. The `ServiceMonitor`s will go through the mutation process to add `tlsConfig` and `scheme` to work in an istio environment.

-This spec intentionally does not support all options available with a `ServiceMonitor`. While we may add additional fields in the future, we do not want to simply rebuild the `ServiceMonitor` spec since mutations are already available to handle Istio specifics. The current subset of spec options is based on the bare minimum necessary to craft resources.
+This config is used to generate service or pod monitors and corresponding network policies to set up scraping for your applications. The aforementioned TLS configuration also applies to these generated monitors, leaving the default scrape class in place unless the target namespace is not Istio-injected.

-NOTE: While this is a rather verbose spec, each of the above fields are strictly required to craft the necessary service monitor and network policy resources.
+This spec intentionally does not support all options available with a `PodMonitor` or `ServiceMonitor`. While we may add additional fields in the future, we do not want to simply rebuild these specs, since we are handling the complexities of Istio mTLS metrics. The current subset of spec options is based on the common needs seen in most environments.
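For a rough picture of the network policy half, a monitor entry selecting `app.kubernetes.io/name: podinfo` on `targetPort: 9898` (matching the test package later in this PR) would produce an ingress rule shaped like the sketch below. The policy name and namespace-matching labels are illustrative assumptions; the pod selector, port, and `app: prometheus` peer come from the generation code in this PR.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-podinfo-9898-metrics # illustrative name only
  namespace: podinfo
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: podinfo # the monitor's podSelector ?? selector
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring # assumed namespace match
          podSelector:
            matchLabels:
              app: prometheus # remote selector from the generated Allow rule
      ports:
        - port: 9898 # the monitor's targetPort
```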

## Notes on Alternative Approaches

-In coming up with this feature a few alternative approaches were considered but not chosen due to issues with each one. The current spec provides the best balance of a simplified interface compared to the `ServiceMonitor` spec, and a faster/easier reconciliation loop.
+In coming up with this feature, when targeting the `ServiceMonitor` use case, a few alternative approaches were considered but not chosen due to issues with each one. The current spec provides the best balance of a simplified interface compared to the `ServiceMonitor` spec and a faster/easier reconciliation loop.

### Generation based on service lookup

7 changes: 7 additions & 0 deletions src/grafana/tasks.yaml
@@ -8,6 +8,13 @@ tasks:
name: "app.kubernetes.io/instance=grafana"
namespace: grafana
condition: Ready
# This test is only functional on full uds-core deploys
+      # - description: Validate grafana servicemonitor exists
+      #   wait:
+      #     cluster:
+      #       kind: ServiceMonitor
+      #       name: grafana-metrics
+      #       namespace: grafana
      - description: Validate grafana virtual service
        cmd: |
          if [ "$(curl -isS https://grafana.admin.uds.dev --output /dev/null -w '%{http_code}')" = "302" ]; then
4 changes: 2 additions & 2 deletions src/pepr/operator/controllers/monitoring/pod-monitor.ts
@@ -58,7 +58,7 @@ export function generatePodMonitor(
  generation: string,
  ownerRefs: V1OwnerReference[],
) {
-  const { selector, portName } = monitor;
+  const { selector, podSelector, portName } = monitor;
  const name = generateMonitorName(pkgName, monitor);
  const payload: PrometheusPodMonitor = {
    metadata: {
@@ -79,7 +79,7 @@
        },
      ],
      selector: {
-        matchLabels: selector,
+        matchLabels: podSelector ?? selector,
      },
    },
  };
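The `podSelector ?? selector` fallback above means a monitor entry can label its pods differently from its service; a sketch of both cases (labels invented for illustration):

```yaml
monitor:
  # Case 1: podSelector present, so the generated PodMonitor matches app: barfoo-pod
  - kind: PodMonitor
    portName: metrics
    targetPort: 1234
    selector: # service-level labels, still used for the network policy description
      app: barfoo
    podSelector: # wins via podSelector ?? selector
      app: barfoo-pod
  # Case 2: no podSelector, so the generated PodMonitor falls back to app: barfoo
  - kind: PodMonitor
    portName: metrics
    targetPort: 1234
    selector:
      app: barfoo
```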
8 changes: 4 additions & 4 deletions src/pepr/operator/controllers/network/policies.ts
@@ -98,13 +98,13 @@ export async function networkPolicies(pkg: UDSPackage, namespace: string) {
    policies.push(keycloakGeneratedPolicy);
  }

-  // Generate NetworkPolicies for any ServiceMonitors that are generated
+  // Generate NetworkPolicies for any monitors that are generated
  const monitorList = pkg.spec?.monitor ?? [];
-  // Iterate over each ServiceMonitor
+  // Iterate over each monitor
  for (const monitor of monitorList) {
    const { selector, targetPort, podSelector } = monitor;

-    // Create the NetworkPolicy for the ServiceMonitor
+    // Create the NetworkPolicy for the monitor
    const policy: Allow = {
      direction: Direction.Ingress,
      selector: podSelector ?? selector,
@@ -113,7 +113,7 @@
app: "prometheus",
},
port: targetPort,
// Use the targetPort and selector to generate a description for the ServiceMonitor derived policies
// Use the targetPort and selector to generate a description for the monitoring derived policies
description: `${targetPort}-${Object.values(selector)} Metrics`,
};
// Generate the policy
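For intuition on the description template: with the `podinfo` test package added later in this PR (`targetPort: 9898`, selector value `podinfo`), the generated description renders as `9898-podinfo Metrics`.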
1 change: 1 addition & 0 deletions src/pepr/operator/crd/index.ts
@@ -34,6 +34,7 @@ export {
} from "./generated/istio/serviceentry-v1beta1";

export {
PodMetricsEndpoint as PodMonitorEndpoint,
Scheme as PodMonitorScheme,
PodMonitor as PrometheusPodMonitor,
} from "./generated/prometheus/podmonitor-v1";
80 changes: 52 additions & 28 deletions src/pepr/prometheus/index.ts
@@ -1,6 +1,9 @@
import { Capability, K8s, kind } from "pepr";
import { Component, setupLogger } from "../logger";
import {
+  PodMonitorEndpoint,
+  PodMonitorScheme,
+  PrometheusPodMonitor,
  PrometheusServiceMonitor,
  ServiceMonitorEndpoint,
  ServiceMonitorScheme,
@@ -17,60 +20,81 @@ export const prometheus = new Capability({
const { When } = prometheus;

/**
- * Mutate a service monitor to enable mTLS metrics
+ * Mutate a service monitor to exclude it from mTLS metrics with `exempt` scrapeClass
 */
When(PrometheusServiceMonitor)
  .IsCreatedOrUpdated()
  .Mutate(async sm => {
-    // Provide an opt-out of mutation to handle complicated scenarios
-    if (sm.Raw.metadata?.annotations?.["uds/skip-sm-mutate"]) {
+    if (sm.Raw.spec === undefined) {
      return;
    }

+    // Add an exempt scrape class if explicitly opted out via annotation OR targeting a non-istio-injected namespace
+    if (
+      sm.Raw.metadata?.annotations?.["uds/skip-mutate"] ||
+      sm.Raw.metadata?.annotations?.["uds/skip-sm-mutate"] ||
+      !(await isIstioInjected(sm))
+    ) {
      log.info(
        `Mutating scrapeClass to exempt ServiceMonitor ${sm.Raw.metadata?.name} from default scrapeClass mTLS config`,
      );
-      if (sm.Raw.spec === undefined) {
-        return;
-      }
      sm.Raw.spec.scrapeClass = "exempt";
      return;
-    }

-    // This assumes istio-injection == strict mTLS due to complexity around mTLS lookup
-    if (await isIstioInjected(sm)) {
-      if (sm.Raw.spec?.endpoints === undefined) {
-        return;
-      }
-      /**
-       * Patching ServiceMonitor tlsConfig is deprecated in favor of default scrapeClass with tls config
-       * this mutation will be removed in favor of a mutation to opt-out of the default scrapeClass in the future
-       */
+    } else {
      log.info(`Patching service monitor ${sm.Raw.metadata?.name} for mTLS metrics`);
+      // Note: this tlsConfig patch is deprecated in favor of a default scrape class for both service and pod monitors
      const tlsConfig = {
        caFile: "/etc/prom-certs/root-cert.pem",
        certFile: "/etc/prom-certs/cert-chain.pem",
        keyFile: "/etc/prom-certs/key.pem",
        insecureSkipVerify: true,
      };
-      const endpoints: ServiceMonitorEndpoint[] = sm.Raw.spec.endpoints;
+      const endpoints: ServiceMonitorEndpoint[] = sm.Raw.spec.endpoints || [];
      endpoints.forEach(endpoint => {
        endpoint.scheme = ServiceMonitorScheme.HTTPS;
        endpoint.tlsConfig = tlsConfig;
      });
      sm.Raw.spec.endpoints = endpoints;
-    } else {
    }
  });

/**
 * Mutate a pod monitor to exclude it from mTLS metrics with `exempt` scrapeClass
 */
When(PrometheusPodMonitor)
  .IsCreatedOrUpdated()
  .Mutate(async pm => {
    if (pm.Raw.spec === undefined) {
      return;
    }

    // Add an exempt scrape class if explicitly opted out via annotation OR targeting a non-istio-injected namespace
    if (pm.Raw.metadata?.annotations?.["uds/skip-mutate"] || !(await isIstioInjected(pm))) {
      log.info(
-        `Mutating scrapeClass to exempt ServiceMonitor ${sm.Raw.metadata?.name} from default scrapeClass mTLS config`,
+        `Mutating scrapeClass to exempt PodMonitor ${pm.Raw.metadata?.name} from default scrapeClass mTLS config`,
      );
-      if (sm.Raw.spec === undefined) {
-        return;
-      }
-      sm.Raw.spec.scrapeClass = "exempt";
+      pm.Raw.spec.scrapeClass = "exempt";
      return;
    } else {
      log.info(`Patching pod monitor ${pm.Raw.metadata?.name} for mTLS metrics`);
      const endpoints: PodMonitorEndpoint[] = pm.Raw.spec.podMetricsEndpoints || [];
      endpoints.forEach(endpoint => {
        endpoint.scheme = PodMonitorScheme.HTTPS;
      });
      pm.Raw.spec.podMetricsEndpoints = endpoints;
    }
  });

-async function isIstioInjected(sm: PrometheusServiceMonitor) {
-  const namespaces = sm.Raw.spec?.namespaceSelector?.matchNames || [sm.Raw.metadata?.namespace] || [
-    "default",
-  ];
+// This assumes istio-injection == strict mTLS due to complexity around mTLS lookup
+async function isIstioInjected(monitor: PrometheusServiceMonitor | PrometheusPodMonitor) {
+  // If monitor allows any namespace assume istio injection
+  if (monitor.Raw.spec?.namespaceSelector?.any) {
+    return true;
+  }
+
+  const namespaces = monitor.Raw.spec?.namespaceSelector?.matchNames || [
+    monitor.Raw.metadata?.namespace,
+  ] || ["default"];

  for (const ns of namespaces) {
    const namespace = await K8s(kind.Namespace).Get(ns);
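The namespace check is keyed on the `istio-injection` label described in the docs change above (the loop body that reads the label is collapsed in this view), so a namespace that keeps the default Istio scrape class would look like this sketch:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app # illustrative
  labels:
    istio-injection: enabled # monitors targeting this namespace keep the istio-certs default
```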
3 changes: 2 additions & 1 deletion src/prometheus-stack/chart/templates/istio-monitor.yaml
@@ -4,8 +4,9 @@ kind: PodMonitor
metadata:
  name: envoy-stats-monitor
  namespace: istio-system
+  annotations:
+    uds/skip-mutate: "true"
spec:
-  scrapeClass: exempt
  selector:
    matchExpressions:
      - {key: istio-prometheus-ignore, operator: DoesNotExist}
@@ -4,8 +4,9 @@ kind: PodMonitor
metadata:
  name: prometheus-pod-monitor
  namespace: monitoring
+  annotations:
+    uds/skip-mutate: "true"
spec:
-  scrapeClass: exempt
  selector:
    matchLabels:
      app: prometheus
78 changes: 0 additions & 78 deletions src/test/app-tenant.yaml
@@ -3,43 +3,6 @@ kind: Namespace
metadata:
  name: test-tenant-app
---
-apiVersion: v1
-kind: Secret
-metadata:
-  name: example-secret
-  namespace: test-tenant-app
-type: Opaque
-data:
-  example-key: ZXhhbXBsZS1rZXk=
----
-apiVersion: monitoring.coreos.com/v1
-kind: PodMonitor
-metadata:
-  name: httpbin-pod-monitor-default-scrape
-  namespace: test-tenant-app
-spec:
-  podMetricsEndpoints:
-    - path: /metrics
-      port: service
-  scrapeClass: istio-certs
-  selector:
-    matchLabels:
-      app: httpbin
----
-apiVersion: monitoring.coreos.com/v1
-kind: PodMonitor
-metadata:
-  name: httpbin-pod-monitor-no-tls-config
-  namespace: test-tenant-app
-spec:
-  podMetricsEndpoints:
-    - path: /metrics
-      port: service
-  scrapeClass: exempt
-  selector:
-    matchLabels:
-      app: httpbin
----
apiVersion: uds.dev/v1alpha1
kind: Package
metadata:
@@ -60,47 +23,6 @@ spec:
        gateway: tenant
        host: demo-8081
        port: 8081
-  monitor:
-    - selector:
-        app: httpbin
-      targetPort: 3000
-      portName: service
-      description: Pod Monitor
-      kind: PodMonitor
-    - selector:
-        app: httpbin
-      targetPort: 3000
-      portName: service
-      description: Service Monitor Explicit
-      kind: ServiceMonitor
-    - selector:
-        app: httpbin
-      targetPort: 3000
-      portName: service
-      description: Service Monitor Default
-    - portName: "http"
-      selector:
-        app: "example"
-      targetPort: 8080
-      kind: "PodMonitor"
-      authorization:
-        credentials:
-          key: "example-key"
-          name: "example-secret"
-          optional: false
-          type: "Bearer"
-      description: Pod Monitor with Authorization
-    - portName: "http"
-      selector:
-        app: "example"
-      targetPort: 8080
-      authorization:
-        credentials:
-          key: "example-key"
-          name: "example-secret"
-          optional: false
-          type: "Bearer"
-      description: Service Monitor with Authorization
---
apiVersion: v1
kind: Service
2 changes: 1 addition & 1 deletion src/test/chart/Chart.yaml
@@ -1,5 +1,5 @@
apiVersion: v2
-name: exempted-app
+name: uds-podinfo-config
description: A Helm chart for testing an exempted-app
type: application
version: 0.1.0
20 changes: 20 additions & 0 deletions src/test/chart/templates/package.yaml
@@ -0,0 +1,20 @@
+apiVersion: uds.dev/v1alpha1
+kind: Package
+metadata:
+  name: podinfo
+  namespace: podinfo
+spec:
+  monitor:
+    - selector:
+        app.kubernetes.io/name: podinfo
+      targetPort: 9898
+      portName: http
+      kind: PodMonitor
+  network:
+    expose:
+      - service: podinfo
+        selector:
+          app.kubernetes.io/name: podinfo
+        gateway: tenant
+        host: podinfo
+        port: 9898
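Putting the pieces together, from this Package the operator would generate a PodMonitor roughly like the sketch below (the generated name and the defaulted `/metrics` path are assumptions based on `generatePodMonitor` and the docs above), plus the matching Prometheus ingress network policy:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: podinfo-podmonitor # illustrative; the real name comes from generateMonitorName
  namespace: podinfo
spec:
  podMetricsEndpoints:
    - port: http # the monitor entry's portName
      path: /metrics # assumed default path per the docs above
  selector:
    matchLabels:
      app.kubernetes.io/name: podinfo # podSelector ?? selector
```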