Datadog Integration (#3407) #3619

Merged
13 changes: 13 additions & 0 deletions .changelog/3407.txt
@@ -0,0 +1,13 @@
```release-note:feature
helm: introduces `global.metrics.datadog` overrides to streamline consul-k8s datadog integration.
helm: introduces `server.enableAgentDebug` to expose agent [`enable_debug`](https://developer.hashicorp.com/consul/docs/agent/config/config-files#enable_debug) configuration.
helm: introduces `global.metrics.disableAgentHostName` to expose agent [`telemetry.disable_hostname`](https://developer.hashicorp.com/consul/docs/agent/config/config-files#telemetry-disable_hostname) configuration.
helm: introduces `global.metrics.enableHostMetrics` to expose agent [`telemetry.enable_host_metrics`](https://developer.hashicorp.com/consul/docs/agent/config/config-files#telemetry-enable_host_metrics) configuration.
helm: introduces `global.metrics.prefixFilter` to expose agent [`telemetry.prefix_filter`](https://developer.hashicorp.com/consul/docs/agent/config/config-files#telemetry-prefix_filter) configuration.
helm: introduces `global.metrics.datadog.dogstatsd.dogstatsdAddr` to expose agent [`telemetry.dogstatsd_addr`](https://developer.hashicorp.com/consul/docs/agent/config/config-files#telemetry-dogstatsd_addr) configuration.
helm: introduces `global.metrics.datadog.dogstatsd.dogstatsdTags` to expose agent [`telemetry.dogstatsd_tags`](https://developer.hashicorp.com/consul/docs/agent/config/config-files#telemetry-dogstatsd_tags) configuration.
helm: introduces required `ad.datadoghq.com/` annotations and `tags.datadoghq.com/` labels for integration with [Datadog Autodiscovery](https://docs.datadoghq.com/integrations/consul/?tab=containerized) and [Datadog Unified Service Tagging](https://docs.datadoghq.com/getting_started/tagging/unified_service_tagging/?tab=kubernetes#serverless-environment) for Consul.
helm: introduces automated Unix domain socket hostPath mounting for containerized integration with Datadog within the consul-server StatefulSet.
helm: introduces `global.metrics.datadog.otlp` override options to allow OTLP metrics forwarding to Datadog Agent.
control-plane: adds `server-acl-init` datadog agent token creation for datadog integration.
```
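As a rough illustration of how the overrides named in this release note might be combined, a values file could look like the sketch below. The key names are taken from the changelog entries and the templates in this PR; the chosen values are illustrative assumptions, not chart defaults.

```yaml
# Illustrative values.yaml sketch. Key names come from the release note;
# the values themselves are assumptions chosen for the example.
global:
  metrics:
    enabled: true
    enableAgentMetrics: true
    disableAgentHostName: true   # rendered as telemetry.disable_hostname
    enableHostMetrics: true      # rendered as telemetry.enable_host_metrics
    datadog:
      enabled: true              # turns on the Datadog labels and annotations
server:
  enableAgentDebug: false        # rendered as enable_debug in server.json
```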
203 changes: 185 additions & 18 deletions charts/consul/templates/_helpers.tpl

Large diffs are not rendered by default.

38 changes: 38 additions & 0 deletions charts/consul/templates/datadog-agent-role.yaml
@@ -0,0 +1,38 @@
{{- if .Values.global.metrics.datadog.enabled }}
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: {{ template "consul.fullname" . }}-datadog-metrics
  namespace: {{ .Release.Namespace }}
  labels:
    app: datadog
    heritage: {{ .Release.Service }}
    release: {{ .Release.Name }}
    component: agent
{{- if (or (and .Values.global.openshift.enabled .Values.server.exposeGossipAndRPCPorts) .Values.global.enablePodSecurityPolicies) }}
rules:
{{- if .Values.global.enablePodSecurityPolicies }}
  - apiGroups: ["policy"]
    resources: ["podsecuritypolicies"]
    resourceNames:
      - {{ template "consul.fullname" . }}-datadog-metrics
    verbs:
      - use
{{- end }}
{{- if (and .Values.global.openshift.enabled .Values.server.exposeGossipAndRPCPorts ) }}
  - apiGroups: ["security.openshift.io"]
    resources: ["securitycontextconstraints"]
    resourceNames:
      - {{ template "consul.fullname" . }}-datadog-metrics
    verbs:
      - use
{{- end }}
{{- else}}
rules:
  - apiGroups: [ "" ]
    resources: [ "secrets" ]
    resourceNames:
      - {{ .Release.Namespace }}-datadog-agent-metrics-acl-token
    verbs: [ "get", "watch", "list" ]
{{- end }}
{{- end }}
26 changes: 26 additions & 0 deletions charts/consul/templates/datadog-agent-rolebinding.yaml
@@ -0,0 +1,26 @@
{{- if .Values.global.metrics.datadog.enabled }}
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: {{ template "consul.fullname" . }}-datadog-metrics
  namespace: {{ .Release.Namespace }}
  labels:
    app: {{ template "consul.name" . }}
    chart: {{ template "consul.chart" . }}
    heritage: {{ .Release.Service }}
    release: {{ .Release.Name }}
    component: agent
subjects:
  - kind: ServiceAccount
    apiGroup: ""
    name: datadog-agent
    namespace: datadog
  - kind: ServiceAccount
    apiGroup: ""
    name: datadog-cluster-agent
    namespace: datadog
roleRef:
  kind: Role
  name: {{ template "consul.fullname" . }}-datadog-metrics
  apiGroup: ""
{{- end }}
4 changes: 4 additions & 0 deletions charts/consul/templates/server-acl-init-job.yaml
@@ -273,6 +273,10 @@ spec:
-create-enterprise-license-token=true \
{{- end }}

{{- if (and (not .Values.global.metrics.datadog.dogstatsd.enabled) .Values.global.metrics.datadog.enabled .Values.global.acls.manageSystemACLs) }}
-create-dd-agent-token=true \
{{- end }}

{{- if .Values.server.snapshotAgent.enabled }}
-snapshot-agent=true \
{{- end }}
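Read together with the condition in this hunk, the `-create-dd-agent-token=true` flag is passed only when Datadog metrics are enabled, DogStatsD is not, and the chart manages system ACLs. A values sketch for that combination, mirroring the flags the unit tests in this PR set (`dogstatsd.enabled: false` is written out only for clarity; the corresponding unit test leaves it unset):

```yaml
# Sketch of values under which the condition in server-acl-init-job.yaml holds.
global:
  acls:
    manageSystemACLs: true
  metrics:
    enabled: true
    enableAgentMetrics: true
    datadog:
      enabled: true
      dogstatsd:
        enabled: false   # setting this to true suppresses -create-dd-agent-token
```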
16 changes: 14 additions & 2 deletions charts/consul/templates/server-config-configmap.yaml
@@ -30,6 +30,7 @@ data:
{{- if .Values.server.logLevel }}
"log_level": "{{ .Values.server.logLevel | upper }}",
{{- end }}
"enable_debug": {{ .Values.server.enableAgentDebug }},
"domain": "{{ .Values.global.domain }}",
"limits": {
"request_limits": {
@@ -56,7 +57,12 @@ data:
"enabled": true
},
{{- end }}
"server": true
"server": true,
"leave_on_terminate": true,
Contributor

suggestion: These extra 'leave_on_terminate' and 'autopilot' settings should be removed as they were deemed destructive.

We need to check the other backports, as anything from #3000 should not be in release/1.3.x, release/1.2.x, or release/1.1.x (1.4.x is fine).

Contributor Author

Corrected as recommended by reverting the affected files to their release/1.3.x branch versions.

$ git checkout 'release/1.3.x' -- charts/consul/templates/server-config-configmap.yaml

Re-applied datadog-integration changes into the following files:

  • charts/consul/templates/server-config-configmap.yaml
    • Reincorporated enable_debug into server.json (updates server-statefulset.yaml config-checksum)
    • Reapplied all datadog and agent metric-related entries into the telemetry-config.json
  • charts/consul/test/unit/server-statefulset.bats
    • Updated config-configmap tests to reflect enable_debug update to server.json config
      • "server/StatefulSet: adds config-checksum annotation when extraConfig is blank"
      • "server/StatefulSet: adds config-checksum annotation when extraConfig is provided"
      • "server/StatefulSet: adds config-checksum annotation when extraConfig is updated"

"autopilot": {
"min_quorum": {{ template "consul.server.autopilotMinQuorum" . }},
"disable_upgrade_migration": true
}
}
{{- $vaultConnectCAEnabled := and .Values.global.secretsBackend.vault.connectCA.address .Values.global.secretsBackend.vault.connectCA.rootPKIPath .Values.global.secretsBackend.vault.connectCA.intermediatePKIPath -}}
{{- if and .Values.global.secretsBackend.vault.enabled $vaultConnectCAEnabled }}
@@ -187,7 +193,13 @@ data:
telemetry-config.json: |-
{
"telemetry": {
"prometheus_retention_time": "{{ .Values.global.metrics.agentMetricsRetentionTime }}"
"prometheus_retention_time": "{{ .Values.global.metrics.agentMetricsRetentionTime }}",
"disable_hostname": {{ .Values.global.metrics.disableAgentHostName }},{{ template "consul.prefixFilter" . }}
"enable_host_metrics": {{ .Values.global.metrics.enableHostMetrics }}{{- if .Values.global.metrics.datadog.dogstatsd.enabled }},{{ template "consul.dogstatsdAaddressInfo" . }}
{{- if .Values.global.metrics.datadog.dogstatsd.enabled }}
"dogstatsd_tags": {{ .Values.global.metrics.datadog.dogstatsd.dogstatsdTags | toJson }}
{{- end }}
{{- end }}
}
}
{{- end }}
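The DogStatsD entries in `telemetry-config.json` are only rendered when `global.metrics.datadog.dogstatsd.enabled` is true. A minimal values sketch that would exercise that branch; the tag strings are illustrative assumptions, not values taken from this diff:

```yaml
# Sketch only: enables the DogStatsD branch of telemetry-config.json.
global:
  metrics:
    enabled: true
    enableAgentMetrics: true
    datadog:
      enabled: true
      dogstatsd:
        enabled: true
        # Passed through toJson into telemetry.dogstatsd_tags.
        dogstatsdTags: ["source:consul", "consul_service:consul-server"]
```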
2 changes: 1 addition & 1 deletion charts/consul/templates/server-disruptionbudget.yaml
@@ -17,7 +17,7 @@ metadata:
release: {{ .Release.Name }}
component: server
spec:
maxUnavailable: {{ template "consul.pdb.maxUnavailable" . }}
maxUnavailable: {{ template "consul.server.pdb.maxUnavailable" . }}
Contributor

suggestion: This is also from #3000 and should be dropped.

Contributor Author

Corrected as recommended by reverting the affected files to their release/1.3.x branch versions.

$ git checkout 'release/1.3.x' -- charts/consul/templates/server-disruptionbudget.yaml charts/consul/test/unit/server-disruptionbudget.bats charts/consul/templates/_helpers.tpl

Applied the datadog-integration changes back into _helpers.tpl.

Re-ran the entire bats test suite via the Makefile (make bats-tests); all tests passed.

selector:
matchLabels:
app: {{ template "consul.name" . }}
83 changes: 82 additions & 1 deletion charts/consul/templates/server-statefulset.yaml
@@ -19,6 +19,9 @@
{{- end -}}
{{ template "consul.validateRequiredCloudSecretsExist" . }}
{{ template "consul.validateCloudSecretKeys" . }}
{{ template "consul.validateMetricsConfig" . }}
{{ template "consul.validateDatadogConfiguration" . }}
{{ template "consul.validateExtraConfig" . }}
# StatefulSet to run the actual Consul server cluster.
apiVersion: apps/v1
kind: StatefulSet
@@ -62,6 +65,11 @@ spec:
release: {{ .Release.Name }}
component: server
hasDNS: "true"
{{- if .Values.global.metrics.datadog.enabled }}
"tags.datadoghq.com/version": {{ template "consul.versionInfo" . }}
"tags.datadoghq.com/env": {{ template "consul.name" . }}
"tags.datadoghq.com/service": "consul-server"
{{- end }}
{{- if .Values.server.extraLabels }}
{{- toYaml .Values.server.extraLabels | nindent 8 }}
{{- end }}
@@ -124,6 +132,7 @@ spec:
{{- tpl .Values.server.annotations . | nindent 8 }}
{{- end }}
{{- if (and .Values.global.metrics.enabled .Values.global.metrics.enableAgentMetrics) }}
{{- if not .Values.global.metrics.datadog.openMetricsPrometheus.enabled }}
"prometheus.io/scrape": "true"
"prometheus.io/path": "/v1/agent/metrics"
{{- if .Values.global.tls.enabled }}
@@ -134,6 +143,67 @@ spec:
"prometheus.io/scheme": "http"
{{- end }}
{{- end }}
{{- if .Values.global.metrics.datadog.enabled }}
"ad.datadoghq.com/tolerate-unready": "true"
"ad.datadoghq.com/consul.logs": {{ .Values.global.metrics.datadog.dogstatsd.dogstatsdTags | toJson | replace "[" "[{" | replace "]" "}]" | replace ":" "\": \"" | join "\",\"" | squote }}
{{- if .Values.global.metrics.datadog.openMetricsPrometheus.enabled }}
"ad.datadoghq.com/consul.checks": |
{
"openmetrics": {
"init_config": {},
"instances": [
{
{{- if .Values.global.tls.enabled }}
"openmetrics_endpoint": "https://consul-server.{{ .Release.Namespace }}.svc:8501/v1/agent/metrics?format=prometheus",
"tls_cert": "/etc/datadog-agent/conf.d/consul.d/certs/tls.crt",
"tls_private_key": "/etc/datadog-agent/conf.d/consul.d/certs/tls.key",
"tls_ca_cert": "/etc/datadog-agent/conf.d/consul.d/ca/tls.crt",
{{- else }}
"openmetrics_endpoint": "http://consul-server.{{ .Release.Namespace }}.svc:8500/v1/agent/metrics?format=prometheus",
{{- end }}
{{- if ( .Values.global.acls.manageSystemACLs) }}
"headers": {
"X-Consul-Token": "ENC[k8s_secret@{{ .Release.Namespace }}/{{ .Release.Namespace }}-datadog-agent-metrics-acl-token/token]"
},
{{- end }}
"namespace": "{{ .Release.Namespace }}",
"metrics": [ ".*" ]
}
]
}
}
{{- else if (not .Values.global.metrics.datadog.dogstatsd.enabled) }}
"ad.datadoghq.com/consul.checks": |
{
"consul": {
"init_config": {},
"instances": [
{
{{- if .Values.global.tls.enabled }}
"url": "https://consul-server.{{ .Release.Namespace }}.svc:8501",
"tls_cert": "/etc/datadog-agent/conf.d/consul.d/certs/tls.crt",
"tls_private_key": "/etc/datadog-agent/conf.d/consul.d/certs/tls.key",
"tls_ca_cert": "/etc/datadog-agent/conf.d/consul.d/ca/tls.crt",
{{- else }}
"url": "http://consul-server.{{ .Release.Namespace }}.svc:8500",
{{- end }}
"use_prometheus_endpoint": true,
{{- if ( .Values.global.acls.manageSystemACLs) }}
"acl_token": "ENC[k8s_secret@{{ .Release.Namespace }}/{{ .Release.Namespace }}-datadog-agent-metrics-acl-token/token]",
{{- end }}
"new_leader_checks": true,
"network_latency_checks": true,
"catalog_checks": true,
"auth_type": "basic"
}
]
}
}
{{- else }}
"ad.datadoghq.com/consul.metrics_exclude": "true"
{{- end }}
{{- end }}
{{- end }}
spec:
{{- if .Values.server.affinity }}
affinity:
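The annotation logic in the hunk above selects one of three modes: an OpenMetrics check when `openMetricsPrometheus.enabled` is set, the Consul integration check when neither it nor DogStatsD is enabled, and `metrics_exclude` otherwise. As a hedged sketch, values selecting the OpenMetrics branch might look like this (the TLS and ACL settings are illustrative and only affect which endpoint and headers are rendered):

```yaml
# Sketch only: selects the "openmetrics" Autodiscovery check.
global:
  tls:
    enabled: true            # selects the https endpoint on port 8501
  acls:
    manageSystemACLs: true   # adds the X-Consul-Token header from the k8s secret
  metrics:
    enabled: true
    enableAgentMetrics: true
    datadog:
      enabled: true
      openMetricsPrometheus:
        enabled: true
```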
@@ -219,6 +289,12 @@ spec:
emptyDir:
medium: "Memory"
{{- end }}
{{- if and .Values.global.metrics.datadog.enabled .Values.global.metrics.datadog.dogstatsd.enabled (eq .Values.global.metrics.datadog.dogstatsd.socketTransportType "UDS" ) }}
- name: dsdsocket
hostPath:
path: /var/run/datadog
type: DirectoryOrCreate
{{- end }}
{{- range .Values.server.extraVolumes }}
- name: userconfig-{{ .name }}
{{ .type }}:
@@ -257,7 +333,7 @@ spec:
{{- include "consul.restrictedSecurityContext" . | nindent 8 }}
containers:
- name: consul
image: "{{ default .Values.global.image .Values.server.image }}"
image: "{{ default .Values.global.image .Values.server.image | trimPrefix "\"" | trimSuffix "\"" }}"
imagePullPolicy: {{ .Values.global.imagePullPolicy }}
env:
- name: ADVERTISE_IP
@@ -455,6 +531,11 @@ spec:
mountPath: /consul/license
readOnly: true
{{- end }}
{{- if and .Values.global.metrics.datadog.enabled .Values.global.metrics.datadog.dogstatsd.enabled (eq .Values.global.metrics.datadog.dogstatsd.socketTransportType "UDS" ) }}
- name: dsdsocket
mountPath: /var/run/datadog
readOnly: true
{{- end }}
{{- range .Values.server.extraVolumes }}
- name: userconfig-{{ .name }}
readOnly: true
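The `dsdsocket` volume and its mount are only added when DogStatsD is enabled with a `socketTransportType` of `"UDS"`. A values sketch for that case, assuming the Datadog Agent on the node exposes its socket under `/var/run/datadog` as the hostPath in this diff expects:

```yaml
# Sketch only: enables the Unix domain socket hostPath volume and mount.
global:
  metrics:
    enabled: true
    enableAgentMetrics: true
    datadog:
      enabled: true
      dogstatsd:
        enabled: true
        socketTransportType: "UDS"   # any other value skips the hostPath volume
```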
13 changes: 13 additions & 0 deletions charts/consul/templates/telemetry-collector-deployment.yaml
@@ -248,6 +248,19 @@ spec:
- name: SSL_CERT_DIR
value: "/etc/ssl/certs:/trusted-cas"
{{- end }}
{{- if .Values.global.metrics.datadog.otlp.enabled }}
- name: HOST_IP
valueFrom:
fieldRef:
fieldPath: status.hostIP
{{- if eq (.Values.global.metrics.datadog.otlp.protocol | lower ) "http" }}
- name: CO_OTEL_HTTP_ENDPOINT
value: "http://$(HOST_IP):4318"
{{- else if eq (.Values.global.metrics.datadog.otlp.protocol | lower) "grpc" }}
- name: CO_OTEL_HTTP_ENDPOINT
value: "grpc://$(HOST_IP):4317"
{{- end }}
{{- end }}
{{- include "consul.extraEnvironmentVars" .Values.telemetryCollector | nindent 12 }}
command:
- "/bin/sh"
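When `global.metrics.datadog.otlp.enabled` is set, the collector container receives `HOST_IP` plus an endpoint variable derived from `otlp.protocol` (port 4318 for `http`, 4317 for `grpc`). A hedged values sketch follows; `telemetryCollector.enabled` is an assumption about how the collector deployment itself is switched on and is not shown in this diff:

```yaml
# Sketch only: forwards OTLP metrics to a Datadog Agent on the node.
telemetryCollector:
  enabled: true            # assumption: this key is not part of the diff
global:
  metrics:
    datadog:
      enabled: true
      otlp:
        enabled: true
        protocol: "http"   # lowercased before comparison; "grpc" is the other handled value
```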
16 changes: 2 additions & 14 deletions charts/consul/test/unit/helpers.bats
@@ -348,7 +348,7 @@ load _helpers
[[ "$output" =~ "When the value global.experiments.resourceAPIs is set, global.peering.enabled is currently unsupported." ]]
}

@test "connectInject/Deployment: fails if resource-apis is set and admin partitions are enabled" {
@test "connectInject/Deployment: fails if resource-apis is set, v2tenancy is unset, and admin partitions are enabled" {
Contributor

suggestion: It looks like some extra changes were picked up. `git checkout 'release/1.3.x' helpers.bats` will reset the file to the version on the branch it came from.

Contributor Author

Corrected as recommended by reverting the affected files to their release/1.3.x branch versions.

$ git checkout 'release/1.3.x' -- charts/consul/templates/server-disruptionbudget.yaml charts/consul/test/unit/server-disruptionbudget.bats charts/consul/templates/_helpers.tpl

Applied the datadog-integration changes back into _helpers.tpl.

Re-ran the entire bats test suite via the Makefile (make bats-tests); all tests passed.

cd `chart_dir`
run helm template \
-s templates/tests/test-runner.yaml \
@@ -359,7 +359,7 @@ load _helpers
--set 'global.adminPartitions.enabled=true' \
.
[ "$status" -eq 1 ]
[[ "$output" =~ "When the value global.experiments.resourceAPIs is set, global.adminPartitions.enabled is currently unsupported." ]]
[[ "$output" =~ "When the value global.experiments.resourceAPIs is set, global.experiments.v2tenancy must also be set to support global.adminPartitions.enabled." ]]
}

@test "connectInject/Deployment: fails if resource-apis is set and federation is enabled" {
@@ -431,18 +431,6 @@ load _helpers
[[ "$output" =~ "When the value global.experiments.resourceAPIs is set, syncCatalog.enabled is currently unsupported." ]]
}

@test "connectInject/Deployment: fails if resource-apis is set and meshGateway is enabled" {
cd `chart_dir`
run helm template \
-s templates/tests/test-runner.yaml \
--set 'connectInject.enabled=true' \
--set 'global.experiments[0]=resource-apis' \
--set 'ui.enabled=false' \
--set 'meshGateway.enabled=true' .
[ "$status" -eq 1 ]
[[ "$output" =~ "When the value global.experiments.resourceAPIs is set, meshGateway.enabled is currently unsupported." ]]
}

@test "connectInject/Deployment: fails if resource-apis is set and ingressGateways is enabled" {
cd `chart_dir`
run helm template \
49 changes: 49 additions & 0 deletions charts/consul/test/unit/server-acl-init-job.bats
@@ -2444,3 +2444,52 @@ load _helpers
      yq 'any(contains("-enable-resource-apis=true"))' | tee /dev/stderr)
  [ "${actual}" = "true" ]
}

#--------------------------------------------------------------------
# global.metrics.datadog

@test "serverACLInit/Job: -create-dd-agent-token not set when datadog=false and manageSystemACLs=true" {
  cd `chart_dir`
  local command=$(helm template \
      -s templates/server-acl-init-job.yaml \
      --set 'global.acls.manageSystemACLs=true' \
      . | tee /dev/stderr |
      yq '.spec.template.spec.containers[0].command' | tee /dev/stderr)

  local actual=$( echo "$command" |
    yq 'any(contains("-create-dd-agent-token"))' | tee /dev/stderr)
  [ "${actual}" = "false" ]
}

@test "serverACLInit/Job: -create-dd-agent-token set when global.metrics.datadog=true and global.acls.manageSystemACLs=true" {
  cd `chart_dir`
  local command=$(helm template \
      -s templates/server-acl-init-job.yaml \
      --set 'global.metrics.enabled=true' \
      --set 'global.metrics.enableAgentMetrics=true' \
      --set 'global.metrics.datadog.enabled=true' \
      --set 'global.acls.manageSystemACLs=true' \
      . | tee /dev/stderr |
      yq '.spec.template.spec.containers[0].command' | tee /dev/stderr)

  local actual=$( echo "$command" |
    yq 'any(contains("-create-dd-agent-token"))' | tee /dev/stderr)
  [ "${actual}" = "true" ]
}

@test "serverACLInit/Job: -create-dd-agent-token NOT set when global.metrics.datadog=true, global.metrics.datadog.dogstatsd.enabled=true, and global.acls.manageSystemACLs=true" {
  cd `chart_dir`
  local command=$(helm template \
      -s templates/server-acl-init-job.yaml \
      --set 'global.metrics.enabled=true' \
      --set 'global.metrics.enableAgentMetrics=true' \
      --set 'global.metrics.datadog.enabled=true' \
      --set 'global.metrics.datadog.dogstatsd.enabled=true' \
      --set 'global.acls.manageSystemACLs=true' \
      . | tee /dev/stderr |
      yq '.spec.template.spec.containers[0].command' | tee /dev/stderr)

  local actual=$( echo "$command" |
    yq 'any(contains("-create-dd-agent-token"))' | tee /dev/stderr)
  [ "${actual}" = "false" ]
}