This is the technical documentation for the Datadog Cluster Agent image, available here.
Review the RBAC files in the manifests folder to get the full scope of the requirements. These manifests create a Service Account, a Cluster Role with a restricted scope and actions detailed below and a Cluster Role Binding as well.
get
,list
andwatch
ofComponenentstatuses
to produce the controle plane service checks.get
andupdate
of theConfigmaps
nameddatadogtoken
to update and query the most up to date version token corresponding to the latest event stored in ETCD.watch
theServices
to perform the Autodiscovery based off of services activity.get
,list
andwatch
of thePods
.get
,list
andwatch
of theNodes
.get
,list
andwatch
of theServices
.get
,list
andwatch
of theEvents
.get
,list
andwatch
of theEndpoints
to run cluster level health checks.watch
andlist
of thehorizontalpodautoscalers
to run the AutoscalersController and serve custom metrics.
To store the event.tokenKey
and the event.tokenTimestamp
, deploy your ConfigMap in the same namespace as the cluster agent with the name datadogtoken
, namespace can be configured otherwise with DD_KUBE_RESOURCES_NAMESPACE
.
For this, run kubectl create configmap datadogtoken --from-literal="event.tokenKey"="0"
.
NB:Set any resversion here, make sure it's not set to a value superior to the actual current resversion.
If not present, set the event.tokenTimestamp
, it is automatically set.
Run the Datadog Cluster Agent in Kubernetes using this manifest. Refer to the getting started guide for more details.
For the Datadog Cluster Agent to communicate with the Node Agent, you need to share an authentication token between the two agents.
The token needs to be greater or equal to 32 characters and should only have upper case or lower case letters and numbers.
You can pass the token as an environment variable: DD_CLUSTER_AGENT_AUTH_TOKEN
.
Besides this token, you need to set the DD_CLUSTER_AGENT_ENABLED=true
in the manifest of the Datadog Node Agent.
We strongly recommend using a secret to authenticate communication between Agents with the Datadog Cluster Agent. You must modify the value of the secret in the dca-secret.yaml then create it:
kubectl create -f Dockerfiles/manifests/cluster-agent/dca-secret.yaml
yields:
kubectl get secret datadog-auth-token
NAME TYPE DATA AGE
datadog-auth-token Opaque 1 16s
If you are running the Datadog Node Agent 6.4.2+, to deploy the Datadog Cluster Agent you need to:
- Deploy the datadog-cluster-agent_service.yaml
- Configure and create the dca-secret.yaml
- Configure (DD_API_KEY and other options) the the cluster-agent.yaml and deploy it
- Specify the required options to ensure the communication between the Datadog Cluster Agent and the Datadog Node Agent.
- Apply the new configuration to the Datadog Node Agent DaemonSet and redeploy the DaemonSet.
- Once the Datadog Cluster Agent and the Datadog Node Agents are running, you can run the
agent status
command to confirm the successful communication.
Refer to the Datadog Cluster Agent troubleshooting section for more information.
The available commands for the Datadog Cluster Agents are:
datadog-cluster-agent status
: Gives an overview of the components of the agent and their health.datadog-cluster-agent metamap [nodeName]
: Queries the local cache of the mapping between the pods living onnodeName
and the cluster level metadata it's associated with (endpoints ...). Not specifying thenodeName
will run the mapper on all the nodes of the cluster.datadog-cluster-agent flare [caseID]
: Similarly to the node agent, the cluster agent can aggregate the logs and the configurations used and forward an archive to the support team or be deflated and used locally.
In order to collect events, you need the following environment variables:
- name: DD_COLLECT_KUBERNETES_EVENTS
value: "true"
- name: DD_LEADER_ELECTION
value: "true"
Enabling the leader election will ensure that only one agent collects the events.
Ensure the Node Agents and the Datadog Cluster Agent can properly communicate. Create a service in front of the Datadog Cluster Agent. Ensure an auth_token is properly shared between the agents. Confirm the RBAC rules are properly set.
In the Node Agent, set the env var DD_CLUSTER_AGENT_ENABLED
to true.
The env var DD_KUBERNETES_METADATA_TAG_UPDATE_FREQ
can be set to specify how often the Node Agents hit the Datadog Cluster Agent.
You can disable the Kubernetes metadata tag collection with DD_KUBERNETES_COLLECT_METADATA_TAGS
.
The Datadog Cluster Agent implements the External Metrics Provider's interface (currently in beta). Therefore it can serve Custom Metrics to Kubernetes for Horizontal Pod Autoscalers. It is referred throughout the documentation as the Custom Metrics Server, per Kubernetes' terminology.
To enable the Custom Metrics Server:
- Set
DD_EXTERNAL_METRICS_PROVIDER_ENABLED
totrue
in the Deployment of the Datadog Cluster Agent. - Configure the
<DD_APP_KEY>
as well as the<DD_API_KEY>
in the Deployment of the Datadog Cluster Agent. - Create a service exposing the port 443 and register it as an APIService for External Metrics.
Refer to the dedicated guide to configure the Custom Metrics Server and get more details about this feature.
The following environment variables are supported:
DD_API_KEY
- required - your Datadog API key.DD_HOSTNAME
: hostname to use for the Datadog Cluster Agent.DD_CLUSTER_AGENT_CMD_PORT
: port for the Datadog Cluster Agent to serve, default is5005
.DD_USE_METADATA_MAPPER
: enables the cluster level metadata mapping, default istrue
.DD_COLLECT_KUBERNETES_EVENTS
- configures the agent to collect Kubernetes events. Default tofalse
. See the Event collection section for more details.DD_LEADER_ELECTION
: activates the leader election. You must setDD_COLLECT_KUBERNETES_EVENTS
totrue
to activate this feature. Default value isfalse
.DD_LEADER_LEASE_DURATION
: used only if the leader election is activated. See the details here. Value in seconds, 60 by default.DD_CLUSTER_AGENT_AUTH_TOKEN
: 32 characters long token that needs to be shared between the node agent and the Datadog Cluster Agent.DD_KUBE_RESOURCES_NAMESPACE
: configures the namespace where the Cluster Agent creates the configmaps required for the Leader Election, the Event Collection (optional) and the Horizontal Pod Autoscaling.DD_KUBERNETES_INFORMERS_RESYNC_PERIOD
: frequency in seconds to query the API Server to resync the local cache. The default is 5 minutes.DD_KUBERNETES_INFORMERS_RESTCLIENT_TIMEOUT
: timeout in seconds of the client communicating with the API Server. Default is 60 seconds.DD_METRICS_PORT
: change the port for exposing metrics from the Datadog Cluster Agent. The default is port 5000.DD_EXTERNAL_METRICS_PROVIDER_BATCH_WINDOW
: time waited in seconds to process a batch of metrics from multiple Autoscalers. Default to 10 seconds.DD_EXTERNAL_METRICS_PROVIDER_MAX_AGE
: maximum age in seconds of a datapoint before considering it invalid to be served. Default to 120 seconds.DD_EXTERNAL_METRICS_AGGREGATOR
: aggregator for the Datadog metrics. Applies to all Autoscalers processed. Chose among [sum/avg/max/min]DD_EXTERNAL_METRICS_PROVIDER_BUCKET_SIZE
: size of the window in seconds used to query metric from Datadog. Default to 300 seconds.DD_EXTERNAL_METRICS_PROVIDER_LOCAL_COPY_REFRESH_RATE
: rate to resync local cache of processed metrics with the global store. Useful when there are several replicas of the Cluster Agent.
The Datadog Cluster Agent is designed to be used in a containerized ecosystem.
Start by creating the binary by running inv -e cluster-agent.build
from the datadog-agent
package. This will add a binary in ./bin/datadog-cluster-agent/
Then from the current folder, run inv -e cluster-agent.image-build
.