Practical and simple Datadog resource generation, with sane defaults for dashboards and monitoring of Kubernetes infrastructure.
- Python >= 3.7, or
- Docker
The top of the configuration yaml contains environment variables that are used throughout the monitors. It is important to note that cluster and cluster_tag in this configuration yaml must match the value of the DD_CLUSTER_CHECKS_EXTRA_TAGS variable set in the datadog Helm chart in course.yml. We have also added a namespace definition that will be used by all monitors unless you specifically overwrite those values (documented below).
The configuration yaml file described above should be populated with all the resources you want to use. Currently supported are monitors, timeboards, and downtimes.
Finally, when ready run:
terradog create -f <source_file> -o <output_directory>
which creates a unique Terraform file in the output directory for each monitor, timeboard, or downtime, according to the configuration of definitions and namespaces.
There are several monitors included in this module, broken up into families such as kubernetes-stable, kubernetes-optional, rds, etc. (see below for a comprehensive list). If you wish to deploy all monitors in a family, you can simply call out the family name in the monitors section of the yaml file. The following snippet will create a Terraform file for each of the monitors in the kubernetes-stable family. You can see the current monitors (yaml files) under monitors. See Custom Monitors to create new monitors.
monitors:
- source: kubernetes-stable
You can override individual values in the monitor as necessary. A complete list of fields is declared in Custom Monitors.
This example will use all the defaults from the kubernetes-stable.pod_crashes monitor but override its critical threshold.
monitors:
  - source: kubernetes-stable.pod_crashes
    pod_crashes_critical_threshold: 5
If the monitor you want doesn't exist as a native monitor in a monitor family, you can define any monitor you want inline in the yaml file:
definitions:
  environment: "production"
  cluster: "production.cluster"
  notifications: "@pagerduty"
  cluster_tag: "kubernetescluster"
monitors:
  - notify_audit: false
    locked: false
    name: '[${environment}] Increase in network errors'
    tags: [network, '${environment}', fairwinds]
    include_tags: false
    no_data_timeframe: null
    silenced: {}
    new_host_delay: 300
    require_full_window: true
    notify_no_data: false
    renotify_interval: 0
    escalation_message: ''
    query: avg(last_15m):avg:kubernetes.network.rx_errors{kubernetescluster:${cluster}} + avg:kubernetes.network.tx_errors{kubernetescluster:${cluster}} > 10
    message: |
      {{#is_alert}}
      We are getting increasing network errors
      {{/is_alert}}
      ${notifications}
    type: metric alert
    thresholds: {critical: 10, warning: 5}
    timeout_h: 0
As alluded to above, each monitor and dashboard has certain fields that need to be defined; environment and cluster are two examples.
They are denoted by ${<name>} in the template and are referred to as "definitions". You can define definitions at a global level:
definitions:
  cluster: working.cluster
  environment: production
  cluster_tag: kubernetescluster
monitors:
  - source: kubernetes-stable
or at an individual monitor level to override the global value.
definitions:
  cluster: working.cluster
  environment: production
  cluster_tag: kubernetescluster
monitors:
  - source: kubernetes-stable
    definitions:
      environment: staging # this will override the global value
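The substitution and override behaviour described above can be sketched in a few lines of Python. This is only an illustration using the standard library's string.Template, which happens to share the ${name} placeholder syntax; it is not terradog's actual implementation, and the render function is a hypothetical helper.

```python
from string import Template


def render(template_text, global_definitions, monitor_definitions=None):
    """Fill ${name} placeholders; monitor-level values win over global ones."""
    # Assumption: a monitor-level definition simply replaces the global value.
    merged = {**global_definitions, **(monitor_definitions or {})}
    return Template(template_text).substitute(merged)


global_definitions = {
    "cluster": "working.cluster",
    "environment": "production",
    "cluster_tag": "kubernetescluster",
}

name_template = "[${environment}] Increase in network errors"

# Global definitions only.
print(render(name_template, global_definitions))
# Monitor-level override of the environment definition.
print(render(name_template, global_definitions, {"environment": "staging"}))
```

With the values above, the first call yields a name prefixed with [production] and the second with [staging].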
Datadog does not support multiple namespaces in monitor query filters. As a workaround, terradog monitors that include a namespace filter can generate multiple namespace-specific monitors from a single monitor definition.
The underlying terradog monitor must have vary_by_namespace: true set. The monitor's namespaces definition must be a list. Each value in the namespaces definition will produce a separate monitor and Terraform file.
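As a rough sketch of the fan-out described above, the following Python expands one monitor definition into one monitor per namespace. It is not terradog's code: the expand_by_namespace helper, the ${namespace} placeholder, and the example metric name are all assumptions made for illustration.

```python
from string import Template


def expand_by_namespace(monitor, namespaces):
    """Return one monitor dict per namespace, or the monitor unchanged."""
    if not monitor.get("vary_by_namespace"):
        return [monitor]
    if not isinstance(namespaces, list):
        raise TypeError("namespaces must be a list")
    expanded = []
    for namespace in namespaces:
        copy = dict(monitor)
        # Hypothetical ${namespace} placeholder in the name and query fields.
        for field in ("name", "query"):
            copy[field] = Template(monitor[field]).safe_substitute(namespace=namespace)
        expanded.append(copy)
    return expanded


# Hypothetical monitor; the metric name is a placeholder, not a real check.
monitor = {
    "vary_by_namespace": True,
    "name": "Example check in ${namespace}",
    "query": "avg(last_5m):avg:example.metric{namespace:${namespace}} > 0",
}

for item in expand_by_namespace(monitor, ["kube-system", "datadog"]):
    print(item["name"], "|", item["query"])
```

Each returned dict would correspond to one generated monitor and, in terradog's case, one Terraform file.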
Namespaces can be defined in 3 places:
- There is a definition_defaults section created at the bottom of each monitor's yaml file. This is used if you do not specify a namespace at the top of your config file or within the source.
definition_defaults:
  daemonset_readiness_critical_threshold: 0
  namespaces:
    - kube-system
- As mentioned in the Native Monitors and Monitor Families section above, you can specify some settings within the monitors definition in the config file. This includes namespaces.
monitors:
  - source: kubernetes-optional.daemonset_readiness
    definitions:
      namespaces:
        - cert-manager
- Lastly, you can also add overall definitions to the top of the config file, including namespaces.
namespaces:
- cert-manager
- datadog
- external-dns
- kube-system
- cluster-autoscaler
- Namespaces defined at the top of the config file or within the monitor override whatever is in the definition_defaults in the ${monitor}.yaml.
- Namespaces defined within the monitor section of an individual monitor (kubernetes-optional.daemonset_readiness, for example) are merged with the top-level definitions.
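The precedence rules above can be summarised with a small Python sketch. The resolve_namespaces function is hypothetical, and two details are assumptions rather than documented behaviour: that the merge keeps top-level namespaces first, and that duplicates are dropped.

```python
def resolve_namespaces(defaults, top_level=None, monitor_level=None):
    """Pick the namespaces a monitor should be generated for."""
    # definition_defaults only apply when nothing else is specified.
    if not top_level and not monitor_level:
        return list(defaults)
    merged = []
    # Assumption: top-level entries come first, duplicates are dropped.
    for namespace in (top_level or []) + (monitor_level or []):
        if namespace not in merged:
            merged.append(namespace)
    return merged


defaults = ["kube-system"]

print(resolve_namespaces(defaults))
print(resolve_namespaces(defaults, top_level=["cert-manager", "datadog"]))
print(resolve_namespaces(defaults, top_level=["cert-manager", "datadog"],
                         monitor_level=["external-dns", "datadog"]))
```

The first call falls back to the defaults, the second replaces them with the top-level list, and the third merges the monitor-level list into the top-level one.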
Sometimes you'll want to apply all the monitors or dashboards in a family except one or two. In this case, you can use an exclude list in the family call. All the monitors or dashboards in that family will be templated except the ones listed in exclude. Only the path of the resource is required:
monitors:
  - source: kubernetes-stable
    exclude:
      - cluster_iops
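Conceptually, exclude acts as a filter over the resources in a family. A minimal Python sketch, assuming a hypothetical select_resources helper and a made-up family listing (only pod_crashes and cluster_iops appear in this document; example_monitor is a placeholder):

```python
def select_resources(family_resources, exclude=None):
    """Keep every resource in the family except those named in exclude."""
    excluded = set(exclude or [])
    return [name for name in family_resources if name not in excluded]


# Hypothetical contents of a family; example_monitor is a placeholder name.
kubernetes_stable = ["pod_crashes", "cluster_iops", "example_monitor"]

print(select_resources(kubernetes_stable, exclude=["cluster_iops"]))
```

Every name that survives the filter would be templated as usual.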
Complete list of timeboard fields, as taken from the Datadog Terraform provider:
- read_only
- description
- title
- graphs
- template_variables
Complete list of downtime fields, as taken from the Datadog Terraform provider:
- name - descriptive name for the downtime (not part of the downtime definition)
- scope
- active
- start
- end
- start_date
- end_date
- recurrence_type
- recurrence_period
- recurrence_week_days
- recurrence_until
- message
- monitor_id
- monitor_tags