This module sets up alerts to make sure you don't suddenly overspend on your Datadog bill. It also generates a costs dashboard.
This module is part of a larger suite of modules that provide alerts in Datadog. Other modules can be found on the Terraform Registry.
We have two base modules we use to standardise development of our Monitor Modules:
- generic monitor: used in 90% of our alerts
- service check monitor
Modules are generated with this tool: https://github.com/kabisa/datadog-terraform-generator

    module "costs" {
      source               = "kabisa/costs/datadog"
      notification_channel = "@mail@example.com"
      env                  = "prd"
      alert_env            = "prd"

      # Example config, please adjust
      filter_str                 = "*"
      apm_hosts_critical         = 2
      apm_spans_critical         = 1000000
      apm_spans_warning          = 800000
      containers_critical        = 375
      custom_metrics_critical    = 10000
      hosts_critical             = 20
      logs_indexed_critical      = 150000
      logs_ingestion_4h_critical = 208000000
      logs_ingestion_critical    = 850000000
      logs_ingestion_warning     = 600000000
    }

Monitors:
Monitor name | Default enabled | Priority | Query |
---|---|---|---|
Apm Hosts | True | 4 | avg(last_1h):sum:datadog.estimated_usage.apm_hosts{tag:xxx} > |
Apm Spans | True | 3 | sum(last_1h):sum:datadog.estimated_usage.apm.indexed_spans{tag:xxx}.as_count() > |
Containers | True | 4 | avg(last_1h):sum:datadog.estimated_usage.containers{tag:xxx} > |
Custom Metrics | True | 4 | avg(last_1h):sum:datadog.estimated_usage.metrics.custom{tag:xxx} > |
Hosts | True | 4 | avg(last_1h):sum:datadog.estimated_usage.hosts{tag:xxx} > |
Logs Indexed | True | 3 | sum(last_4h):sum:custom_datadog.estimated_usage.logs.ingested_events{tag:xxx}.as_count() > |
Logs Ingestion 4h | True | 3 | sum(last_4h):sum:custom_datadog.estimated_usage.logs.ingested_bytes{tag:xxx}.as_count() > |
Logs Ingestion | True | 3 | sum(last_1d):sum:custom_datadog.estimated_usage.logs.ingested_bytes{tag:xxx}.as_count() > |
pre-commit was used to do Terraform linting and validating.
Steps:
- Install pre-commit. E.g. `brew install pre-commit`.
- Run `pre-commit install` in this repo. (Every time you clone a repo with pre-commit enabled you will need to run the `pre-commit install` command.)
- That's it! Now every time you commit a code change (`.tf` file), the hooks in the `hooks:` config of `.pre-commit-config.yaml` will execute.
Query:
avg(last_1h):sum:datadog.estimated_usage.apm_hosts{tag:xxx} >
variable | default | required | description |
---|---|---|---|
apm_hosts_enabled | True | No | |
apm_hosts_warning | None | No | |
apm_hosts_critical | | Yes | |
apm_hosts_evaluation_period | last_1h | No | |
apm_hosts_note | "" | No | |
apm_hosts_docs | "" | No | |
apm_hosts_filter_override | "" | No | |
apm_hosts_alerting_enabled | True | No | |
apm_hosts_priority | 4 | No | Number from 1 (high) to 5 (low). |
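
The optional variables in this table can be set alongside the required critical threshold. The fragment below is a minimal sketch, not a recommended configuration: the warning value and both text strings are made up for illustration, and it assumes that the warning threshold should sit below the critical one (the query compares with `>`) and that `apm_hosts_note` and `apm_hosts_docs` accept free text.

```hcl
module "costs" {
  source = "kabisa/costs/datadog"

  # The other required arguments (env, notification_channel, filter_str and
  # the remaining thresholds) are left out for brevity; copy them from the
  # usage example.

  apm_hosts_critical = 2 # value taken from the usage example
  apm_hosts_warning  = 1 # illustrative; assumed to be lower than critical

  # Hypothetical texts, assumed to be attached to the monitor message.
  apm_hosts_note = "Check whether new APM agents were rolled out recently."
  apm_hosts_docs = "See the team's cost runbook before raising the threshold."
}
```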
Query:
sum(last_1h):sum:datadog.estimated_usage.apm.indexed_spans{tag:xxx}.as_count() >
variable | default | required | description |
---|---|---|---|
apm_spans_enabled | True | No | |
apm_spans_warning | | Yes | |
apm_spans_critical | | Yes | |
apm_spans_evaluation_period | last_1h | No | |
apm_spans_note | "" | No | |
apm_spans_docs | "" | No | |
apm_spans_filter_override | "" | No | |
apm_spans_alerting_enabled | True | No | |
apm_spans_priority | 3 | No | Number from 1 (high) to 5 (low). |
Query:
avg(last_1h):sum:datadog.estimated_usage.containers{tag:xxx} >
variable | default | required | description |
---|---|---|---|
containers_enabled | True | No | |
containers_warning | None | No | |
containers_critical | | Yes | |
containers_evaluation_period | last_1h | No | |
containers_note | "" | No | |
containers_docs | "" | No | |
containers_filter_override | "" | No | |
containers_alerting_enabled | True | No | |
containers_priority | 4 | No | Number from 1 (high) to 5 (low). |
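
Two switches in this table control whether the monitor is active. The sketch below assumes, based only on the variable names, that `containers_alerting_enabled = false` keeps the monitor but stops notifications, while `containers_enabled = false` removes the monitor altogether; verify this against the module source before relying on it. The table lists `containers_critical` as required in both cases, so it stays set.

```hcl
module "costs" {
  source = "kabisa/costs/datadog"

  # The other required arguments are left out for brevity; copy them from
  # the usage example.

  containers_critical = 375 # value taken from the usage example

  # Assumed behaviour: monitor stays visible in Datadog, notifications are muted.
  containers_alerting_enabled = false

  # Assumed alternative: do not create the monitor at all.
  # containers_enabled = false
}
```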
Query:
avg(last_1h):sum:datadog.estimated_usage.metrics.custom{tag:xxx} >
variable | default | required | description |
---|---|---|---|
custom_metrics_enabled | True | No | |
custom_metrics_warning | None | No | |
custom_metrics_critical | | Yes | |
custom_metrics_evaluation_period | last_1h | No | |
custom_metrics_note | "" | No | |
custom_metrics_docs | "" | No | |
custom_metrics_filter_override | "" | No | |
custom_metrics_alerting_enabled | True | No | |
custom_metrics_priority | 4 | No | Number from 1 (high) to 5 (low). |
Query:
avg(last_1h):sum:datadog.estimated_usage.hosts{tag:xxx} >
variable | default | required | description |
---|---|---|---|
hosts_enabled | True | No | |
hosts_warning | None | No | |
hosts_critical | | Yes | |
hosts_evaluation_period | last_1h | No | |
hosts_note | "" | No | |
hosts_docs | "" | No | |
hosts_filter_override | "" | No | |
hosts_alerting_enabled | True | No | |
hosts_priority | 4 | No | Number from 1 (high) to 5 (low). |
Query:
sum(last_4h):sum:custom_datadog.estimated_usage.logs.ingested_events{tag:xxx}.as_count() >
variable | default | required | description |
---|---|---|---|
logs_indexed_enabled | True | No | |
logs_indexed_warning | None | No | |
logs_indexed_critical | | Yes | |
logs_indexed_evaluation_period | last_4h | No | |
logs_indexed_note | "" | No | |
logs_indexed_docs | "" | No | |
logs_indexed_filter_override | "" | No | |
logs_indexed_alerting_enabled | True | No | |
logs_indexed_priority | 3 | No | Number from 1 (high) to 5 (low). |
Query:
sum(last_4h):sum:custom_datadog.estimated_usage.logs.ingested_bytes{tag:xxx}.as_count() >
variable | default | required | description |
---|---|---|---|
logs_ingestion_4h_enabled | True | No | |
logs_ingestion_4h_warning | None | No | |
logs_ingestion_4h_critical | None | No | |
logs_ingestion_4h_evaluation_period | last_4h | No | |
logs_ingestion_4h_note | "" | No | |
logs_ingestion_4h_docs | "" | No | |
logs_ingestion_4h_filter_override | "" | No | |
logs_ingestion_4h_alerting_enabled | True | No | |
logs_ingestion_4h_priority | 3 | No | Number from 1 (high) to 5 (low). |
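
A single monitor can be scoped and prioritised differently from the rest. The fragment below is a sketch under two assumptions that the table does not confirm: that `logs_ingestion_4h_filter_override` replaces the module-wide `filter_str` for this monitor only, and that it takes the same tag filter syntax. The tag `env:prd` and the priority value are hypothetical.

```hcl
module "costs" {
  source = "kabisa/costs/datadog"

  # The other required arguments are left out for brevity; copy them from
  # the usage example.

  filter_str = "*" # module-wide filter, as in the usage example

  logs_ingestion_4h_critical = 208000000 # value taken from the usage example

  # Hypothetical tag filter, assumed to apply to this monitor only.
  logs_ingestion_4h_filter_override = "env:prd"

  # Raise the priority from the default 3; 1 is high and 5 is low.
  logs_ingestion_4h_priority = 2
}
```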
Query:
sum(last_1d):sum:custom_datadog.estimated_usage.logs.ingested_bytes{tag:xxx}.as_count() >
variable | default | required | description |
---|---|---|---|
logs_ingestion_enabled | True | No | |
logs_ingestion_warning | | Yes | |
logs_ingestion_critical | | Yes | |
logs_ingestion_evaluation_period | last_1d | No | |
logs_ingestion_note | "" | No | |
logs_ingestion_docs | "" | No | |
logs_ingestion_filter_override | "" | No | |
logs_ingestion_alerting_enabled | True | No | |
logs_ingestion_priority | 3 | No | Number from 1 (high) to 5 (low). |
variable | default | required | description |
---|---|---|---|
env | | Yes | |
service | Costs | No | |
notification_channel | | Yes | |
additional_tags | [] | No | |
filter_str | | Yes | |
costs_dashboard_name_override | "" | No | |
locked | True | No | |
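
The module-level variables above apply to every monitor and to the dashboard. The fragment below sketches how the optional ones might be set; the tag and dashboard name are made up for illustration. It assumes that `additional_tags` takes a list of strings and that `locked` controls whether the generated monitors can be edited in the Datadog UI. The table does not describe either.

```hcl
module "costs" {
  source = "kabisa/costs/datadog"

  # The per-monitor thresholds are left out for brevity; copy them from the
  # usage example.

  env                  = "prd"
  notification_channel = "@mail@example.com"
  filter_str           = "*"

  service         = "Costs"           # the documented default, shown explicitly
  additional_tags = ["team:platform"] # hypothetical tag

  # Hypothetical name, assumed to replace the generated dashboard title.
  costs_dashboard_name_override = "Datadog costs (prd)"

  # Assumption: false allows manual edits to the monitors in Datadog.
  locked = false
}
```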