Monitor Query Descriptions #278

Merged 1 commit on May 4, 2020
8 changes: 4 additions & 4 deletions .apigentools-info
@@ -4,13 +4,13 @@
   "spec_versions": {
     "v1": {
       "apigentools_version": "1.0.0b3",
-      "regenerated": "2020-05-01 16:38:18.985457",
-      "spec_repo_commit": "afd3d4d"
+      "regenerated": "2020-05-04 11:12:33.051540",
+      "spec_repo_commit": "14ea455"
     },
     "v2": {
       "apigentools_version": "1.0.0b3",
-      "regenerated": "2020-05-01 16:38:24.307894",
-      "spec_repo_commit": "afd3d4d"
+      "regenerated": "2020-05-04 11:12:38.552093",
+      "spec_repo_commit": "14ea455"
     }
   }
 }
2 changes: 2 additions & 0 deletions api_docs/v1/MonitorType.md
@@ -17,6 +17,8 @@

* `QUERY_ALERT` (value: `"query alert"`)

* `RUM_ALERT` (value: `"rum alert"`)

* `SERVICE_CHECK` (value: `"service check"`)

* `SYNTHETICS_ALERT` (value: `"synthetics alert"`)
103 changes: 103 additions & 0 deletions api_docs/v1/MonitorsApi.md
@@ -104,6 +104,109 @@ Create a monitor

Create a monitor using the specified options.

#### Monitor Types

The type of monitor, chosen from:
- anomaly: `query alert`
- apm: `query alert`
- composite: `composite`
- custom: `service check`
- event: `event alert`
- forecast: `query alert`
- host: `service check`
- integration: `query alert` or `service check`
- live process: `process alert`
- logs: `logs alert`
- metric: `metric alert`
- network: `service check`
- outlier: `query alert`
- process: `service check`
- rum: `rum alert`
- watchdog: `event alert`
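
As a rough illustration of the mapping above, the sketch below looks up the `type` string for a few monitor kinds. `MonitorTypeLookup` and `typeFor` are hypothetical names that are not part of the generated client, and only a subset of the list is covered.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper (not part of the generated client). */
final class MonitorTypeLookup {
    private static final Map<String, String> TYPE_BY_KIND = new HashMap<>();

    static {
        // Subset of the list above: monitor kind -> API `type` value.
        TYPE_BY_KIND.put("anomaly", "query alert");
        TYPE_BY_KIND.put("composite", "composite");
        TYPE_BY_KIND.put("event", "event alert");
        TYPE_BY_KIND.put("host", "service check");
        TYPE_BY_KIND.put("live process", "process alert");
        TYPE_BY_KIND.put("metric", "metric alert");
    }

    private MonitorTypeLookup() {}

    /** Returns the API type for a kind, or empty if the kind is not in the table. */
    static Optional<String> typeFor(String kind) {
        String key = kind.trim().toLowerCase(Locale.ROOT);
        return Optional.ofNullable(TYPE_BY_KIND.get(key));
    }
}
```

For instance, `MonitorTypeLookup.typeFor("metric")` returns an `Optional` holding `metric alert`, while an unlisted kind returns an empty `Optional`.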

#### Query Types

**Metric Alert Query**

Example: `time_aggr(time_window):space_aggr:metric{tags} [by {key}] operator #`

- `time_aggr`: avg, sum, max, min, change, or pct_change
- `time_window`: `last_#m` (with `#` being 5, 10, 15, or 30), `last_#h` (with `#` being 1, 2, or 4), or `last_1d`
- `space_aggr`: avg, sum, min, or max
- `tags`: one or more tags (comma-separated), or *
- `key`: a 'key' in key:value tag syntax; defines a separate alert for each tag in the group (multi-alert)
- `operator`: <, <=, >, >=, ==, or !=
- `#`: an integer or decimal number used to set the threshold

If you are using the `change` or `pct_change` time aggregator, instead use `change_aggr(time_aggr(time_window), timeshift):space_aggr:metric{tags} [by {key}] operator #` with:

- `change_aggr`: change or pct_change
- `time_aggr`: avg, sum, max, or min ([Learn more](https://docs.datadoghq.com/monitors/monitor_types/#define-the-conditions))
- `time_window`: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_#d` (1 or 2)
- `timeshift`: `#m_ago` (5, 10, 15, or 30), `#h_ago` (1, 2, or 4), or `1d_ago`

To create an outlier monitor, use a query such as:
`avg(last_30m):outliers(avg:system.cpu.user{role:es-events-data} by {host}, 'dbscan', 7) > 0`
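
The basic (non-change) metric alert format can be assembled from its parts. The following is a minimal sketch, assuming a hypothetical `MetricAlertQuery` helper that is not part of the generated client; it performs no validation of the aggregator or window values.

```java
/** Hypothetical helper (not part of the generated client). */
final class MetricAlertQuery {
    private MetricAlertQuery() {}

    /**
     * Builds time_aggr(time_window):space_aggr:metric{tags} [by {key}] operator #.
     * Pass null (or an empty string) for groupByKey to omit the "by {key}" clause.
     */
    static String build(String timeAggr, String timeWindow, String spaceAggr,
                        String metric, String tags, String groupByKey,
                        String operator, Number threshold) {
        StringBuilder query = new StringBuilder();
        query.append(timeAggr).append('(').append(timeWindow).append("):")
             .append(spaceAggr).append(':')
             .append(metric).append('{').append(tags).append('}');
        if (groupByKey != null && !groupByKey.isEmpty()) {
            // Grouping by a tag key makes this a multi-alert.
            query.append(" by {").append(groupByKey).append('}');
        }
        query.append(' ').append(operator).append(' ').append(threshold);
        return query.toString();
    }
}
```

With these inputs, `MetricAlertQuery.build("avg", "last_5m", "avg", "system.cpu.user", "env:prod", "host", ">", 90)` produces `avg(last_5m):avg:system.cpu.user{env:prod} by {host} > 90`.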

**Service Check Query**

Example: `"check".over(tags).last(count).count_by_status()`

- **`check`** name of the check, e.g. datadog.agent.up
- **`tags`** one or more quoted tags (comma-separated), or "*". e.g.: `.over("env:prod", "role:db")`
- **`count`** must be >= your max threshold (defined in the `options`).
  For example, to notify on 1 critical, 3 ok, and 2 warn statuses, `count` should be 3. It is limited to 100.
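
A service check query can be put together the same way. This is only a sketch: `ServiceCheckQuery` is a hypothetical name, and the lower bound of 1 on `count` is an assumption (the text above only states the upper limit of 100).

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of the generated client). */
final class ServiceCheckQuery {
    private ServiceCheckQuery() {}

    /** Builds "check".over(tags).last(count).count_by_status(). */
    static String build(String check, List<String> tags, int count) {
        // The upper limit of 100 comes from the description; the lower bound is assumed.
        if (count < 1 || count > 100) {
            throw new IllegalArgumentException("count must be between 1 and 100");
        }
        // Each tag is quoted; an empty tag list falls back to "*".
        String over = tags.isEmpty()
                ? "\"*\""
                : tags.stream()
                      .map(tag -> "\"" + tag + "\"")
                      .collect(Collectors.joining(", "));
        return "\"" + check + "\".over(" + over + ").last(" + count + ").count_by_status()";
    }
}
```

Calling `build` with the check `datadog.agent.up`, the tags `env:prod` and `role:db`, and a count of 3 gives `"datadog.agent.up".over("env:prod", "role:db").last(3).count_by_status()`.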

**Event Alert Query**

Example: `events('sources:nagios status:error,warning priority:normal tags: "string query"').rollup("count").last("1h")`

- **`event`** the event query string:
  - **`string_query`** free text query to match against event title and text.
  - **`sources`** event sources (comma-separated).
  - **`status`** event statuses (comma-separated). Valid options: error, warn, and info.
  - **`priority`** event priorities (comma-separated). Valid options: low, normal, all.
  - **`host`** event reporting host (comma-separated).
  - **`tags`** event tags (comma-separated).
  - **`excluded_tags`** excluded event tags (comma-separated).
- **`rollup`** the stats rollup method. `count` is currently the only supported method.
- **`last`** the timeframe to roll up the counts. Examples: 60s, 4h. Supported timeframes: s, m, h and d.

**Process Alert Query**

Example: `processes(search).over(tags).rollup('count').last(timeframe) operator #`

- **`search`** free text search string for querying processes.
  Matching processes are the same as the results shown on the [Live Processes](https://docs.datadoghq.com/infrastructure/process/?tab=linuxwindows) page.
- **`tags`** one or more tags (comma-separated)
- **`timeframe`** the timeframe to roll up the counts. Examples: 60s, 4h. Supported timeframes: s, m, h and d
- **`operator`** <, <=, >, >=, ==, or !=
- **`#`** an integer or decimal number used to set the threshold

**Logs Alert Query**

Example: `logs(query).index(index_name).rollup(rollup_method[, measure]).last(time_window) operator #`

- **`query`** The search query - following the [Log search syntax](https://docs.datadoghq.com/logs/search_syntax/).
- **`index_name`** For multi-index organizations, the log index in which the request is performed.
- **`rollup_method`** The stats rollup method - supports `count`, `avg` and `cardinality`.
- **`measure`** For the `avg` and `cardinality` rollup methods, the measure or facet name to use.
- **`time_window`** #m (5, 10, 15, or 30) or #h (1, 2, 4, or 24)
- **`operator`** `<`, `<=`, `>`, `>=`, `==`, or `!=`.
- **`#`** an integer or decimal number used to set the threshold.
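
The rule that `measure` is needed for `avg` and `cardinality` can be enforced when assembling a logs alert query. The sketch below uses a hypothetical `LogsAlertQuery` helper; wrapping each argument in double quotes is an assumption of this sketch, since the format above does not show how arguments are quoted, and quotes inside the arguments are not escaped.

```java
/** Hypothetical helper (not part of the generated client). */
final class LogsAlertQuery {
    private LogsAlertQuery() {}

    /**
     * Builds logs(query).index(index_name).rollup(rollup_method[, measure])
     * .last(time_window) operator #.
     */
    static String build(String query, String indexName, String rollupMethod,
                        String measure, String timeWindow,
                        String operator, Number threshold) {
        boolean isCount = rollupMethod.equals("count");
        boolean needsMeasure = rollupMethod.equals("avg") || rollupMethod.equals("cardinality");
        if (!isCount && !needsMeasure) {
            throw new IllegalArgumentException("unsupported rollup method: " + rollupMethod);
        }
        if (needsMeasure && (measure == null || measure.isEmpty())) {
            throw new IllegalArgumentException(rollupMethod + " requires a measure or facet name");
        }
        // `count` takes no measure; any measure passed with it is ignored.
        String rollupArgs = quote(rollupMethod) + (needsMeasure ? ", " + quote(measure) : "");
        return "logs(" + quote(query) + ").index(" + quote(indexName) + ")"
                + ".rollup(" + rollupArgs + ").last(" + quote(timeWindow) + ") "
                + operator + " " + threshold;
    }

    // Assumption: arguments are double-quoted; embedded quotes are not escaped here.
    private static String quote(String value) {
        return "\"" + value + "\"";
    }
}
```

Under these assumptions, a `count` rollup over the index `main` for the last `5m` with the search `status:error` and threshold `> 100` becomes `logs("status:error").index("main").rollup("count").last("5m") > 100`.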

**Composite Query**

Example: `12345 && 67890`, where `12345` and `67890` are the IDs of non-composite monitors
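
A composite query that requires every listed monitor to trigger can be generated by joining the IDs with `&&`. In this sketch `CompositeQuery` is a hypothetical name, and requiring at least two IDs is an assumption rather than a documented rule.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of the generated client). */
final class CompositeQuery {
    private CompositeQuery() {}

    /** Joins the IDs of non-composite monitors with "&&". */
    static String allOf(long... monitorIds) {
        // Assumption of this sketch: a composite needs at least two monitors.
        if (monitorIds.length < 2) {
            throw new IllegalArgumentException("need at least two monitor IDs");
        }
        return Arrays.stream(monitorIds)
                .mapToObj(id -> Long.toString(id))
                .collect(Collectors.joining(" && "));
    }
}
```

`CompositeQuery.allOf(12345L, 67890L)` returns `12345 && 67890`, matching the example above.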

* **`name`** [*required*, *default* = **dynamic, based on query**]: The name of the alert.
* **`message`** [*required*, *default* = **dynamic, based on query**]: A message to include with notifications for this monitor.
Email notifications can be sent to specific users by using the same '@username' notation as events.
* **`tags`** [*optional*, *default* = **empty list**]: A list of tags to associate with your monitor.
When getting all monitor details via the API, use the `monitor_tags` argument to filter results by these tags.
It is only available via the API and isn't visible or editable in the Datadog UI.

### Example

```java
@@ -246,7 +246,7 @@ public ApiResponse<Monitor> executeWithHttpInfo() throws ApiException {

/**
* Create a monitor
* Create a monitor using the specified options.
* Create a monitor using the specified options. #### Monitor Types The type of monitor chosen from: - anomaly: &#x60;query alert&#x60; - apm: &#x60;query alert&#x60; - composite: &#x60;composite&#x60; - custom: &#x60;service check&#x60; - event: &#x60;event alert&#x60; - forecast: &#x60;query alert&#x60; - host: &#x60;service check&#x60; - integration: &#x60;query alert&#x60; or &#x60;service check&#x60; - live process: &#x60;process alert&#x60; - logs: &#x60;logs alert&#x60; - metric: &#x60;metric alert&#x60; - network: &#x60;service check&#x60; - outlier: &#x60;query alert&#x60; - process: &#x60;service query&#x60; - rum: &#x60;alert&#x60; - watchdog: &#x60;event alert&#x60; #### Query Types **Metric Alert Query** Example: &#x60;time_aggr(time_window):space_aggr:metric{tags} [by {key}] operator #&#x60; - &#x60;time_aggr&#x60;: avg, sum, max, min, change, or pct_change - &#x60;time_window&#x60;: &#x60;last_#m&#x60; (with &#x60;#&#x60; being 5, 10, 15, or 30) or &#x60;last_#h&#x60;(with &#x60;#&#x60; being 1, 2, or 4), or &#x60;last_1d&#x60; - &#x60;space_aggr&#x60;: avg, sum, min, or max - &#x60;tags&#x60;: one or more tags (comma-separated), or * - &#x60;key&#x60;: a &#39;key&#39; in key:value tag syntax; defines a separate alert for each tag in the group (multi-alert) - &#x60;operator&#x60;: &lt;, &lt;&#x3D;, &gt;, &gt;&#x3D;, &#x3D;&#x3D;, or !&#x3D; - &#x60;#&#x60;: an integer or decimal number used to set the threshold If you are using the &#x60;_change_&#x60; or &#x60;_pct_change_&#x60; time aggregator, instead use &#x60;change_aggr(time_aggr(time_window), timeshift):space_aggr:metric{tags} [by {key}] operator #&#x60; with: - &#x60;change_aggr&#x60; change, pct_change - &#x60;time_aggr&#x60; avg, sum, max, min [Learn more](https://docs.datadoghq.com/monitors/monitor_types/#define-the-conditions) - &#x60;time_window&#x60; last\\_#m (1, 5, 10, 15, or 30), last\\_#h (1, 2, or 4), or last_#d (1 or 2) - &#x60;timeshift&#x60; #m_ago (5, 10, 15, or 30), #h_ago (1, 
2, or 4), or 1d_ago Use this to create an outlier monitor using the following query: &#x60;avg(last_30m):outliers(avg:system.cpu.user{role:es-events-data} by {host}, &#39;dbscan&#39;, 7) &gt; 0&#x60; **Service Check Query** Example: &#x60;\&quot;check\&quot;.over(tags).last(count).count_by_status()&#x60; - **&#x60;check&#x60;** name of the check, e.g. datadog.agent.up - **&#x60;tags&#x60;** one or more quoted tags (comma-separated), or \&quot;*\&quot;. e.g.: &#x60;.over(\&quot;env:prod\&quot;, \&quot;role:db\&quot;)&#x60; - **&#x60;count&#x60;** must be at &gt;&#x3D; your max threshold (defined in the &#x60;options&#x60;). e.g. if you want to notify on 1 critical, 3 ok and 2 warn statuses count should be 3. It is limited to 100. **Event Alert Query** Example: &#x60;events(&#39;sources:nagios status:error,warning priority:normal tags: \&quot;string query\&quot;&#39;).rollup(\&quot;count\&quot;).last(\&quot;1h\&quot;)\&quot;&#x60; - **&#x60;event&#x60;**, the event query string: - **&#x60;string_query&#x60;** free text query to match against event title and text. - **&#x60;sources&#x60;** event sources (comma-separated). - **&#x60;status&#x60;** event statuses (comma-separated). Valid options: error, warn, and info. - **&#x60;priority&#x60;** event priorities (comma-separated). Valid options: low, normal, all. - **&#x60;host&#x60;** event reporting host (comma-separated). - **&#x60;tags&#x60;** event tags (comma-separated). - **&#x60;excluded_tags&#x60;** exluded event tags (comma-separated). - **&#x60;rollup&#x60;** the stats rollup method. &#x60;count&#x60; is the only supported method now. - **&#x60;last&#x60;** the timeframe to roll up the counts. Examples: 60s, 4h. Supported timeframes: s, m, h and d. **Process Alert Query** Example: &#x60;processes(search).over(tags).rollup(&#39;count&#39;).last(timeframe) operator #&#x60; - **&#x60;search&#x60;** free text search string for querying processes. 
Matching processes match results on the [Live Processes](https://docs.datadoghq.com/infrastructure/process/?tab&#x3D;linuxwindows) page. - **&#x60;tags&#x60;** one or more tags (comma-separated) - **&#x60;timeframe&#x60;** the timeframe to roll up the counts. Examples: 60s, 4h. Supported timeframes: s, m, h and d - **&#x60;operator&#x60;** &lt;, &lt;&#x3D;, &gt;, &gt;&#x3D;, &#x3D;&#x3D;, or !&#x3D; - **&#x60;#&#x60;** an integer or decimal number used to set the threshold **Logs Alert Query** Example: &#x60;logs(query).index(index_name).rollup(rollup_method[, measure]).last(time_window) operator #&#x60; - **&#x60;query&#x60;** The search query - following the [Log search syntax](https://docs.datadoghq.com/logs/search_syntax/). - **&#x60;index_name&#x60;** For multi-index organizations, the log index in which the request is performed. - **&#x60;rollup_method&#x60;** The stats rollup method - supports &#x60;count&#x60;, &#x60;avg&#x60; and &#x60;cardinality&#x60;. - **&#x60;measure&#x60;** For &#x60;avg&#x60; and cardinality &#x60;rollup_method&#x60; - specify the measure or the facet name you want to use. - **&#x60;time_window&#x60;** #m (5, 10, 15, or 30), #h (1, 2, or 4, 24) - **&#x60;operator&#x60;** &#x60;&lt;&#x60;, &#x60;&lt;&#x3D;&#x60;, &#x60;&gt;&#x60;, &#x60;&gt;&#x3D;&#x60;, &#x60;&#x3D;&#x3D;&#x60;, or &#x60;!&#x3D;&#x60;. - **&#x60;#&#x60;** an integer or decimal number used to set the threshold. **Composite Query** Example: &#x60;12345 &amp;&amp; 67890&#x60;, where &#x60;12345&#x60; and &#x60;67890&#x60; are the IDs of non-composite monitors * **&#x60;name&#x60;** [*required*, *default* &#x3D; **dynamic, based on query**]: The name of the alert. * **&#x60;message&#x60;** [*required*, *default* &#x3D; **dynamic, based on query**]: A message to include with notifications for this monitor. Email notifications can be sent to specific users by using the same &#39;@username&#39; notation as events. 
* **&#x60;tags&#x60;** [*optional*, *default* &#x3D; **empty list**]: A list of tags to associate with your monitor. When getting all monitor details via the API, use the &#x60;monitor_tags&#x60; argument to filter results by these tags. It is only available via the API and isn&#39;t visible or editable in the Datadog UI.
* @return createMonitorRequest
* @throws ApiException if fails to make API call

@@ -36,6 +36,8 @@ public enum MonitorType {

QUERY_ALERT("query alert"),

RUM_ALERT("rum alert"),

SERVICE_CHECK("service check"),

SYNTHETICS_ALERT("synthetics alert"),
213 changes: 212 additions & 1 deletion src/main/java/com/datadog/api/v1/openapi.yaml
@@ -3628,6 +3628,7 @@ components:
- metric alert
- process alert
- query alert
- rum alert
- service check
- synthetics alert
- trace-analytics alert
@@ -3639,6 +3640,7 @@ components:
- METRIC_ALERT
- PROCESS_ALERT
- QUERY_ALERT
- RUM_ALERT
- SERVICE_CHECK
- SYNTHETICS_ALERT
- TRACE_ANALYTICS_ALERT
@@ -10453,7 +10455,216 @@ paths:
tags:
- Monitors
post:
description: Create a monitor using the specified options.
description: 'Create a monitor using the specified options.


#### Monitor Types


The type of monitor chosen from:

- anomaly: `query alert`

- apm: `query alert`

- composite: `composite`

- custom: `service check`

- event: `event alert`

- forecast: `query alert`

- host: `service check`

- integration: `query alert` or `service check`

- live process: `process alert`

- logs: `logs alert`

- metric: `metric alert`

- network: `service check`

- outlier: `query alert`

- process: `service check`

- rum: `rum alert`

- watchdog: `event alert`


#### Query Types


**Metric Alert Query**


Example: `time_aggr(time_window):space_aggr:metric{tags} [by {key}] operator
#`


- `time_aggr`: avg, sum, max, min, change, or pct_change

- `time_window`: `last_#m` (with `#` being 5, 10, 15, or 30) or `last_#h`(with
`#` being 1, 2, or 4), or `last_1d`

- `space_aggr`: avg, sum, min, or max

- `tags`: one or more tags (comma-separated), or *

- `key`: a ''key'' in key:value tag syntax; defines a separate alert for each
tag in the group (multi-alert)

- `operator`: <, <=, >, >=, ==, or !=

- `#`: an integer or decimal number used to set the threshold


If you are using the `_change_` or `_pct_change_` time aggregator, instead
use `change_aggr(time_aggr(time_window),

timeshift):space_aggr:metric{tags} [by {key}] operator #` with:


- `change_aggr` change, pct_change

- `time_aggr` avg, sum, max, min [Learn more](https://docs.datadoghq.com/monitors/monitor_types/#define-the-conditions)

- `time_window` last\_#m (1, 5, 10, 15, or 30), last\_#h (1, 2, or 4), or
last_#d (1 or 2)

- `timeshift` #m_ago (5, 10, 15, or 30), #h_ago (1, 2, or 4), or 1d_ago


Use this to create an outlier monitor using the following query:

`avg(last_30m):outliers(avg:system.cpu.user{role:es-events-data} by {host},
''dbscan'', 7) > 0`


**Service Check Query**


Example: `"check".over(tags).last(count).count_by_status()`


- **`check`** name of the check, e.g. datadog.agent.up

- **`tags`** one or more quoted tags (comma-separated), or "*". e.g.: `.over("env:prod",
"role:db")`

- **`count`** must be >= your max threshold (defined in the `options`).

e.g. if you want to notify on 1 critical, 3 ok and 2 warn statuses count should
be 3. It is limited to 100.


**Event Alert Query**


Example: `events(''sources:nagios status:error,warning priority:normal tags:
"string query"'').rollup("count").last("1h")`


- **`event`**, the event query string:

- **`string_query`** free text query to match against event title and text.

- **`sources`** event sources (comma-separated).

- **`status`** event statuses (comma-separated). Valid options: error, warn,
and info.

- **`priority`** event priorities (comma-separated). Valid options: low, normal,
all.

- **`host`** event reporting host (comma-separated).

- **`tags`** event tags (comma-separated).

- **`excluded_tags`** excluded event tags (comma-separated).

- **`rollup`** the stats rollup method. `count` is the only supported method
now.

- **`last`** the timeframe to roll up the counts. Examples: 60s, 4h. Supported
timeframes: s, m, h and d.


**Process Alert Query**


Example: `processes(search).over(tags).rollup(''count'').last(timeframe) operator
#`


- **`search`** free text search string for querying processes.

Matching processes match results on the [Live Processes](https://docs.datadoghq.com/infrastructure/process/?tab=linuxwindows)
page.

- **`tags`** one or more tags (comma-separated)

- **`timeframe`** the timeframe to roll up the counts. Examples: 60s, 4h.
Supported timeframes: s, m, h and d

- **`operator`** <, <=, >, >=, ==, or !=

- **`#`** an integer or decimal number used to set the threshold


**Logs Alert Query**


Example: `logs(query).index(index_name).rollup(rollup_method[, measure]).last(time_window)
operator #`


- **`query`** The search query - following the [Log search syntax](https://docs.datadoghq.com/logs/search_syntax/).

- **`index_name`** For multi-index organizations, the log index in which the
request is performed.

- **`rollup_method`** The stats rollup method - supports `count`, `avg` and
`cardinality`.

- **`measure`** For the `avg` and `cardinality` rollup methods, the measure or
facet name to use.

- **`time_window`** #m (5, 10, 15, or 30) or #h (1, 2, 4, or 24)

- **`operator`** `<`, `<=`, `>`, `>=`, `==`, or `!=`.

- **`#`** an integer or decimal number used to set the threshold.


**Composite Query**


Example: `12345 && 67890`, where `12345` and `67890` are the IDs of non-composite
monitors


* **`name`** [*required*, *default* = **dynamic, based on query**]: The name
of the alert.

* **`message`** [*required*, *default* = **dynamic, based on query**]: A message
to include with notifications for this monitor.

Email notifications can be sent to specific users by using the same ''@username''
notation as events.

* **`tags`** [*optional*, *default* = **empty list**]: A list of tags to associate
with your monitor.

When getting all monitor details via the API, use the `monitor_tags` argument
to filter results by these tags.

It is only available via the API and isn''t visible or editable in the Datadog
UI.'
operationId: CreateMonitor
requestBody:
content: