73 changes: 54 additions & 19 deletions content/en/integrations/guide/aws-integration-troubleshooting.md
@@ -20,49 +20,76 @@

**Note**: This error may persist in the Datadog UI for a few hours while the changes propagate.

### Resolve All AWS permissions issues
**Resolve All AWS Permissions Issues** in the AWS Integration tile allows you to use a CloudFormation QuickStart stack to update your IAM role and resolve missing permissions issues.
Under **Issues**, click **Resolve All AWS Permissions Issues**. This launches a CloudFormation stack that calls our public API [endpoint][13] and fetches the latest IAM permissions needed for the integration, creates new IAM policies containing those permissions, and attaches these to the integration role. It will also attach the `SecurityAudit` Managed AWS policy if it is not present.

**Notes**:

* **Resolve All AWS Permissions Issues** does not fix broken authentication issues with the role (where the role name, external ID, or trust policy configuration does not let Datadog authenticate with your AWS account).
* The CloudFormation template is in our public [repository][14].
* The policies created are named with a `datadog-aws-integration-iam-permissions-` prefix followed by a unique hash to avoid colliding with any existing policies you have configured.
* If **Resolve All AWS Permissions Issues** is clicked multiple times, any old policies created with that prefix are deleted before the new ones are created.
* Any policies you attached to the role will NOT be impacted.
* **Resolve All AWS Permissions Issues** will not fix cases where a Service Control Policy (SCP) is applied that explicitly denies the required permissions.

### Resolve all AWS permissions issues

The **Resolve all AWS permissions issues** button in the AWS Integration page allows you to use a CloudFormation QuickStart stack to update your Datadog integration IAM role and resolve missing permissions issues.

Under **Issues**, click **Resolve All AWS Permissions Issues**. This launches a CloudFormation stack that calls Datadog's public API [endpoint][13] and fetches the latest IAM permissions needed for the integration, creates new IAM policies containing those permissions, and attaches these to the integration role. It also attaches the `SecurityAudit` Managed AWS policy if it is not present.

The policies created are named with a `datadog-aws-integration-iam-permissions-` prefix, followed by a unique hash to avoid colliding with any existing policies you have configured. You can view the CloudFormation template in Datadog's public [cloudformation-template repository][14].

<div class="alert alert-danger">
Clicking <strong>Resolve All AWS Permissions Issues</strong>:<br>
- Does not fix broken authentication issues with the role (where the role name, external ID, or trust policy configuration does not let Datadog authenticate with your AWS account)<br>
- Does not fix cases where a Service Control Policy (SCP) is applied that explicitly denies the required permissions<br>
- Does not impact any policies that you attach to the Datadog integration IAM role<br>
- If clicked multiple times, any previous policies created with that prefix are deleted before new policies are created
</div>
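
As a rough illustration of the naming convention described above, the following Python sketch builds a policy name from the documented prefix and picks out the policies that a repeated run would replace. The hash scheme (the first eight hex characters of a SHA-256 digest over the sorted actions) and the helper names `policy_name` and `stale_policies` are assumptions for illustration only. The real CloudFormation template may generate its hash differently.

```python
import hashlib
import json

# Prefix documented for policies created by the stack.
PREFIX = "datadog-aws-integration-iam-permissions-"


def policy_name(actions):
    """Return the documented prefix followed by a short hash of the actions.

    Assumption: the hash is derived from the sorted action list. The actual
    template may use a different scheme.
    """
    payload = json.dumps(sorted(actions)).encode("utf-8")
    return PREFIX + hashlib.sha256(payload).hexdigest()[:8]


def stale_policies(attached_policy_names):
    """Return the attached policies that carry the prefix.

    Per the text, only these are deleted on a repeated run. Policies you
    attached yourself do not match the prefix and are left alone.
    """
    return [name for name in attached_policy_names if name.startswith(PREFIX)]
```

For example, `stale_policies(["SecurityAudit", PREFIX + "abc12345", "my-custom-policy"])` returns only the prefixed entry, which mirrors the statement that policies you attach yourself are not impacted.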

## Data discrepancies

### Discrepancy between your data in CloudWatch and Datadog

There are two important distinctions to be aware of:
The sections below describe two important distinctions to be aware of, as well as steps to [reconcile the discrepancy](#reconcile-the-discrepancy).

#### 1. Time aggregation

Datadog displays raw data from AWS in per-second values, regardless of the time frame selected in AWS. This is why Datadog's value could appear lower. See [Time aggregation][20] in the metric documentation for more information.
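
To see why the per-second value can look lower, consider a count reported as a sum over a CloudWatch period. The sketch below is a simplified illustration of that normalization, not Datadog's actual processing. The function name `per_second` is hypothetical.

```python
def per_second(period_sum, period_seconds):
    """Convert a value summed over a period into a per-second rate."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return period_sum / period_seconds


# A sum of 300 requests over a 60-second period is 5 requests per second.
rate = per_second(300, 60)
```

The same 300 requests viewed with a 300-second period would be 1 request per second, so the number shown depends on whether the value is a total for the period or a rate.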

#### 2. Space aggregation

The space aggregators `min`, `max`, and `avg` have a different meaning between AWS and Datadog. In AWS, average latency, minimum latency, and maximum latency are three distinct metrics that AWS collects. When Datadog polls metrics from AWS CloudWatch, the average latency is received as a single timeseries per Elastic Load Balancer (ELB).

Within Datadog, when you select `min`, `max`, or `avg`, you are controlling how multiple timeseries are combined. For example, requesting `system.cpu.idle` without any filter returns one series for each host that reports that metric, and those series need to be combined to be graphed. If instead you request `system.cpu.idle` from a single host, no aggregation is necessary and switching between average and max yields the same result.
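
The following sketch illustrates the idea of space aggregation with plain Python lists. It is not Datadog's implementation, and the function name `space_aggregate` and the sample host values are made up for the example. It combines one series per host into a single series, point by point.

```python
from statistics import mean


def space_aggregate(series_by_host, aggregator):
    """Combine several equally long timeseries into one, point by point."""
    combine = {"min": min, "max": max, "avg": mean}[aggregator]
    # zip(*...) groups the values that share a timestamp across hosts.
    return [combine(points) for points in zip(*series_by_host.values())]


two_hosts = {"host-a": [90.0, 80.0], "host-b": [70.0, 60.0]}
one_host = {"host-a": [90.0, 80.0]}

avg_two = space_aggregate(two_hosts, "avg")  # [80.0, 70.0]
max_two = space_aggregate(two_hosts, "max")  # [90.0, 80.0]
avg_one = space_aggregate(one_host, "avg")   # [90.0, 80.0]
max_one = space_aggregate(one_host, "max")   # [90.0, 80.0]
```

With two hosts the aggregator changes the result. With a single host, `avg` and `max` return the same series because there is nothing to combine.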

See [Space aggregation][22] in the metric documentation for more information.

#### Reconcile the discrepancy

1. Go to **CloudWatch → All metrics**.
2. Search for and graph the metric.
3. Select the **Source** tab to show the full metric query.

{{< img src="integrations/guide/aws_integration_troubleshooting/cloudwatch-metric-explorer.png" alt="The all metrics page in CloudWatch displaying a metric query under the source tab, with the cursor hovering over a point on the graph to display a metric value and timestamp" responsive="true" style="width:90%;" >}}

4. Confirm that the query in CloudWatch is scoped identically to the query in Datadog:
    - Any [Dimensions][17] used in the CloudWatch metric query should match tags used in the Datadog metric query
    - The [Statistic][18] used in the query should match the Datadog [space aggregator][19]
    - Region
    - Metric Namespace and Metric name
5. Match the time frame in the [Datadog Metric Explorer][15] with the **Period** selected in the [CloudWatch Metric Explorer][16].
6. Hover over a datapoint on the graph to display the timestamp and value.
7. Find the same point in time in the Datadog graph and compare the values. If the values are equal, the original discrepancy was due to differences in either time or space aggregation between the two graphs.
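
The scoping check in the steps above can be sketched as a simple comparison. This assumes you have written down both queries by hand in the same normalized form. The dictionary keys and the function name `scope_mismatches` are hypothetical and are not part of any Datadog or AWS API.

```python
SCOPE_FIELDS = (
    "region",
    "namespace",
    "metric_name",
    "statistic",       # CloudWatch Statistic or Datadog space aggregator
    "dimensions",      # CloudWatch Dimensions or Datadog tags
    "period_seconds",  # CloudWatch Period or Datadog time frame
)


def scope_mismatches(cloudwatch_query, datadog_query):
    """Return the scoping fields whose values differ between the two queries."""
    return [
        field
        for field in SCOPE_FIELDS
        if cloudwatch_query.get(field) != datadog_query.get(field)
    ]


cloudwatch_query = {
    "region": "us-east-1",
    "namespace": "AWS/ELB",
    "metric_name": "Latency",
    "statistic": "avg",
    "dimensions": {"LoadBalancerName": "my-elb"},
    "period_seconds": 60,
}
datadog_query = dict(cloudwatch_query, statistic="max")

differences = scope_mismatches(cloudwatch_query, datadog_query)
```

An empty result means the two queries are scoped the same way, so any remaining difference in values is worth comparing at a single timestamp as described in the last step.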

## Metrics

### Metrics delayed

When using the AWS integration, Datadog pulls in your metrics through the CloudWatch API. You may see a slight delay in metrics from AWS due to some constraints that exist for their API.

The CloudWatch API only offers a metric-by-metric crawl to pull data. CloudWatch APIs have a rate limit that varies based on the combination of authentication credentials, region, and service. Metrics are made available by AWS dependent on the account level. For example, if you are paying for "detailed metrics" within AWS, they are available more quickly. This level of service for detailed metrics also applies to granularity, with some metrics being available per minute and others per five minutes.
The CloudWatch API only offers a metric-by-metric crawl to pull data. CloudWatch APIs have a rate limit that varies based on the combination of authentication credentials, region, and service. Metrics are made available by AWS dependent on the account level. For example, if you are paying for "detailed metrics" within AWS, they are available more frequently. This level of service for detailed metrics also applies to granularity, with some metrics being available per minute and others per five minutes.

Install the Datadog Agent on the host to avoid metric delay. See the [Datadog Agent documentation][3] to get started. Datadog has the ability to prioritize certain metrics within an account to pull them in faster, depending on the circumstances. Contact [Datadog support][4] for additional information.

### Missing metrics

CloudWatch's API returns only metrics with data points, so if for example an ELB has no attached instances, it is expected not to see metrics related to this ELB in Datadog.
CloudWatch's API returns only metrics with datapoints, so if for example an ELB has no attached instances, it is expected not to see metrics related to this ELB in Datadog.

### Wrong count of aws.elb.healthy_host_count

When the cross-zone load balancing option is enabled on an ELB, all the instances attached to this ELB are considered part of all availability zones (on CloudWatch's side). For example, if you have two instances in `1a` and three instances in `1b`, the metric displays five instances per availability zone.
As this can be counterintuitive, the metrics **aws.elb.healthy_host_count_deduped** and **aws.elb.un_healthy_host_count_deduped** display the count of healthy and unhealthy instances per availability zone, regardless of whether cross-zone load balancing is enabled.
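
The difference between the two metrics can be sketched with a small model of the behavior described above. The function names and the zone labels `1a` and `1b` are illustrative assumptions. This is not how CloudWatch or Datadog computes the values internally.

```python
def reported_healthy_hosts(healthy_by_zone, cross_zone_enabled):
    """Model the per-zone value of aws.elb.healthy_host_count."""
    if not cross_zone_enabled:
        return dict(healthy_by_zone)
    # With cross-zone load balancing, every zone reports all instances.
    total = sum(healthy_by_zone.values())
    return {zone: total for zone in healthy_by_zone}


def deduped_healthy_hosts(healthy_by_zone):
    """Model aws.elb.healthy_host_count_deduped: the real count per zone."""
    return dict(healthy_by_zone)


healthy = {"1a": 2, "1b": 3}
reported = reported_healthy_hosts(healthy, cross_zone_enabled=True)
deduped = deduped_healthy_hosts(healthy)
```

Here `reported` is `{"1a": 5, "1b": 5}`, while `deduped` keeps the actual split of two and three instances.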

## Datadog app
## Datadog UI

### Duplicated hosts when installing the Agent

@@ -123,3 +150,11 @@
[12]: https://github.com/DataDog/Miscellany/blob/master/remove_lingering_aws_host_tags.py
[13]: https://api.datadoghq.com/api/v2/integration/aws/iam_permissions
[14]: https://github.com/DataDog/cloudformation-template/tree/master/aws_attach_integration_permissions
[15]: https://app.datadoghq.com/metric/explorer
[16]: https://console.aws.amazon.com/cloudwatch/#explorer
[17]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html#Dimension
[18]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Statistics-definitions.html
[19]: /metrics/#configure-space-aggregation
[20]: /metrics/#time-aggregation
[21]: /metrics/guide/why-does-zooming-out-a-timeframe-also-smooth-out-my-graphs/
[22]: /metrics/#space-aggregation