Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for Metric Rollup Addition of [RemoteService, RemoteTarget, ...] #729

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,8 @@

package com.amazon.aoc.services;

import static com.google.common.collect.ImmutableList.toImmutableList;

import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
import com.amazonaws.services.cloudwatch.model.*;
Expand All @@ -28,13 +30,15 @@
import com.amazonaws.services.logs.model.OutputLogEvent;
import java.util.Date;
import java.util.List;
import kotlin.Pair;
import lombok.extern.log4j.Log4j2;

/** a wrapper of cloudwatch client. */
@Log4j2
public class CloudWatchService {
public static final String SERVICE_DIMENSION = "Service";
public static final String REMOTE_SERVICE_DIMENSION = "RemoteService";
public static final String REMOTE_TARGET_DIMENSION = "RemoteTarget";

private static final int MAX_QUERY_PERIOD = 60;
private static final String REQUESTER = "integrationTest";
Expand Down Expand Up @@ -62,15 +66,20 @@ public CloudWatchService(String region) {
public List<Metric> listMetrics(
final String namespace,
final String metricName,
final String dimensionKey,
final String dimensionValue) {
final DimensionFilter dimensionFilter =
new DimensionFilter().withName(dimensionKey).withValue(dimensionValue);
final List<Pair<String, String>> dimensionList) {
final List<DimensionFilter> dimensionFilters =
dimensionList.stream()
.map(
dimension ->
new DimensionFilter()
.withName(dimension.getFirst())
.withValue(dimension.getSecond()))
.collect(toImmutableList());
final ListMetricsRequest listMetricsRequest =
new ListMetricsRequest()
.withNamespace(namespace)
.withMetricName(metricName)
.withDimensions(dimensionFilter)
.withDimensions(dimensionFilters)
.withRecentlyActive("PT3H");
return amazonCloudWatch.listMetrics(listMetricsRequest).getMetrics();
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -30,12 +30,14 @@
import com.google.common.collect.Lists;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import kotlin.Pair;
import lombok.extern.log4j.Log4j2;

@Log4j2
Expand Down Expand Up @@ -103,16 +105,27 @@ public void validate() throws Exception {
}

List<Metric> actualMetricList = Lists.newArrayList();
addMetrics(
CloudWatchService.SERVICE_DIMENSION,
serviceNames,
expectedMetricList,
actualMetricList);
addMetrics(
CloudWatchService.REMOTE_SERVICE_DIMENSION,
remoteServiceNames,
expectedMetricList,
actualMetricList);

// Add sets of dimesion filters to use for each query to CloudWatch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've now added a second edge case to this logic, i.e. the [RemoteService] and [RemoteService, RemoteTarget] aggregations. It would be worth adding a larger comment here explaining what's happening and explain why in the PR description for future reference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be more descriptive in other comments in your PR as well.

List<List<Pair<String, String>>> dimensionLists = Lists.newArrayList();
for (String serviceName : serviceNames) {
dimensionLists.add(
Arrays.asList(new Pair<>(CloudWatchService.SERVICE_DIMENSION, serviceName)));
}
for (String remoteServiceName : remoteServiceNames) {
dimensionLists.add(
Arrays.asList(
new Pair<>(CloudWatchService.REMOTE_SERVICE_DIMENSION, remoteServiceName)));
}
Comment on lines +115 to +119
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this new logic able to pull [RemoteService, RemoteTarget] without pulling [<other stuff>, RemoteService, RemoteTarget]? If yes, we can add back the other remote service values to remoteServiceNames. If not, the code changes in this PR will cause flaky tests in the long run.

This is due to the fact that when trying to pull [RemoteService, RemoteTarget] if you also pull other aggregations that have the same two attributes you will pull metrics from all test runs in the past three hours that match the filter you set. CW listMetrics API cannot return all of these metrics reliably and we've seen this cause failures before. Please ensure your solution accounts for this by running a few tests in a row and seeing if metrics with attributes containing the wrong test ID appear in the actualMetricList.

dimensionLists.add(
Arrays.asList(
new Pair<>(CloudWatchService.REMOTE_SERVICE_DIMENSION, "AWS.SDK.S3"),
new Pair<>(CloudWatchService.REMOTE_TARGET_DIMENSION, "e2e-test-bucket-name")));
Comment on lines +120 to +123
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the metric exists, how do we know whether the test that just occurred is the one that populated it or if it simply exists from a previous test run? It's entirely possible that a test from 2 hours ago created the metric and then all tests since then did not appropriately populate this metric but because of our .withRecentlyActive("PT3H") we are not finding out at all that this metric is flakily populated.

This is the same issue we had with trying to check [RemoteService] when it is equal to ["www.amazon.com"] for example. (There are also other considerations for this aggregation which I'll mention in another comment)

Note: withRecentlyActive() only accepts PT3H


// Populate actualMetricList with each set of dimension filters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Populate actualMetricList with each set of dimension filters
// Populate actualMetricList with metrics that pass through one of the dimension filters

for (List<Pair<String, String>> dimensionList : dimensionLists) {
addMetrics(dimensionList, expectedMetricList, actualMetricList);
}

// remove the skip dimensions
log.info("dimensions to be skipped in validation: {}", skippedDimensionNameList);
Expand All @@ -132,16 +145,12 @@ public void validate() throws Exception {
}

private void addMetrics(
String dimensionName,
List<String> dimensionValues,
List<Pair<String, String>> dimensionList,
List<Metric> expectedMetricList,
List<Metric> actualMetricList)
throws Exception {
for (String dimensionValue : dimensionValues) {
actualMetricList.addAll(
this.listMetricFromCloudWatch(
cloudWatchService, expectedMetricList, dimensionName, dimensionValue));
}
actualMetricList.addAll(
this.listMetricFromCloudWatch(cloudWatchService, expectedMetricList, dimensionList));
}

/**
Expand Down Expand Up @@ -194,8 +203,7 @@ private void compareMetricLists(List<Metric> toBeCheckedMetricList, List<Metric>
private List<Metric> listMetricFromCloudWatch(
CloudWatchService cloudWatchService,
List<Metric> expectedMetricList,
String dimensionKey,
String dimensionValue)
List<Pair<String, String>> dimensionList)
throws IOException {
// put namespace into the map key, so that we can use it to search metric
HashMap<String, String> metricNameMap = new HashMap<>();
Expand All @@ -207,8 +215,7 @@ private List<Metric> listMetricFromCloudWatch(
List<Metric> result = new ArrayList<>();
for (String metricName : metricNameMap.keySet()) {
result.addAll(
cloudWatchService.listMetrics(
metricNameMap.get(metricName), metricName, dimensionKey, dimensionValue));
cloudWatchService.listMetrics(metricNameMap.get(metricName), metricName, dimensionList));
}
return result;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -117,6 +117,34 @@
name: RemoteTarget
value: e2e-test-bucket-name

-
metricName: Latency
namespace: {{metricNamespace}}
dimensions:
-
name: Service
value: {{serviceName}}
-
name: HostedIn.Environment
value: EC2
-
name: RemoteService
value: AWS.SDK.S3
-
name: RemoteTarget
value: e2e-test-bucket-name

-
metricName: Latency
namespace: {{metricNamespace}}
dimensions:
-
name: RemoteService
value: AWS.SDK.S3
-
name: RemoteTarget
value: e2e-test-bucket-name

-
metricName: Error
namespace: {{metricNamespace}}
Expand Down Expand Up @@ -236,6 +264,34 @@
name: RemoteTarget
value: e2e-test-bucket-name

-
metricName: Error
namespace: {{metricNamespace}}
dimensions:
-
name: Service
value: {{serviceName}}
-
name: HostedIn.Environment
value: EC2
-
name: RemoteService
value: AWS.SDK.S3
-
name: RemoteTarget
value: e2e-test-bucket-name

-
metricName: Error
namespace: {{metricNamespace}}
dimensions:
-
name: RemoteService
value: AWS.SDK.S3
-
name: RemoteTarget
value: e2e-test-bucket-name

-
metricName: Fault
namespace: {{metricNamespace}}
Expand Down Expand Up @@ -355,4 +411,31 @@
name: RemoteTarget
value: e2e-test-bucket-name

-
metricName: Fault
namespace: {{metricNamespace}}
dimensions:
-
name: Service
value: {{serviceName}}
-
name: HostedIn.Environment
value: EC2
-
name: RemoteService
value: AWS.SDK.S3
-
name: RemoteTarget
value: e2e-test-bucket-name

-
metricName: Fault
namespace: {{metricNamespace}}
dimensions:
-
name: RemoteService
value: AWS.SDK.S3
-
name: RemoteTarget
value: e2e-test-bucket-name

Original file line number Diff line number Diff line change
Expand Up @@ -138,6 +138,37 @@
name: RemoteTarget
value: e2e-test-bucket-name

-
metricName: Latency
namespace: {{metricNamespace}}
dimensions:
-
name: Service
value: {{serviceName}}
-
name: HostedIn.EKS.Cluster
value: {{platformInfo}}
-
name: HostedIn.K8s.Namespace
value: {{appNamespace}}
-
name: RemoteService
value: AWS.SDK.S3
-
name: RemoteTarget
value: e2e-test-bucket-name

-
metricName: Latency
namespace: {{metricNamespace}}
dimensions:
-
name: RemoteService
value: AWS.SDK.S3
-
name: RemoteTarget
value: e2e-test-bucket-name

-
metricName: Error
namespace: {{metricNamespace}}
Expand Down Expand Up @@ -278,6 +309,37 @@
name: RemoteTarget
value: e2e-test-bucket-name

-
metricName: Error
namespace: {{metricNamespace}}
dimensions:
-
name: Service
value: {{serviceName}}
-
name: HostedIn.EKS.Cluster
value: {{platformInfo}}
-
name: HostedIn.K8s.Namespace
value: {{appNamespace}}
-
name: RemoteService
value: AWS.SDK.S3
-
name: RemoteTarget
value: e2e-test-bucket-name

-
metricName: Error
namespace: {{metricNamespace}}
dimensions:
-
name: RemoteService
value: AWS.SDK.S3
-
name: RemoteTarget
value: e2e-test-bucket-name

-
metricName: Fault
namespace: {{metricNamespace}}
Expand Down Expand Up @@ -411,6 +473,37 @@
-
name: RemoteOperation
value: GetBucketLocation
-
name: RemoteService
value: AWS.SDK.S3
-
name: RemoteTarget
value: e2e-test-bucket-name

-
metricName: Fault
namespace: {{metricNamespace}}
dimensions:
-
name: Service
value: {{serviceName}}
-
name: HostedIn.EKS.Cluster
value: {{platformInfo}}
-
name: HostedIn.K8s.Namespace
value: {{appNamespace}}
-
name: RemoteService
value: AWS.SDK.S3
-
name: RemoteTarget
value: e2e-test-bucket-name

-
metricName: Fault
namespace: {{metricNamespace}}
dimensions:
-
name: RemoteService
value: AWS.SDK.S3
Expand Down
Loading
Loading