Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for Metric Rollup Addition of [RemoteService, RemoteTarget, ...] #729

Closed
wants to merge 3 commits into from

Conversation

jj22ee
Copy link

@jj22ee jj22ee commented Jan 25, 2024

Issue #, if available:
aws/amazon-cloudwatch-agent#1010
Recently the following metric rollups were added to 2 AppSignal Configuration files. Tests need to be added for them

  • [RemoteService, RemoteTarget]
  • [HostedIn.<Attributes>, Service, RemoteService, RemoteTarget]

Description of changes:
Add tests for new metric rollups

Testing done:
EKS in IAD: https://github.com/jj22ee/amazon-cloudwatch-agent/actions/runs/7659009903/job/20873170967
EC2 in IAD: https://github.com/jj22ee/amazon-cloudwatch-agent/actions/runs/7659098851/job/20873465899

./gradlew testing:validator:test

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jj22ee jj22ee requested a review from a team as a code owner January 25, 2024 18:57
@codecov-commenter
Copy link

Codecov Report

Attention: 102 lines in your changes are missing coverage. Please review.

Comparison is base (09e6487) 85.71% compared to head (4204d00) 50.87%.
Report is 233 commits behind head on main.

Files Patch % Lines
...ent/providers/AwsAppSignalsCustomizerProvider.java 24.00% 35 Missing and 3 partials ⚠️
...gent/providers/AwsSpanMetricsProcessorBuilder.java 0.00% 20 Missing ⚠️
...ders/AttributePropagatingSpanProcessorBuilder.java 0.00% 16 Missing ⚠️
...viders/AwsMetricAttributesSpanExporterBuilder.java 0.00% 11 Missing ⚠️
...try/javaagent/providers/AwsSpanProcessingUtil.java 90.16% 1 Missing and 5 partials ⚠️
...vaagent/providers/AwsMetricAttributeGenerator.java 96.89% 2 Missing and 3 partials ⚠️
...y/javaagent/providers/AwsSpanMetricsProcessor.java 91.48% 0 Missing and 4 partials ⚠️
...t/providers/AttributePropagatingSpanProcessor.java 94.59% 2 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@              Coverage Diff              @@
##               main     #729       +/-   ##
=============================================
- Coverage     85.71%   50.87%   -34.84%     
- Complexity       19      266      +247     
=============================================
  Files             3       39       +36     
  Lines            49     1315     +1266     
  Branches          5      144      +139     
=============================================
+ Hits             42      669      +627     
- Misses            3      614      +611     
- Partials          4       32       +28     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@majanjua-amzn majanjua-amzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall seems to be working but there's some design questions and investigations into the unintended effects of the way the logic works to check.

Also, do not merge this code until CW Agent performs their release, otherwise you will cause canary to break leading to sev 2.

new Pair<>(CloudWatchService.REMOTE_SERVICE_DIMENSION, "AWS.SDK.S3"),
new Pair<>(CloudWatchService.REMOTE_TARGET_DIMENSION, "e2e-test-bucket-name")));

// Populate actualMetricList with each set of dimension filters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Populate actualMetricList with each set of dimension filters
// Populate actualMetricList with metrics that pass through one of the dimension filters

expectedMetricList,
actualMetricList);

// Add sets of dimesion filters to use for each query to CloudWatch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've now added a second edge case to this logic, i.e. the [RemoteService] and [RemoteService, RemoteTarget] aggregations. It would be worth adding a larger comment here explaining what's happening and explain why in the PR description for future reference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be more descriptive in other comments in your PR as well.

Comment on lines +115 to +119
for (String remoteServiceName : remoteServiceNames) {
dimensionLists.add(
Arrays.asList(
new Pair<>(CloudWatchService.REMOTE_SERVICE_DIMENSION, remoteServiceName)));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this new logic able to pull [RemoteService, RemoteTarget] without pulling [<other stuff>, RemoteService, RemoteTarget]? If yes, we can add back the other remote service values to remoteServiceNames. If not, the code changes in this PR will cause flaky tests in the long run.

This is due to the fact that when trying to pull [RemoteService, RemoteTarget] if you also pull other aggregations that have the same two attributes you will pull metrics from all test runs in the past three hours that match the filter you set. CW listMetrics API cannot return all of these metrics reliably and we've seen this cause failures before. Please ensure your solution accounts for this by running a few tests in a row and seeing if metrics with attributes containing the wrong test ID appear in the actualMetricList.

Comment on lines +120 to +123
dimensionLists.add(
Arrays.asList(
new Pair<>(CloudWatchService.REMOTE_SERVICE_DIMENSION, "AWS.SDK.S3"),
new Pair<>(CloudWatchService.REMOTE_TARGET_DIMENSION, "e2e-test-bucket-name")));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the metric exists, how do we know whether the test that just occurred is the one that populated it or if it simply exists from a previous test run? It's entirely possible that a test from 2 hours ago created the metric and then all tests since then did not appropriately populate this metric but because of our .withRecentlyActive("PT3H") we are not finding out at all that this metric is flakily populated.

This is the same issue we had with trying to check [RemoteService] when it is equal to ["www.amazon.com"] for example. (There are also other considerations for this aggregation which I'll mention in another comment)

Note: withRecentlyActive() only accepts PT3H

@jj22ee
Copy link
Author

jj22ee commented Jan 26, 2024

In discussion, a solution for both issues ([RemoteService, RemoteTarget] pulling in too much data which comes from previous test runs and can validate the current test run) is to update the aws-sdk-call api to take in an identifier (like Pod_IP) and use that in the bucket name (e.g. e2e-test-bucket-name-<Pod_IP>). Then the tests will need to specify this ID in queries that include RemoteTarget dimension.

Copy link

This PR is stale because it has been open 60 days with no activity.

@github-actions github-actions bot added the stale label Mar 31, 2024
@jj22ee
Copy link
Author

jj22ee commented Apr 16, 2024

@jj22ee jj22ee closed this Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants