First ECS metric sanity test - ECS EC2 Launch Type Daemon Deployment - Container Insights #26

ChenaLee · 2022-11-17T15:40:52Z

Description of the issue

ECS metric test needs to be added

Out of scope

There is some duplicated code between the ECS and EC2 tests for now. It will be removed in future commits; this PR pushes what already works.

Description of changes

Added container insights sanity test for ecs-ec2launchtype-daemondeployment

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

https://github.com/ChenaLee/amazon-cloudwatch-agent/actions/runs/3482812632/jobs/5827664654

khanhntd

LGTM overall; I'm ignoring many things that can be done in the next PR.

environment/MetaData.go

environment/compute_type/compute_type.go

terraform/ecs/linux/ec2_launch/daemon/iam.tf

test/ssmParameter.go

test/metric_value_benchmark/metrics_value_benchmark_test.go

khanhntd · 2022-11-17T22:39:54Z

test/metric_value_benchmark/container_insights_test.go

+
+func (t *ContainerInsightsTestRunner) getMeasuredMetrics() []string {
+	return []string{
+		"instance_memory_utilization", "instance_number_of_running_tasks", "instance_memory_reserved_capacity",


Would it be better to leave a comment here linking to the public document that describes the metrics CWAgent gathers?

Imo this is the source for public docs, not the other way around. I don't mind it but why do you think that will be helpful?

It would be better for new devs to know what we currently support for customers. That way, if the values are mismatched, we can fix it right away.

Fixing it here doesn't fix it in the agent. This is just testing.

khanhntd · 2022-11-17T22:53:08Z

test/ecs.go

+	}
+
+	results := []ContainerInstance{}
+	for _, containerInstance := range describeContainerInstancesOutput.ContainerInstances {


Would it be better to have an internal helper that calls IMDS (from EC2) and also the ECS metadata, so that we have fewer specific calls?

These tests don't run on container instances. Agent runs there but tests execute from github's own runners.

We can run that. However, it would not be the same for Fargate (but given the aim of this PR, should we only focus on EC2?).

We can't know exactly which EC2 instances are running without calling DescribeContainerInstances when autoscaling is in use (which it is in this test).

The ECS metadata seems to be callable through an environment variable within each container/task, which isn't accessible from GitHub's hosts?

environment/ecs_launch_type/ecs_launch_type.go

environment/MetaData.go

go.mod

terraform/ecs/linux/ec2_launch/daemon/main.tf

terraform/ecs/linux/ec2_launch/daemon/variables.tf

test/metric_value_benchmark/base_test.go

test/metric_value_benchmark/covered_test_list.go

test/metric_value_benchmark/metrics_value_benchmark_test.go

test/metric_value_benchmark/ecs_daemon_base_test_runner.go

test/ssmParameter.go

test/metric_value_benchmark/ecs_daemon_base_test_runner.go

test/metric_value_benchmark/metrics_value_benchmark_test.go

environment/MetaData.go

test/metric/container_insights.go

ChenaLee

Addressed comments

khanhntd · 2022-11-25T13:04:41Z

test/metric/metric_value_query.go

 )

 var metricValueFetchers = []MetricValueFetcher{
 	&CPUMetricValueFetcher{},
 	&MemMetricValueFetcher{},
 	&ProcStatMetricValueFetcher{},
 	&DiskIOMetricValueFetcher{},
-	&NetMetricValueFetcher{},
+  &NetMetricValueFetcher{},


Nit: Space

khanhntd · 2022-11-25T13:13:39Z

terraform/ecs/linux/ec2_launch/daemon/main.tf

+  depends_on = [aws_iam_role_policy_attachment.ecs_task_execution_role]
+}
+
+resource "null_resource" "validator" {


We could SSH into the EC2 instance instead of using local-exec, similar to the other EC2 tests, since there is no use case shared between EC2 and Fargate (except Prometheus).

What's the benefit? It's also not extensible if there's a test case with multiple EC2 instances behind the same ECS cluster.

khanhntd · 2022-11-25T13:17:34Z

test/metric/container_insights.go

+
+var _ MetricValueFetcher = (*ContainerInsightsValueFetcher)(nil)
+
+func (f *ContainerInsightsValueFetcher) Fetch(namespace, metricName string, stat Statistics) (MetricValues, error) {


Nit: The namespace should be unique for Container Insights (e.g. AWS/ECS/ContainerInsights). Therefore, should we strive for a function call that determines the namespace based on EKS/ECS/EC2?

I don't understand. Namespace should be unique? Unique in.. what? Also this entire fetch logic is changed here #41 so let's talk over there

Therefore, should we strive for a function call

What does this mean? The test calls this and passes in ECS/ContainerInsights. That is sufficient.

In Container Insights, there is always a default namespace (e.g. EKS Container Insights and ECS Container Insights). Moreover, we have a function to detect whether we are running in EC2, ECS, or EKS, so I want to ask if we could use or improve this general function to determine the namespace beforehand.
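For illustration, the namespace lookup proposed in this thread could look roughly like the sketch below. Everything here is an assumption rather than code from this PR: `ComputeType` and `containerInsightsNamespace` are hypothetical names, and the EKS value is assumed to be the `ContainerInsights` namespace. Only `ECS/ContainerInsights` is taken from the discussion above.

```go
package main

import "fmt"

// ComputeType is a hypothetical enum; the PR's environment package may model this differently.
type ComputeType string

const (
	ComputeEC2 ComputeType = "EC2"
	ComputeECS ComputeType = "ECS"
	ComputeEKS ComputeType = "EKS"
)

// containerInsightsNamespace maps a compute type to its default Container Insights
// namespace. The EKS value is an assumption; plain EC2 has no default, so it errors.
func containerInsightsNamespace(ct ComputeType) (string, error) {
	switch ct {
	case ComputeECS:
		return "ECS/ContainerInsights", nil
	case ComputeEKS:
		return "ContainerInsights", nil
	default:
		return "", fmt.Errorf("no default Container Insights namespace for compute type %q", ct)
	}
}

func main() {
	ns, err := containerInsightsNamespace(ComputeECS)
	fmt.Println(ns, err)
}
```

The trade-off raised in the thread still applies: the test already passes the namespace explicitly, so a lookup like this only pays off once several compute types share the same fetcher.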

khanhntd · 2022-11-25T13:19:46Z

test/metric/metric_value_query.go

+type baseMetricValueFetcher struct {
+	Env *environment.MetaData
+}

-func (f *baseMetricValueFetcher) fetch(namespace, metricName string, metricSpecificDimensions []types.Dimension, stat Statistics) (MetricValues, error) {
+func (f *baseMetricValueFetcher) getEnv() *environment.MetaData {
+	return f.Env
+}
+
+func (f *baseMetricValueFetcher) setEnv(env *environment.MetaData) {
+	f.Env = env
+}


TBH, I don't like the getter/setter model since it feels too hollow to me and doesn't add much value. IIRC you can access a field from outside the package as long as its name starts with an uppercase letter.
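As a rough sketch of the two options in this comment: an exported field on an embedded struct is promoted and can be set directly, while a setter is mainly useful when callers only hold an interface value. `MetaData` below is a stand-in for `environment.MetaData` with a made-up field, and `envSetter` is a hypothetical interface that is not part of the PR.

```go
package main

import "fmt"

// MetaData stands in for environment.MetaData; its field here is hypothetical.
type MetaData struct {
	ComputeType string
}

// baseMetricValueFetcher mirrors the shape in the diff: an exported Env field plus a setter.
type baseMetricValueFetcher struct {
	Env *MetaData
}

func (f *baseMetricValueFetcher) setEnv(env *MetaData) { f.Env = env }

// CPUMetricValueFetcher embeds the base, so Env and setEnv are promoted onto it.
type CPUMetricValueFetcher struct {
	baseMetricValueFetcher
}

// envSetter is a hypothetical interface: it is the case where a setter is needed,
// because interface values expose methods but not fields.
type envSetter interface {
	setEnv(env *MetaData)
}

func main() {
	env := &MetaData{ComputeType: "ECS"}

	// Option 1: direct access through the promoted exported field.
	direct := &CPUMetricValueFetcher{}
	direct.Env = env

	// Option 2: through an interface, where only methods are reachable.
	var viaInterface envSetter = &CPUMetricValueFetcher{}
	viaInterface.setEnv(env)

	fmt.Println(direct.Env.ComputeType)
}
```

If every caller holds the concrete fetcher type, the field alone is enough; the setter only earns its place when fetchers are handled through a common interface.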

khanhntd · 2022-11-25T13:20:28Z

test/metric/metric_value_query.go

 	ec2InstanceId := test.GetInstanceId()
-	instanceIdDimension := types.Dimension{
+
+	//TODO For now they can stay. Later host metrics fetchers might need to be flexible on how to get instance Id


IMO, it should be the same, considering both are EC2.

I don't think so? For plain ec2 tests we are creating an ec2 instance, ssh into it, and run both test and agent on that created ec2 instance. For ecs tests ec2 instances exist but we aren't ssh'ing into it to run tests on the instances. What the tests have access to is different because of that

khanhntd · 2022-11-25T13:22:47Z

test/metric_value_benchmark/base_test.go

@@ -34,6 +35,10 @@ type TestRunner struct {
 	testRunner ITestRunner
 }

+type BaseTestRunner struct {
+	*metric.MetricFetcherFactory


Should it be IECSTestRunner ?

which one? why? This is currently used for ec2 tests. Note: my other comment about later plan to merge & why I haven't merged the two

khanhntd · 2022-11-25T13:24:13Z

test/metric_value_benchmark/container_insights_test.go

+
+func (t *ContainerInsightsTestRunner) getMeasuredMetrics() []string {
+	return []string{
+		"instance_memory_utilization", "instance_number_of_running_tasks", "instance_memory_reserved_capacity",


It would be better for new devs to know what we currently support for customers. That way, if the values are mismatched, we can fix it right away.

khanhntd · 2022-11-25T13:29:47Z

test/ssmParameter.go

+func getSsmClient() (*ssm.Client, context.Context, error) {
+	if ssmClient == nil {
+		ssmCtx = context.Background()
+		cfg, err := config.LoadDefaultConfig(ssmCtx)


IMO, the other clients (e.g. CloudWatch) should share the same configuration without any differences (e.g. credentials).

You want me to pull existing stuff out?
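One way to read the request above is to load the configuration once and hand the same value to every client. The sketch below shows only that pattern and deliberately avoids the real SDK: `sharedConfig` and `loadConfig` are stand-ins for `aws.Config` and `config.LoadDefaultConfig`, and the client types are hypothetical placeholders.

```go
package main

import (
	"fmt"
	"sync"
)

// sharedConfig is a stand-in for aws.Config; the real type is not used in this sketch.
type sharedConfig struct {
	Region string
}

var (
	cfgOnce   sync.Once
	cfg       sharedConfig
	cfgErr    error
	loadCount int // counts loads so the single-load behavior is observable
)

// loadConfig is a stand-in for config.LoadDefaultConfig; the region value is made up.
func loadConfig() (sharedConfig, error) {
	loadCount++
	return sharedConfig{Region: "us-west-2"}, nil
}

// getSharedConfig loads the configuration at most once and returns the cached result.
func getSharedConfig() (sharedConfig, error) {
	cfgOnce.Do(func() { cfg, cfgErr = loadConfig() })
	return cfg, cfgErr
}

// ssmClient and cloudWatchClient are hypothetical placeholders for the SDK clients.
type ssmClient struct{ cfg sharedConfig }
type cloudWatchClient struct{ cfg sharedConfig }

func newSSMClient() (*ssmClient, error) {
	c, err := getSharedConfig()
	if err != nil {
		return nil, err
	}
	return &ssmClient{cfg: c}, nil
}

func newCloudWatchClient() (*cloudWatchClient, error) {
	c, err := getSharedConfig()
	if err != nil {
		return nil, err
	}
	return &cloudWatchClient{cfg: c}, nil
}

func main() {
	ssmC, _ := newSSMClient()
	cwC, _ := newCloudWatchClient()
	fmt.Println(ssmC.cfg.Region == cwC.cfg.Region, loadCount)
}
```

Using `sync.Once` instead of a `nil` check also keeps the lazy initialization safe if clients are ever created from multiple goroutines.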

khanhntd · 2022-11-25T13:30:59Z

test/ssmParameter.go

+	return putParameter(name, value, types.ParameterTypeString)
+}
+
+func putParameter(name string, value string, paramType types.ParameterType) error {


Suggested change

-func putParameter(name string, value string, paramType types.ParameterType) error {
+func putParameter(name, value string, paramType types.ParameterType) error {

khanhntd · 2022-11-25T13:33:12Z

test/metric_value_benchmark/ecs_daemon_base_test_runner.go

+	"github.com/aws/amazon-cloudwatch-agent-test/test/status"
+)
+
+type IECSTestRunner interface {


This shares a lot with BaseTestRunner. Should we create a common interface, with runStrategy as part of it?

Yep this was intentional to split PRs. 1) needed to make it work for ecs to even know what I'm going to do & 2) it was probably going to be extra line changes in this PR and take longer to make sure nothing breaks. I plan to merge this with a shared base runner with injectable structs

khanhntd

LGTM.

SaxyPandaBear · 2022-12-05T13:46:38Z

environment/MetaData.go

@@ -0,0 +1,96 @@
+// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.


I think that this comment got missed somewhere. The file name shouldn't be capitalized.

SaxyPandaBear · 2022-12-05T13:51:13Z

terraform/ecs/linux/ec2_launch/daemon/main.tf

+    command = <<-EOT
+      echo "Validating metrics/logs"
+      cd ../../../../..
+      go test ${var.test_dir} -timeout 0 -computeType=ECS -ecsLaunchType=EC2 -ecsDeploymentStrategy=DAEMON -cwagentConfigSsmParamName=${local.cwagent_config_ssm_param_name} -clusterArn=${aws_ecs_cluster.cluster.arn} -cwagentECSServiceName=${aws_ecs_service.cwagent_service.name} -v --tags=integration


Okay so this is why we have to do that weird MetaData vs MetaDataStrings stuff in the test setup? Makes me think we should pivot from doing a "test" here and should just invoke a main. But that's a problem for another day.

yes, that's what it's doing. feel free to change :P

sky333999 · 2022-12-16T07:31:13Z

terraform/ecs/linux/ec2_launch/daemon/variables.tf

+
+variable "test_dir" {
+  type    = string
+  default = "./integration/test/ecs/ecs_metadata"


Shouldn't this NOT have "integration" in the path?

I'm surprised nothing failed with this. Is it being overwritten when calling tf apply in the workflow?

Actually, I don't see it being overwritten here.

sky333999 · 2022-12-16T07:37:57Z

terraform/ecs/linux/ec2_launch/daemon/main.tf

+  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn
+  cpu                      = 256
+  memory                   = 1024
+  requires_compatibilities = ["FARGATE"]


sky333999 · 2022-12-16T07:38:03Z

terraform/ecs/linux/ec2_launch/daemon/main.tf

+  cluster         = aws_ecs_cluster.cluster.id
+  task_definition = aws_ecs_task_definition.extra_apps_task_definition.arn
+  desired_count   = 1
+  launch_type     = "FARGATE"


ChenaLee requested a review from a team as a code owner November 17, 2022 15:40

ChenaLee mentioned this pull request Nov 17, 2022

Add ECS EC2 Launch Type Daemon Deployment test workflow aws/amazon-cloudwatch-agent#639

Merged

khanhntd reviewed Nov 17, 2022

SaxyPandaBear reviewed Nov 18, 2022

ChenaLee commented Nov 22, 2022

ChenaLee requested review from khanhntd and SaxyPandaBear November 22, 2022 17:50

khanhntd reviewed Nov 25, 2022

ChenaLee mentioned this pull request Nov 28, 2022

Ec2 no append dimension test #41

Closed

ChenaLee requested a review from khanhntd November 29, 2022 15:29

khanhntd previously approved these changes Dec 2, 2022

ChenaLee added 18 commits December 5, 2022 08:31

make an always failing test that gets called for ecs in benchmark test

673a1be

Terraform required files for daemon run

4283eab

Put new parameter for test

4482e8a

Fix volumes to volume ecs task definition

4f7c280

fix volume - terraform doesn't support what json supports

222e454

Remove comma so terraform apply doesn't error out

3d41278

Fix ssm param name for cwagent config

ad21cd0

Remove sourcePath as fargate doesn't support it

4d4025a

"failed creating ECS Task Definition (cwagent-task-family-bc50abafead36f80): ClientException: Fargate compatible task definitions do not support sourcePath"

correct test directory path

a55321f

Fix genereator test dir name

bae3901

Add dependency for ssm

572281b

add session as dependency

b87ba85

Return error only for ssm param put

b305092

remove unused declaration in put param

09459f7

fix list syntax

57e4cf5

Add flag as dependency for build

a47d029

fix build issues

f41e028

fix build error

2901e76

ChenaLee added 22 commits December 5, 2022 08:34

Readme edit

daba08e

Fix Fetch implementation

5dee2fa

fix build failure

7794e41

Add instance id dimension for diskio and procstat

f6449e8

Print test result in test run instead of teardown because teardown do…

da188c5

…n't run after ini()

delete unused file for now

faacd84

make ecs test fail to see if logs get printed

c181ae5

make test successful and see if logs get printed still

11947ed

Fix no logs if test succeed problem for ecs test

8722278

Edit readme

aa26acb

Resolve PR comments for /environment

0299c7a

Remove aws-sdk-go v1 dependency

25be84b

Remove unnecessary terraform comment

dd94fb6

Remove unnecessary pointer arguments

7255367

Some minor style changes

d788326

Remove covered test list since out of scope

e440dce

Style changes

28f8d98

fix build

03014e6

Merge conflict resolution

dfbaf94

go mod tidy

442bdf6

Minor style changes

1af9eee

go mod tidy

7e7ac96

ChenaLee dismissed khanhntd’s stale review via 7e7ac96 December 5, 2022 13:45

ChenaLee force-pushed the ecsTestBase branch from 5d8ee6b to 7e7ac96 Compare December 5, 2022 13:45

ChenaLee requested a review from khanhntd December 5, 2022 13:45

SaxyPandaBear approved these changes Dec 5, 2022

khanhntd approved these changes Dec 5, 2022

ChenaLee merged commit 8de2d34 into aws:main Dec 5, 2022

sky333999 reviewed Dec 16, 2022

ChenaLee deleted the ecsTestBase branch June 22, 2023 17:39

