Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First ECS metric sanity test - ECS EC2 Launch Type Daemon Deployment - Container Insights #26

Merged
merged 109 commits into from
Dec 5, 2022

Conversation

ChenaLee
Copy link
Contributor

@ChenaLee ChenaLee commented Nov 17, 2022

Description of the issue

ECS metric test needs to be added

Out of scope

  • Some duplicated code for now between ECS and EC2 test. This will be removed in future commits. Trying to push what already works.

Description of changes

Added container insights sanity test for ecs-ec2launchtype-daemondeployment

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

https://github.com/ChenaLee/amazon-cloudwatch-agent/actions/runs/3482812632/jobs/5827664654

Copy link
Contributor

@khanhntd khanhntd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall to me and ignores many things that can be done in the next PR.

environment/MetaData.go Outdated Show resolved Hide resolved
environment/MetaData.go Show resolved Hide resolved
environment/compute_type/compute_type.go Outdated Show resolved Hide resolved
terraform/ecs/linux/ec2_launch/daemon/iam.tf Show resolved Hide resolved
test/ssmParameter.go Outdated Show resolved Hide resolved

func (t *ContainerInsightsTestRunner) getMeasuredMetrics() []string {
return []string{
"instance_memory_utilization", "instance_number_of_running_tasks", "instance_memory_reserved_capacity",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to leave comments here with the public document that talks about metrics that CWAgent is gathering

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imo this is the source for public docs, not the other way around. I don't mind it but why do you think that will be helpful?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better for new dev to know what we support for customers currently. So if the values are mismatch, we can instantly fixing it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing it here doesn't fix it in the agent. This is just testing.

}

results := []ContainerInstance{}
for _, containerInstance := range describeContainerInstancesOutput.ContainerInstances {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it better to have an internal that calls IMDS (from EC2) and also metadata from ECS so it would better to have less specific calls ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests don't run on container instances. Agent runs there but tests execute from github's own runners.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can run that. However, its would not be the same case for Fargate (but based on the aim for this PR, should we only focus on EC2).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We can't know exact ec2 instances running without calling this describecontainerinstances, if using autoscaling (which this test is)
  2. metadata from ecs thingy seems to be callable using environment variable within each container/task, which aren't accessible in github's host?

environment/ecs_launch_type/ecs_launch_type.go Outdated Show resolved Hide resolved
environment/MetaData.go Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
terraform/ecs/linux/ec2_launch/daemon/main.tf Outdated Show resolved Hide resolved
test/metric_value_benchmark/base_test.go Outdated Show resolved Hide resolved
test/metric_value_benchmark/covered_test_list.go Outdated Show resolved Hide resolved
test/metric_value_benchmark/ecs_daemon_base_test_runner.go Outdated Show resolved Hide resolved
test/ssmParameter.go Outdated Show resolved Hide resolved
test/metric_value_benchmark/ecs_daemon_base_test_runner.go Outdated Show resolved Hide resolved
test/metric_value_benchmark/ecs_daemon_base_test_runner.go Outdated Show resolved Hide resolved
environment/MetaData.go Show resolved Hide resolved
environment/MetaData.go Show resolved Hide resolved
test/metric/container_insights.go Show resolved Hide resolved
Copy link
Contributor Author

@ChenaLee ChenaLee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed comments

)

var metricValueFetchers = []MetricValueFetcher{
&CPUMetricValueFetcher{},
&MemMetricValueFetcher{},
&ProcStatMetricValueFetcher{},
&DiskIOMetricValueFetcher{},
&NetMetricValueFetcher{},
&NetMetricValueFetcher{},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Space

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed!

depends_on = [aws_iam_role_policy_attachment.ecs_task_execution_role]
}

resource "null_resource" "validator" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can change to ssh into EC2 instead of local-exe that similar to other EC2 test. Since there are no use case that shares between EC2 and Fargate (except Prometheus)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the benefit? also not extensible if there's a test case with multiple ec2 instances behind the same ecs cluster.


var _ MetricValueFetcher = (*ContainerInsightsValueFetcher)(nil)

func (f *ContainerInsightsValueFetcher) Fetch(namespace, metricName string, stat Statistics) (MetricValues, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Namespace should be unique for container insights (e.g AWS/ECS/ContainerInsights). Therefore, should we strive for a function call (determine Namespace based on EKS/ECS/EC2)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. Namespace should be unique? Unique in.. what? Also this entire fetch logic is changed here #41 so let's talk over there

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Therefore, should we strive for a function call

What does this mean? The test calls this and passes in ECS/ContainerInsights. That is sufficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Container Insights, there always a default namespace (e.g EKS Container Insights and ECS Container Insights. Moreover, we have a function to detect if we run in EC2, ECS, EKS so want to ask if we could use/improve this general function to detect the namespace beforehand.

Comment on lines +64 to +75
type baseMetricValueFetcher struct {
Env *environment.MetaData
}

func (f *baseMetricValueFetcher) fetch(namespace, metricName string, metricSpecificDimensions []types.Dimension, stat Statistics) (MetricValues, error) {
func (f *baseMetricValueFetcher) getEnv() *environment.MetaData {
return f.Env
}

func (f *baseMetricValueFetcher) setEnv(env *environment.MetaData) {
f.Env = env
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I don't like getter/setter model since its too hollow for me and does not give any value much. IIRC you can access variable as long as its uppercase when outside of function.

ec2InstanceId := test.GetInstanceId()
instanceIdDimension := types.Dimension{

//TODO For now they can stay. Later host metrics fetchers might need to be flexible on how to get instance Id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, it should be the same considers its both EC2.

Copy link
Contributor Author

@ChenaLee ChenaLee Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so? For plain ec2 tests we are creating an ec2 instance, ssh into it, and run both test and agent on that created ec2 instance. For ecs tests ec2 instances exist but we aren't ssh'ing into it to run tests on the instances. What the tests have access to is different because of that

@@ -34,6 +35,10 @@ type TestRunner struct {
testRunner ITestRunner
}

type BaseTestRunner struct {
*metric.MetricFetcherFactory
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be IECSTestRunner ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which one? why? This is currently used for ec2 tests. Note: my other comment about later plan to merge & why I haven't merged the two


func (t *ContainerInsightsTestRunner) getMeasuredMetrics() []string {
return []string{
"instance_memory_utilization", "instance_number_of_running_tasks", "instance_memory_reserved_capacity",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better for new dev to know what we support for customers currently. So if the values are mismatch, we can instantly fixing it.

func getSsmClient() (*ssm.Client, context.Context, error) {
if ssmClient == nil {
ssmCtx = context.Background()
cfg, err := config.LoadDefaultConfig(ssmCtx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the others client (e.g cloudwatch) should share the same configuration without any different (e.g credentials, etc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want me to pull existing stuff out?

return putParameter(name, value, types.ParameterTypeString)
}

func putParameter(name string, value string, paramType types.ParameterType) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func putParameter(name string, value string, paramType types.ParameterType) error {
func putParameter(name, value string, paramType types.ParameterType) error {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated!

"github.com/aws/amazon-cloudwatch-agent-test/test/status"
)

type IECSTestRunner interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This share a lots with basedTestRunner. Should we create a common interface and runStrategy can be a part of it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this was intentional to split PRs. 1) needed to make it work for ecs to even know what I'm going to do & 2) it was probably going to be extra line changes in this PR and take longer to make sure nothing breaks. I plan to merge this with a shared base runner with injectable structs

khanhntd
khanhntd previously approved these changes Dec 2, 2022
Copy link
Contributor

@khanhntd khanhntd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -0,0 +1,96 @@
// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this comment got missed somewhere. The file name shouldn't be capitalized.

command = <<-EOT
echo "Validating metrics/logs"
cd ../../../../..
go test ${var.test_dir} -timeout 0 -computeType=ECS -ecsLaunchType=EC2 -ecsDeploymentStrategy=DAEMON -cwagentConfigSsmParamName=${local.cwagent_config_ssm_param_name} -clusterArn=${aws_ecs_cluster.cluster.arn} -cwagentECSServiceName=${aws_ecs_service.cwagent_service.name} -v --tags=integration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so this is why we have to do that weird MetaData vs MetaDataStrings stuff in the test setup? Makes me think we should pivot from doing a "test" here and should just invoke a main. But that's a problem for another day.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's what it's doing. feel free to change :P

@ChenaLee ChenaLee merged commit 8de2d34 into aws:main Dec 5, 2022

variable "test_dir" {
type = string
default = "./integration/test/ecs/ecs_metadata"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldnt this NOT have integration?

Im surprised nothing failed with this. Is it being overwritten when calling tf apply in the workflow?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I dont see it being overwritten here

execution_role_arn = aws_iam_role.ecs_task_execution_role.arn
cpu = 256
memory = 1024
requires_compatibilities = ["FARGATE"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EC2?

cluster = aws_ecs_cluster.cluster.id
task_definition = aws_ecs_task_definition.extra_apps_task_definition.arn
desired_count = 1
launch_type = "FARGATE"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EC2?

@ChenaLee ChenaLee deleted the ecsTestBase branch June 22, 2023 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants