-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First ECS metric sanity test - ECS EC2 Launch Type Daemon Deployment - Container Insights #26
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall to me and ignores many things that can be done in the next PR.
|
||
func (t *ContainerInsightsTestRunner) getMeasuredMetrics() []string { | ||
return []string{ | ||
"instance_memory_utilization", "instance_number_of_running_tasks", "instance_memory_reserved_capacity", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to leave comments here with the public document that talks about metrics that CWAgent is gathering
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imo this is the source for public docs, not the other way around. I don't mind it but why do you think that will be helpful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better for new dev to know what we support for customers currently. So if the values are mismatch, we can instantly fixing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing it here doesn't fix it in the agent. This is just testing.
} | ||
|
||
results := []ContainerInstance{} | ||
for _, containerInstance := range describeContainerInstancesOutput.ContainerInstances { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it better to have an internal that calls IMDS (from EC2) and also metadata from ECS so it would better to have less specific calls ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests don't run on container instances. Agent runs there but tests execute from github's own runners.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can run that. However, its would not be the same case for Fargate (but based on the aim for this PR, should we only focus on EC2).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We can't know exact ec2 instances running without calling this describecontainerinstances, if using autoscaling (which this test is)
- metadata from ecs thingy seems to be callable using environment variable within each container/task, which aren't accessible in github's host?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed comments
test/metric/metric_value_query.go
Outdated
) | ||
|
||
var metricValueFetchers = []MetricValueFetcher{ | ||
&CPUMetricValueFetcher{}, | ||
&MemMetricValueFetcher{}, | ||
&ProcStatMetricValueFetcher{}, | ||
&DiskIOMetricValueFetcher{}, | ||
&NetMetricValueFetcher{}, | ||
&NetMetricValueFetcher{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Space
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed!
depends_on = [aws_iam_role_policy_attachment.ecs_task_execution_role] | ||
} | ||
|
||
resource "null_resource" "validator" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can change to ssh
into EC2 instead of local-exe
that similar to other EC2 test. Since there are no use case that shares between EC2 and Fargate (except Prometheus)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the benefit? also not extensible if there's a test case with multiple ec2 instances behind the same ecs cluster.
|
||
var _ MetricValueFetcher = (*ContainerInsightsValueFetcher)(nil) | ||
|
||
func (f *ContainerInsightsValueFetcher) Fetch(namespace, metricName string, stat Statistics) (MetricValues, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Namespace should be unique for container insights (e.g AWS/ECS/ContainerInsights). Therefore, should we strive for a function call (determine Namespace based on EKS/ECS/EC2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. Namespace should be unique? Unique in.. what? Also this entire fetch logic is changed here #41 so let's talk over there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Therefore, should we strive for a function call
What does this mean? The test calls this and passes in ECS/ContainerInsights
. That is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Container Insights, there always a default namespace (e.g EKS Container Insights and ECS Container Insights. Moreover, we have a function to detect if we run in EC2, ECS, EKS so want to ask if we could use/improve this general function to detect the namespace beforehand.
type baseMetricValueFetcher struct { | ||
Env *environment.MetaData | ||
} | ||
|
||
func (f *baseMetricValueFetcher) fetch(namespace, metricName string, metricSpecificDimensions []types.Dimension, stat Statistics) (MetricValues, error) { | ||
func (f *baseMetricValueFetcher) getEnv() *environment.MetaData { | ||
return f.Env | ||
} | ||
|
||
func (f *baseMetricValueFetcher) setEnv(env *environment.MetaData) { | ||
f.Env = env | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I don't like getter/setter
model since its too hollow for me and does not give any value much. IIRC you can access variable as long as its uppercase when outside of function.
ec2InstanceId := test.GetInstanceId() | ||
instanceIdDimension := types.Dimension{ | ||
|
||
//TODO For now they can stay. Later host metrics fetchers might need to be flexible on how to get instance Id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, it should be the same considers its both EC2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so? For plain ec2 tests we are creating an ec2 instance, ssh into it, and run both test and agent on that created ec2 instance. For ecs tests ec2 instances exist but we aren't ssh'ing into it to run tests on the instances. What the tests have access to is different because of that
@@ -34,6 +35,10 @@ type TestRunner struct { | |||
testRunner ITestRunner | |||
} | |||
|
|||
type BaseTestRunner struct { | |||
*metric.MetricFetcherFactory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be IECSTestRunner
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which one? why? This is currently used for ec2 tests. Note: my other comment about later plan to merge & why I haven't merged the two
|
||
func (t *ContainerInsightsTestRunner) getMeasuredMetrics() []string { | ||
return []string{ | ||
"instance_memory_utilization", "instance_number_of_running_tasks", "instance_memory_reserved_capacity", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better for new dev to know what we support for customers currently. So if the values are mismatch, we can instantly fixing it.
func getSsmClient() (*ssm.Client, context.Context, error) { | ||
if ssmClient == nil { | ||
ssmCtx = context.Background() | ||
cfg, err := config.LoadDefaultConfig(ssmCtx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, the others client (e.g cloudwatch) should share the same configuration without any different (e.g credentials, etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You want me to pull existing stuff out?
test/ssmParameter.go
Outdated
return putParameter(name, value, types.ParameterTypeString) | ||
} | ||
|
||
func putParameter(name string, value string, paramType types.ParameterType) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func putParameter(name string, value string, paramType types.ParameterType) error { | |
func putParameter(name, value string, paramType types.ParameterType) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated!
"github.com/aws/amazon-cloudwatch-agent-test/test/status" | ||
) | ||
|
||
type IECSTestRunner interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This share a lots with basedTestRunner
. Should we create a common interface and runStrategy
can be a part of it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep this was intentional to split PRs. 1) needed to make it work for ecs to even know what I'm going to do & 2) it was probably going to be extra line changes in this PR and take longer to make sure nothing breaks. I plan to merge this with a shared base runner with injectable structs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
"failed creating ECS Task Definition (cwagent-task-family-bc50abafead36f80): ClientException: Fargate compatible task definitions do not support sourcePath"
…n't run after ini()
@@ -0,0 +1,96 @@ | |||
// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this comment got missed somewhere. The file name shouldn't be capitalized.
command = <<-EOT | ||
echo "Validating metrics/logs" | ||
cd ../../../../.. | ||
go test ${var.test_dir} -timeout 0 -computeType=ECS -ecsLaunchType=EC2 -ecsDeploymentStrategy=DAEMON -cwagentConfigSsmParamName=${local.cwagent_config_ssm_param_name} -clusterArn=${aws_ecs_cluster.cluster.arn} -cwagentECSServiceName=${aws_ecs_service.cwagent_service.name} -v --tags=integration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay so this is why we have to do that weird MetaData vs MetaDataStrings stuff in the test setup? Makes me think we should pivot from doing a "test" here and should just invoke a main. But that's a problem for another day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's what it's doing. feel free to change :P
|
||
variable "test_dir" { | ||
type = string | ||
default = "./integration/test/ecs/ecs_metadata" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldnt this NOT have integration
?
Im surprised nothing failed with this. Is it being overwritten when calling tf apply in the workflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I dont see it being overwritten here
execution_role_arn = aws_iam_role.ecs_task_execution_role.arn | ||
cpu = 256 | ||
memory = 1024 | ||
requires_compatibilities = ["FARGATE"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EC2?
cluster = aws_ecs_cluster.cluster.id | ||
task_definition = aws_ecs_task_definition.extra_apps_task_definition.arn | ||
desired_count = 1 | ||
launch_type = "FARGATE" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EC2?
Description of the issue
ECS metric test needs to be added
Out of scope
Description of changes
Added container insights sanity test for ecs-ec2launchtype-daemondeployment
License
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Tests
https://github.com/ChenaLee/amazon-cloudwatch-agent/actions/runs/3482812632/jobs/5827664654