QueueProcessingFargateServiceProps Cooldown property is not used when using a CPU Target Utilization scaling policy. #31172

Open
1 task
disophisis opened this issue Aug 21, 2024 · 3 comments
Labels
@aws-cdk/aws-ecs-patterns Related to ecs-patterns library bug This issue is a bug. effort/medium Medium work item – several days of effort p2

Comments

disophisis commented Aug 21, 2024

Describe the bug

QueueProcessingFargateServiceProps Cooldown property is not used when using a CPU Target Utilization scaling policy.

Regression Issue

  • Select this option if this issue appears to be a regression.

Last Known Working CDK Version

No response

Expected Behavior

When creating a QueueProcessingFargateService and when using CPU Target Utilization, the cooldown provided should override the default (300 seconds) and be added to scale-in and scale-out.

Current Behavior

Currently, when creating a QueueProcessingFargateService and using CPU Target Utilization, the Cooldown property is not used, and the default of 300 seconds is used for scale-in and scale-out.

Reproduction Steps

var service = new QueueProcessingFargateService(this, "EventsHandler", new QueueProcessingFargateServiceProps
{
	ContainerName = "EventsHandler",
	Cluster = this.Cluster,
	Image = image, // container image (elided in the original report)
	Environment = new Dictionary<string, string>
	{
		// environment
	},
	MaxScalingCapacity = 4,
	MinScalingCapacity = 1,
	CpuTargetUtilizationPercent = 75,
	Cooldown = Duration.Seconds(120),
	Queue = _eventsQueue,
	LogDriver = logDriver,
	HealthCheck = healthCheck,
});

  1. Create a basic QueueProcessingFargateService with CpuTargetUtilizationPercent and Cooldown set
  2. Deploy the resource to AWS
  3. Either pull the resource scaling policy with the CLI or view it in the ECS dashboard
  4. Note that the scale-in and scale-out cooldown times are not set.

Possible Solution

BaseService contains AutoScaleTaskCount(EnableScalingProps props) which works with ApplicationLoadBalancedFargateService, but appears not to work with Queue Processors because there is already some default scaling created under the hood. Allowing the use of AutoScaleTaskCount would be nice because it seems a bit more flexible:

var autoscalingGroup = service.Service.AutoScaleTaskCount(new EnableScalingProps
{
	MinCapacity = _props.Configuration.ECSMinCapacity,
	MaxCapacity = _props.Configuration.ECSMaxCapacity,
});

autoscalingGroup.ScaleOnCpuUtilization("cpuScaling", new CpuUtilizationScalingProps
{
	TargetUtilizationPercent = 60,
	ScaleInCooldown = Duration.Seconds(300),
	ScaleOutCooldown = Duration.Seconds(60),
});

Currently, when attempting to use AutoScaleTaskCount with a QueueProcessingFargateService, synth fails with:

Unhandled exception. System.Exception: Error: AutoScaling of task count already enabled for this service
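
One way to read this error is that the service allows only a single scalable target, and the queue-processing pattern has already registered one while it was being constructed. The toy model below sketches that reading. `ServiceModel` is a hypothetical stand-in, not CDK's code; only the error message is taken from the report above.

```typescript
// Hypothetical stand-in for a service that permits one scalable target.
class ServiceModel {
  private scalableTaskCount?: { minCapacity: number; maxCapacity: number };

  autoScaleTaskCount(props: { minCapacity: number; maxCapacity: number }) {
    // A second registration is rejected rather than merged or replaced.
    if (this.scalableTaskCount !== undefined) {
      throw new Error('AutoScaling of task count already enabled for this service');
    }
    this.scalableTaskCount = { ...props };
    return this.scalableTaskCount;
  }
}

// Assumption: the pattern enables scaling itself during construction...
const modelService = new ServiceModel();
modelService.autoScaleTaskCount({ minCapacity: 1, maxCapacity: 4 });

// ...so a later call from user code hits the guard.
let guardMessage = '';
try {
  modelService.autoScaleTaskCount({ minCapacity: 1, maxCapacity: 4 });
} catch (e) {
  guardMessage = (e as Error).message;
}
console.log(guardMessage);
```

If this reading is right, the extra CPU policy would have to be attached to the scalable target the pattern already created, not to a second one.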

Additional Information/Context

image

CDK CLI Version

2.130.0

Framework Version

2.130.0

Node.js Version

21.2.0

OS

Linux

Language

.NET

Language Version

8

Other information

No response

@disophisis disophisis added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Aug 21, 2024
@github-actions github-actions bot added the @aws-cdk/aws-iam Related to AWS Identity and Access Management label Aug 21, 2024
@disophisis disophisis changed the title from "(module name): (short issue description)" to "QueueProcessingFargateServiceProps Cooldown property is not used when using a CPU Target Utilization scaling policy." Aug 21, 2024
@ashishdhingra ashishdhingra self-assigned this Aug 21, 2024
@ashishdhingra ashishdhingra added p2 needs-reproduction This issue needs reproduction. @aws-cdk/aws-ecs-patterns Related to ecs-patterns library and removed needs-triage This issue or PR still needs to be triaged. @aws-cdk/aws-iam Related to AWS Identity and Access Management labels Aug 21, 2024
@ashishdhingra
Contributor

ashishdhingra commented Aug 22, 2024

@disophisis Good afternoon. Thanks for reporting the issue. Would it be possible to share minimal reproducible CDK code to investigate the issue?

I'm unsure whether the cooldown property set in QueueProcessingFargateServiceProps is the same as setting the scale-out and scale-in cooldown periods for the ECSServiceAverageCPUUtilization metric. Using the simple code below:

const service = new ecsPatterns.QueueProcessingFargateService(this, 'test', {
  image: ecs.ContainerImage.fromRegistry('public.ecr.aws/amazonlinux/amazonlinux:latest'),
  cooldown: cdk.Duration.seconds(120),
  cpuTargetUtilizationPercent: 75,
});

results in the following Autoscaling policies being added:

Screenshot 2024-08-23 at 3 16 14 PM

The cooldown property is used in scalingTarget.scaleOnMetric() for the sqsQueue.metricApproximateNumberOfMessagesVisible() metric. I'm unsure whether the cooldown value should also be used to set the scale-out and scale-in cooldown periods for the ECSServiceAverageCPUUtilization metric.

Thanks,
Ashish

@ashishdhingra ashishdhingra added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Aug 23, 2024
@pahud
Contributor

pahud commented Aug 26, 2024

Given

export class DummyStack extends Stack {
	constructor(scope: Construct, id: string, props: StackProps) {
		super(scope, id, props);

		const vpc = getDefaultVpc(this);
		const cluster = new ecs.Cluster(this, 'Cluster', { vpc });
		new ecsp.QueueProcessingFargateService(this, 'Service', {
			cluster,
			image: ecs.ContainerImage.fromRegistry('foo:latest'),
			cooldown: Duration.seconds(100),
		});
	}
}

cdk synth

  ServiceQueueProcessingFargateServiceTaskCountTargetQueueMessagesVisibleScalingUpperPolicy344EA7D4:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: dummystackServiceQueueProcessingFargateServiceTaskCountTargetQueueMessagesVisibleScalingUpperPolicy83CB7F72
      PolicyType: StepScaling
      ScalingTargetId:
        Ref: ServiceQueueProcessingFargateServiceTaskCountTarget2625B8CE
      StepScalingPolicyConfiguration:
        AdjustmentType: ChangeInCapacity
        Cooldown: 100
        MetricAggregationType: Maximum
        StepAdjustments:
          - MetricIntervalLowerBound: 0
            MetricIntervalUpperBound: 400
            ScalingAdjustment: 1
          - MetricIntervalLowerBound: 400
            ScalingAdjustment: 5

In this case cooldown is used in StepScalingPolicyConfiguration.

Cooldown
The amount of time, in seconds, to wait for a previous scaling activity to take effect. If not specified, the default value is 300. For more information, see Cooldown period in the Application Auto Scaling User Guide.

ScaleInCooldown and ScaleOutCooldown, by contrast, belong to TargetTrackingScalingPolicyConfiguration, which the synth output in this case does not set, so the default values apply.

ScaleInCooldown
The amount of time, in seconds, after a scale-in activity completes before another scale-in activity can start. For more information and for default values, see Define cooldown periods in the Application Auto Scaling User Guide.

ScaleOutCooldown
The amount of time, in seconds, to wait for a previous scale-out activity to take effect. For more information and for default values, see Define cooldown periods in the Application Auto Scaling User Guide.

  ServiceQueueProcessingFargateServiceTaskCountTargetCpuScalingC17EA933:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: dummystackServiceQueueProcessingFargateServiceTaskCountTargetCpuScaling19D8592A
      PolicyType: TargetTrackingScaling
      ScalingTargetId:
        Ref: ServiceQueueProcessingFargateServiceTaskCountTarget2625B8CE
      TargetTrackingScalingPolicyConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ECSServiceAverageCPUUtilization
        TargetValue: 50

My thoughts:

  1. Those 3 cooldown values are not equivalent.
  2. Applying the cooldown value to all three props could be a breaking change.
  3. Exposing ScaleInCooldown and ScaleOutCooldown as a feature might be an option, but until then I am afraid we need to work around it by using escape hatches to override the values.
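
The escape-hatch workaround could look roughly like the sketch below: walk the construct tree under the service, find the target-tracking scaling policy, and override the two cooldown properties on the underlying CloudFormation resource. To stay runnable without aws-cdk-lib, it uses small structural interfaces (`ConstructLike`, `ScalingPolicyLike`) that are assumptions modelled on `node.findAll()`, `cfnResourceType`, `policyType` and `addPropertyOverride()`. The helper name and the fake objects at the end are hypothetical.

```typescript
// Minimal structural types so the sketch runs without aws-cdk-lib installed.
interface ConstructLike {
  node: { findAll(): ConstructLike[] };
}

interface ScalingPolicyLike extends ConstructLike {
  cfnResourceType: string;
  policyType: string;
  addPropertyOverride(path: string, value: unknown): void;
}

function isScalingPolicy(c: ConstructLike): c is ScalingPolicyLike {
  return (c as Partial<ScalingPolicyLike>).cfnResourceType ===
    'AWS::ApplicationAutoScaling::ScalingPolicy';
}

// Override cooldowns on target-tracking policies only; step scaling is untouched.
function overrideTargetTrackingCooldowns(
  scope: ConstructLike,
  scaleInSeconds: number,
  scaleOutSeconds: number,
): number {
  let patched = 0;
  for (const child of scope.node.findAll()) {
    if (isScalingPolicy(child) && child.policyType === 'TargetTrackingScaling') {
      child.addPropertyOverride(
        'TargetTrackingScalingPolicyConfiguration.ScaleInCooldown', scaleInSeconds);
      child.addPropertyOverride(
        'TargetTrackingScalingPolicyConfiguration.ScaleOutCooldown', scaleOutSeconds);
      patched++;
    }
  }
  return patched;
}

// Stand-in objects for a quick check; a real app would pass the service construct.
function fakePolicy(policyType: string, log: string[]): ScalingPolicyLike {
  return {
    cfnResourceType: 'AWS::ApplicationAutoScaling::ScalingPolicy',
    policyType,
    node: { findAll: () => [] },
    addPropertyOverride: (path, value) => {
      log.push(`${policyType}:${path}=${String(value)}`);
    },
  };
}

const overrideLog: string[] = [];
const fakeService: ConstructLike = {
  node: {
    findAll: () => [
      fakePolicy('StepScaling', overrideLog),
      fakePolicy('TargetTrackingScaling', overrideLog),
    ],
  },
};

const patchedCount = overrideTargetTrackingCooldowns(fakeService, 120, 120);
console.log(patchedCount, overrideLog);
```

In a CDK app the call would be something like `overrideTargetTrackingCooldowns(service, 300, 60)` after constructing the QueueProcessingFargateService. Whether the real constructs satisfy these structural types should be checked against the installed aws-cdk-lib version, and the resulting template verified with cdk synth.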

@ashishdhingra ashishdhingra added effort/medium Medium work item – several days of effort and removed needs-reproduction This issue needs reproduction. labels Aug 26, 2024
@ashishdhingra ashishdhingra removed their assignment Aug 26, 2024
@disophisis
Author

Thank you for supplying that code, Pahud. To your point, I think it would be preferable to expose ScaleInCooldown and ScaleOutCooldown in order to provide the most predictable experience.
