st2CICD Reducing costs #104

Closed
amanda11 opened this issue Jun 28, 2022 · 7 comments

@amanda11

Our current AWS CI/CD server costs approximately $122.40 plus tax per month in EC2 charges alone (other costs, such as EBS, are not included). It is a c5.xlarge instance running 24x7; the figure above assumes a 30-day month.

The actual CI/CD runs currently are:

  • unstable Tuesday 3pm UTC - for approx 40 mins
  • stable Wednesday 1pm UTC - for approx 40 mins
  • unstable Friday 3pm UTC - for approx 40 mins

Each day at 1am UTC it also runs a check to see whether any of the instances created to run the CI jobs need deleting.

There are several ways we could reduce this cost, some of which can be combined:

  • Change the date/time of the runs and keep the server down for part of the week, but up long enough to allow debugging
  • Change the date/time of the runs and copy the logs from CI runs to an S3 bucket, so the server can stay down for longer
  • Move to a cheaper instance type, e.g. c5a.xlarge
  • Rebuild the server with the latest ST2 and then downsize further

Once we lose our credits, and once tax is added, it would be good to have this cost reduced.

Interested in people's preferences and thoughts.

@amanda11
Author

amanda11 commented Jun 28, 2022

One possible reduction, which would bring the pre-tax EC2 cost of the CI/CD server down to about $81 or $74 a month, would be the following:

  • Switch off the CI/CD server from Friday 4pm UTC to Monday 4am UTC
  • Move the CI runs so that unstable runs on Monday and Thursday, and keep stable on Wednesday. This means we still have the 2am runs on the following days to delete servers, and servers are up for most of the week to enable troubleshooting.
  • This reduces the cost to $81.60 + tax for c5.xlarge, or $74 for c5a.xlarge

However, using the AWS Instance Scheduler to achieve that would probably cost us about $10 a month (https://docs.aws.amazon.com/solutions/latest/instance-scheduler-on-aws/cost.html).

For our use case, we should be able to achieve the same for less with plain Lambda functions, e.g. https://aws.amazon.com/premiumsupport/knowledge-center/start-stop-lambda-eventbridge/, since we don't really need Lambda functions running every 5 minutes every day, which is what the Instance Scheduler does.
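As a rough illustration of the Lambda approach, a handler along these lines could start or stop a single instance. This is a minimal sketch, not the function that was deployed: the `action` and `instance_id` event keys and the `INSTANCE_ID` environment variable are hypothetical names, and the optional `ec2` argument exists only so the logic can be exercised without AWS access.

```python
import os


def lambda_handler(event, context, ec2=None):
    """Start or stop one EC2 instance, depending on event["action"]."""
    # "action" and "instance_id" are hypothetical keys that each schedule
    # rule would pass as constant input; INSTANCE_ID is a hypothetical
    # environment variable used as a fallback.
    action = event["action"]
    instance_id = event.get("instance_id") or os.environ["INSTANCE_ID"]

    if ec2 is None:
        # Imported lazily so the function can be tested with a fake client.
        import boto3
        ec2 = boto3.client("ec2")

    if action == "start":
        ec2.start_instances(InstanceIds=[instance_id])
    elif action == "stop":
        # Stopping keeps the instance and its EBS volumes; it does not terminate.
        ec2.stop_instances(InstanceIds=[instance_id])
    else:
        raise ValueError(f"unknown action: {action!r}")

    return {"action": action, "instance_id": instance_id}
```

Two schedule rules, one sending `{"action": "start"}` and one sending `{"action": "stop"}`, would then be enough to drive it.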

This might be an easy win to reduce the costs quickly. We could then move on to a solution where we get the logs off the boxes, so we no longer need the CI servers for debugging.

Also, the ST2 workflow on the CI/CD server that deletes running EC2 instances older than 6 hours could perhaps be removed and replaced by a Lambda function that deletes any that are running at particular times of day. Then we wouldn't need to keep the CI server up for 6 hours after a run.
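The selection step of such a clean-up function might look like the sketch below. It assumes instance records shaped like the entries returned by boto3's `describe_instances` (`InstanceId`, `LaunchTime`, `State.Name`); how the CI-created instances are identified in the first place (for example by tag) is not covered here, and the 6-hour threshold is taken from the existing workflow described above.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=6)  # threshold used by the existing ST2 workflow


def instances_to_delete(instances, now, max_age=MAX_AGE):
    """Return IDs of running instances launched more than max_age before now."""
    return [
        inst["InstanceId"]
        for inst in instances
        if inst["State"]["Name"] == "running"
        and now - inst["LaunchTime"] > max_age
    ]


if __name__ == "__main__":
    now = datetime(2022, 6, 28, 12, 0, tzinfo=timezone.utc)
    sample = [
        {"InstanceId": "i-old", "State": {"Name": "running"},
         "LaunchTime": now - timedelta(hours=7)},
        {"InstanceId": "i-new", "State": {"Name": "running"},
         "LaunchTime": now - timedelta(hours=1)},
    ]
    print(instances_to_delete(sample, now))
```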

We could also tighten the schedule even more, e.g. by running both stable and unstable on the same day, once a week:

  • Tuesday 1pm UTC: run stable
  • Tuesday 3pm UTC: run unstable
  • Wednesday 2am: run the clean-up job

Then we could keep the servers up just from Tuesday 10am UTC to Thursday 10pm UTC. That would cost approximately $40 + tax a month for the c5a.xlarge instance and give us 54 hours after a failed run to debug. This could be an interim solution until we rebuild with a newer ST2 version and a smaller size, and get the logs off the box so we can investigate a failure without the EC2 instance being up.
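The hours in this proposal can be checked with a small helper. This is only an arithmetic sketch; it assumes the 54-hour debugging window is measured from Tuesday 4pm UTC, shortly after the 3pm unstable run (about 40 minutes) would finish.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def hours_between(start_day, start_hour, end_day, end_hour):
    """Hours from start to end within one week, wrapping past Sunday if needed."""
    start = DAYS.index(start_day) * 24 + start_hour
    end = DAYS.index(end_day) * 24 + end_hour
    return (end - start) % (7 * 24)


uptime_hours = hours_between("Tue", 10, "Thu", 22)  # server up per week
debug_hours = hours_between("Tue", 16, "Thu", 22)   # time left after the unstable run
print(uptime_hours, debug_hours)  # prints "60 54"
```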

@winem

winem commented Jun 28, 2022

A solution that is a bit less flexible compared to the AWS Instance Scheduler is to use Scheduled scaling for EC2 Auto Scaling: https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-scheduled-scaling.html

It allows you to configure any ASG capacity (min, max and desired) to be applied at any time with a cron-like syntax.

Let me just share a screenshot to show the capabilities:
[Screenshot from 2022-06-28 16-12-02]

So, when we say

The actual CI/CD runs currently are:

    unstable Tuesday 3pm UTC - for approx 40 mins
    stable Wednesday 1pm UTC - for approx 40 mins
    unstable Friday 3pm UTC - for approx 40 mins
    Each day at 1am UTC it also runs a check to see if any of the instances created to run the CI jobs need deleting.

We could have a config like

  • start every Tuesday 2:30 pm UTC
  • start every Wednesday 12:30 pm UTC
  • start every Friday 2:30 pm UTC

The shutdown would be scheduled for a later point in time, depending on whether we want to keep the instances alive for debugging or export the relevant logs, for example to S3.

The daily check ("Each day at 1am UTC it also runs a check to see if any of the instances created to run the CI jobs need deleting.") could be moved to a Lambda function triggered once a day, if it currently runs on an instance that is not otherwise needed.

Terraform documentation: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/autoscaling_schedule

An example to start an instance every Tuesday at 1 pm and shut it down at 3 pm again:

# recurrence uses cron syntax and is evaluated in UTC by default,
# so 1 pm is hour 13 and 3 pm is hour 15
resource "aws_autoscaling_schedule" "unstable-start" {
  scheduled_action_name  = "unstable_start"
  min_size               = 1
  max_size               = 1
  desired_capacity       = 1
  recurrence             = "0 13 * * Tue"
  autoscaling_group_name = ...
}

# here we just reduce the capacity to 0 to make sure that AWS shuts down all EC2 resources
resource "aws_autoscaling_schedule" "unstable-stop" {
  scheduled_action_name  = "unstable_stop"
  min_size               = 0
  max_size               = 0
  desired_capacity       = 0
  recurrence             = "0 15 * * Tue"
  autoscaling_group_name = ...
}

@amanda11
Author

@winem Doesn't the above terminate the instance rather than just shut it down? https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-scaling-groups.html - "and launches or terminates the instances as needed".
I think in our case we just want to start/stop it rather than create and terminate it.

@arm4b
Member

arm4b commented Jun 29, 2022

I think it's a great idea with the lambda function to start/stop the instance on a schedule 👍

arm4b changed the title from "CICD Reducing costs" to "st2CICD Reducing costs" on Jun 29, 2022

@amanda11
Author

The TSC meeting agreed to go ahead with start/stop via Lambda and to keep the CI server up for just a few days a week.

@amanda11
Author

Schedule of new builds (times based on US/Pacific):

Tues @ 18:00 Orphan run (cicd repo)
Wed @ 6:00 Stable run (cd repo - no change)
Wed @ 8:00 Unstable run (ci repo)
Wed @ 18:00 Orphan run (cicd repo)
Thur @ 8:00 Unstable run (ci repo)
Thur @ 18:00 Orphan run (cicd repo)

Scheduled AWS Lambda functions control the st2cicd EC2 instance with:

Start Tues @ 16:00
Stop Thur @ 20:00

This reduces uptime to 52 hrs a week instead of 168.
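To put a rough figure on the 52-hour week, the sketch below scales the original estimate. It assumes the c5.xlarge hourly rate implied by the first comment ($122.40 over a 30-day month of 24x7 running) and ignores tax, EBS and any Lambda charges.

```python
HOURS_PER_MONTH = 30 * 24       # 30-day month, as in the original estimate
FULL_TIME_MONTHLY_COST = 122.4  # USD, c5.xlarge running 24x7 (from the first comment)
HOURLY_RATE = FULL_TIME_MONTHLY_COST / HOURS_PER_MONTH  # roughly $0.17 per hour


def monthly_cost(hours_per_week, hourly_rate=HOURLY_RATE, days_per_month=30):
    """Estimated monthly EC2 cost for a given number of running hours per week."""
    weeks_per_month = days_per_month / 7
    return hours_per_week * weeks_per_month * hourly_rate


print(round(monthly_cost(168), 2))  # full-time baseline
print(round(monthly_cost(52), 2))   # Tues 16:00 to Thur 20:00 schedule
```

Under these assumptions the 52-hour schedule comes to a little under $38 a month, against $122.40 for running full time.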

@amanda11
Author

amanda11 commented Jul 21, 2022

Lambda functions and rules are set up. Monitoring changed to just Wednesdays. Awaiting the stop on Friday 4am UTC and the start at Tuesday midnight UTC.
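Because the schedule was given in US/Pacific while the times here are in UTC, the conversion is worth spelling out. The sketch below assumes a fixed UTC-8 offset, which is the offset under which Tues 16:00 and Thur 20:00 Pacific line up with the UTC times in this comment; during daylight saving time (UTC-7) the results would be one hour earlier. The `eventbridge_cron` helper is hypothetical and only formats a string in the six-field EventBridge `cron(...)` layout; the actual rule definitions are not shown in this thread.

```python
DAYS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]


def to_utc(day, hour, utc_offset_hours):
    """Convert a local weekday/hour to a UTC weekday/hour for a fixed offset."""
    total = (DAYS.index(day) * 24 + hour - utc_offset_hours) % (7 * 24)
    return DAYS[total // 24], total % 24


def eventbridge_cron(day, hour, minute=0):
    """Format a weekly schedule as cron(Minutes Hours Day-of-month Month Day-of-week Year)."""
    return f"cron({minute} {hour} ? * {day} *)"


start = to_utc("TUE", 16, -8)  # ("WED", 0): midnight at the end of Tuesday, UTC
stop = to_utc("THU", 20, -8)   # ("FRI", 4): Friday 4am UTC
print(eventbridge_cron(*start))  # cron(0 0 ? * WED *)
print(eventbridge_cron(*stop))   # cron(0 4 ? * FRI *)
```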
