Develop to master - clean up OpsWorks replacement #546

ThrawnCA · 2024-06-13T04:45:29Z

Add dynamic scaling policy to autoscaling groups
Remove all OpsWorks components
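Commit 14a9477 adds a dynamic scaling policy that targets 50% CPU utilisation. In CloudFormation this kind of policy is usually an `AWS::AutoScaling::ScalingPolicy` with `PolicyType: TargetTrackingScaling`. The sketch below expresses an equivalent target-tracking configuration through the AWS CLI so the moving parts are visible. It is an illustration, not the template used in this PR. `build_cpu_target_config`, `attach_cpu_policy`, the policy name `cpu-target-50` and the group name in the usage comment are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating a target-tracking policy on average CPU.
# The PR may define its policy in CloudFormation instead; this is only a sketch.

# Print the target-tracking configuration JSON; the target defaults to 50%.
build_cpu_target_config() {
  local target="${1:-50.0}"
  printf '{"PredefinedMetricSpecification":{"PredefinedMetricType":"ASGAverageCPUUtilization"},"TargetValue":%s}' "$target"
}

# Attach the policy to an Auto Scaling group (needs AWS credentials to run).
attach_cpu_policy() {
  aws autoscaling put-scaling-policy \
    --auto-scaling-group-name "$1" \
    --policy-name cpu-target-50 \
    --policy-type TargetTrackingScaling \
    --target-tracking-configuration "$(build_cpu_target_config 50.0)"
}

# Usage (placeholder group name):
# attach_cpu_policy my-ckan-web-asg
```

With target tracking, Auto Scaling adds or removes instances to keep the group's average CPU near the target, so no separate scale-out and scale-in alarms need to be maintained.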


duttonw · 2024-07-01T05:36:54Z

templates/Datashades-OpsWorks-CKAN-Instances.cfn.yml.j2

```diff
@@ -166,24 +166,28 @@ Resources:
                 echo '/dev/sdi /mnt/local_data xfs defaults,nofail 0 2' >> /etc/fstab
                 mount -a
               fi
+              if ! (yum install chef); then
+                for i in `seq 1 5`; do
+                  yum install -y libxcrypt-compat "https://packages.chef.io/files/stable/chef/14.15.6/el/7/chef-14.15.6-1.el7.x86_64.rpm" && break
```


Can you add a link to the AWS docs (or similar) showing which version we need to lock to?

I actually set v14 before discovering that we needed libxcrypt-compat. It appears that we can use the latest v18 instead.
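The hunk above retries the Chef RPM install in a `seq 1 5` loop, and commit 4f25afb retries the installation when the RPM database is busy. A reusable version of that pattern, with a pause between attempts and a non-zero exit status once every attempt has failed, might look like the sketch below. `retry_cmd` and `CHEF_RPM_URL` are hypothetical names, and the 10-second delay in the usage comment is an assumption, not a value from the PR.

```bash
#!/usr/bin/env bash
# Hypothetical bounded retry helper: retry_cmd ATTEMPTS DELAY_SECONDS COMMAND...
# Returns 0 as soon as COMMAND succeeds, or 1 after ATTEMPTS failures.
retry_cmd() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "Attempt $i/$attempts failed: $*" >&2
    # Only sleep if another attempt is still to come.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage (assumed): retry up to 5 times, 10 seconds apart, e.g. while another
# process holds the RPM database lock. CHEF_RPM_URL is a placeholder.
# retry_cmd 5 10 yum install -y libxcrypt-compat "$CHEF_RPM_URL"
```

Returning a failure status after the last attempt lets the caller decide whether to abort the bootstrap, rather than carrying on as if Chef had been installed.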

vars/instances-CKANTest.var.yml

duttonw

Overall, I'd like some more comments, but it should be doable.

As for DECREMENT_BEHAVIOUR, if it's set to increment, it should add an instance and wait for it to become stable before continuing the deployment. (Yes, it will slow the deployment by X minutes, but it also means zero downtime.)


ThrawnCA · 2024-07-01T06:30:38Z

> As for DECREMENT_BEHAVIOUR, if it's set to increment, it should add an instance and wait for it to become stable before continuing the deployment. (Yes, it will slow the deployment by X minutes, but it also means zero downtime.)

Ah, that's actually taken care of already. We now set the minimum instance count equal to the desired count, instead of 1 less. So, in any environment that requests at least 2 instances - which is the default and is applicable to all production environments - there will still be at least one fully operational instance at all times.

Previously, production might have a desired count of 2, minimum of 1, and deployments would put one into Standby while deploying to it, leaving the other to carry the load.

Now, production will have both desired and minimum counts of 2, and deployments will put one into standby while launching a new one. The remaining instance will still carry the load in the meantime.
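The behaviour described above (together with commits ee47826 and 0a38eb6) comes down to a three-way decision per instance: skip it if it is not in service, take it out with a capacity decrement if the group is above its minimum, or keep desired capacity unchanged so Auto Scaling launches a replacement. The sketch below assumes the deployment queries the group through the AWS CLI. `choose_standby_action` and `enter_standby` are hypothetical names, not the PR's actual functions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the standby decision; the PR's script may differ.

# Print "skip", "decrement" or "replace" for an instance, given its lifecycle
# state and the group's current desired and minimum capacities.
choose_standby_action() {
  local lifecycle_state="$1" desired="$2" minimum="$3"
  if [ "$lifecycle_state" != "InService" ]; then
    echo skip        # e.g. already in Standby: nothing to do
  elif [ "$desired" -gt "$minimum" ]; then
    echo decrement   # room to shrink: no replacement needed
  else
    echo replace     # at minimum: a decrement would fail, so launch a replacement
  fi
}

# Put one instance into Standby using the decision above (needs AWS credentials).
enter_standby() {
  local asg_name="$1" instance_id="$2" state desired minimum flag
  state=$(aws autoscaling describe-auto-scaling-instances \
    --instance-ids "$instance_id" \
    --query 'AutoScalingInstances[0].LifecycleState' --output text)
  desired=$(aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$asg_name" \
    --query 'AutoScalingGroups[0].DesiredCapacity' --output text)
  minimum=$(aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$asg_name" \
    --query 'AutoScalingGroups[0].MinSize' --output text)
  case "$(choose_standby_action "$state" "$desired" "$minimum")" in
    skip) return 0 ;;
    decrement) flag=--should-decrement-desired-capacity ;;
    replace) flag=--no-should-decrement-desired-capacity ;;
  esac
  aws autoscaling enter-standby --auto-scaling-group-name "$asg_name" \
    --instance-ids "$instance_id" "$flag"
}
```

With minimum and desired capacities now equal by default, the `replace` branch is the normal path; `decrement` only applies when the scaling policy has grown the group above its minimum.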


ThrawnCA and others added 30 commits May 8, 2024 15:53

[QOLDEV-833] use bare-bones custom AMI

399ce03

- Shared yum packages are preinstalled, 4GB swapfile has been created, Supervisord service enabled. No environment-specific changes have been made.

Merge pull request #530 from qld-gov-au/QOLDEV-833-autoscaling-ami

12c8d68

[QOLDEV-833] use bare-bones custom AMI

[QOLDEV-819] exclude Solr instances from power management

fb5c546

- Recreating Solr instances doesn't preserve the index. TODO Fix index sync so new instances can pick up the latest index.

Merge pull request #541 from qld-gov-au/QOLDEV-819-autoscaling-instances

51e9049

[QOLDEV-819] exclude Solr instances from power management

[QOLDEV-839] drop OpsWorks resources (stack and layers) and documentation references

1b3ef2c

- Can't rename CloudFormation stacks at this point, or else they will be deleted and recreated, which would be unnecessarily disruptive

Merge pull request #542 from qld-gov-au/QOLDEV-839-remove-opsworks

653299f

[QOLDEV-839] drop OpsWorks resources (stack and layers) and documentation references

[QOLDEV-867] add autoscaling policy to target 50% CPU utilisation

14a9477

Merge pull request #545 from qld-gov-au/QOLDEV-867-autoscaling-policy

3c5e08d

[QOLDEV-867] add autoscaling policy to target 50% CPU utilisation

[QOLDEV-892] handle AWS CLI v2 syntax if present

5ad191b

[QOLDEV-892] update sandbox to Amazon Linux 2023

731b0de

[QOLDEV-892] install Chef client manually

7a983cd

- Automatic install during SSM Run Command sometimes fails without logging the reason

Merge pull request #549 from qld-gov-au/QOLDEV-892-amazon-linux-2023

866fa0d

QOLDEV-892 update sandbox to Amazon Linux 2023

[QOLDEV-892] update all DEV environments to Amazon Linux 2023

74fb300

[QOLDEV-892] retry Chef client installation if RPM database is busy

4f25afb

Merge pull request #550 from qld-gov-au/QOLDEV-892-amazon-linux-2023

f62fa84

[QOLDEV-892] update all DEV environments to Amazon Linux 2023

[QOLDEV-892] update cookbook to support Amazon Linux 2023

03881ce

Merge pull request #551 from qld-gov-au/QOLDEV-892-amazon-linux-2023

0a3f28e

[QOLDEV-892] update cookbook to support Amazon Linux 2023

[QOLDEV-867] don't allow deployments to push an ASG below its minimum

ee47826

- If desired capacity is already at minimum, attempting to put a server in standby without replacement will fail. Detect this condition and spawn a replacement.

[QOLDEV-867] fail deployment fast if initial deployment fails

ff0bac7

[QOLDEV-867] simplify instance count configuration

5541e4d

- Set minimum and desired capacities to the same value, now that our deployments can handle that
- Drop redundant config that is equivalent to the defaults

[QOLDEV-867] capture failed deployments and clean up before exiting

967bef5

[QOLDEV-867] set default deployment status to success

d762ba5

[QOLDEV-867] add debug message when not decrementing capacity

f4e56d5

[QOLDEV-867] fix ASG name for capacity check

f4cbd23

Merge pull request #552 from qld-gov-au/QOLDEV-867-autoscaling-policy

ad7b089

QOLDEV-867 make deployments handle autoscaling more robustly

[QOLDEV-902] update Archiver to ensure QA runs on uploaded files

c508e38

Merge pull request #553 from qld-gov-au/QOLDEV-902-uploaded-files-qa

e9398e7

[QOLDEV-902] update Archiver to ensure QA runs on uploaded files

[QOLDEV-892] update cookbook to retain support for Amazon Linux 2

8f129c3

Merge pull request #554 from qld-gov-au/QOLDEV-892-amazon-linux-2023

fef3398

[QOLDEV-892] update cookbook to retain support for Amazon Linux 2

[QOLDEV-892] skip putting instances into standby if they're not in service

0a38eb6
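Commits ff0bac7, 967bef5 and d762ba5 describe failing fast when the initial deployment fails, capturing failed deployments and cleaning up before exiting, with the status defaulting to success. One plausible shape for that is a subshell with an `EXIT` trap, so cleanup runs however the loop ends. This is a sketch under assumptions: `run_deployment`, `cleanup` and `deploy_to_instance` are hypothetical names, and treating only the first instance as the fail-fast gate is one reading of "initial deployment", not confirmed by the PR.

```bash
#!/usr/bin/env bash
# Hypothetical per-instance deploy step; replace with the real command.
deploy_to_instance() {
  echo "deploy $1"
}

# Runs on exit from run_deployment's subshell. A real script might move
# instances back out of Standby here (e.g. `aws autoscaling exit-standby`).
cleanup() {
  echo "status: $deployment_status"
}

# The function body is a subshell, so the EXIT trap fires when the deployment
# finishes (or aborts) without affecting the caller's shell.
run_deployment() (
  deployment_status=success   # default; flipped only on failure
  trap cleanup EXIT
  first=1
  for instance in "$@"; do
    if ! deploy_to_instance "$instance"; then
      deployment_status=failed
      if [ "$first" -eq 1 ]; then
        exit 1   # fail fast: the initial deployment failed, leave the rest alone
      fi
    fi
    first=0
  done
  [ "$deployment_status" = success ]
)

# Usage (placeholder instance IDs):
# run_deployment i-0abc i-0def || exit 1
```

Because the exit status of the subshell is preserved across the trap, callers still see a non-zero status on failure after the cleanup has run.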

duttonw reviewed Jul 1, 2024

templates/Datashades-OpsWorks-CKAN-Instances.cfn.yml.j2 (resolved)

duttonw reviewed Jul 1, 2024

vars/instances-CKANTest.var.yml (outdated, resolved)

duttonw approved these changes Jul 1, 2024

ThrawnCA added 2 commits July 1, 2024 16:13

[QOLDEV-892] update Chef client to version 18.x

dcf8b7a

- This appears to work now that we have the XCrypt compatibility library

[QOLDEV-892] extract AMI ID to a documented variable

f70e685

Merge pull request #555 from qld-gov-au/QOLDEV-892-amazon-linux-2023

3600a7a

QOLDEV-892 update Chef client and improve handling of instances already in Standby

ThrawnCA merged commit 6d21ed3 into master Jul 2, 2024
17 checks passed
