Develop to master - clean up OpsWorks replacement #546

Merged: 33 commits, Jul 2, 2024
399ce03
[QOLDEV-833] use bare-bones custom AMI
ThrawnCA May 8, 2024
12c8d68
Merge pull request #530 from qld-gov-au/QOLDEV-833-autoscaling-ami
ThrawnCA May 15, 2024
fb5c546
[QOLDEV-819] exclude Solr instances from power management
ThrawnCA May 17, 2024
51e9049
Merge pull request #541 from qld-gov-au/QOLDEV-819-autoscaling-instances
ThrawnCA May 20, 2024
1b3ef2c
[QOLDEV-839] drop OpsWorks resources (stack and layers) and documenta…
ThrawnCA May 21, 2024
653299f
Merge pull request #542 from qld-gov-au/QOLDEV-839-remove-opsworks
ThrawnCA May 24, 2024
14a9477
[QOLDEV-867] add autoscaling policy to target 50% CPU utilisation
ThrawnCA May 27, 2024
3c5e08d
Merge pull request #545 from qld-gov-au/QOLDEV-867-autoscaling-policy
ThrawnCA May 27, 2024
5ad191b
[QOLDEV-892] handle AWS CLI v2 syntax if present
ThrawnCA Jun 19, 2024
731b0de
[QOLDEV-892] update sandbox to Amazon Linux 2023
ThrawnCA Jun 19, 2024
7a983cd
[QOLDEV-892] install Chef client manually
ThrawnCA Jun 20, 2024
866fa0d
Merge pull request #549 from qld-gov-au/QOLDEV-892-amazon-linux-2023
ThrawnCA Jun 21, 2024
74fb300
[QOLDEV-892] update all DEV environments to Amazon Linux 2023
ThrawnCA Jun 21, 2024
4f25afb
[QOLDEV-892] retry Chef client installation if RPM database is busy
ThrawnCA Jun 24, 2024
f62fa84
Merge pull request #550 from qld-gov-au/QOLDEV-892-amazon-linux-2023
ThrawnCA Jun 24, 2024
03881ce
[QOLDEV-892] update cookbook to support Amazon Linux 2023
ThrawnCA Jun 24, 2024
0a3f28e
Merge pull request #551 from qld-gov-au/QOLDEV-892-amazon-linux-2023
ThrawnCA Jun 25, 2024
ee47826
[QOLDEV-867] don't allow deployments to push an ASG below its minimum
ThrawnCA Jun 25, 2024
ff0bac7
[QOLDEV-867] fail deployment fast if initial deployment fails
ThrawnCA Jun 25, 2024
5541e4d
[QOLDEV-867] simplify instance count configuration
ThrawnCA Jun 26, 2024
967bef5
[QOLDEV-867] capture failed deployments and clean up before exiting
ThrawnCA Jun 26, 2024
d762ba5
[QOLDEV-867] set default deployment status to success
ThrawnCA Jun 26, 2024
f4e56d5
[QOLDEV-867] add debug message when not decrementing capacity
ThrawnCA Jun 26, 2024
f4cbd23
[QOLDEV-867] fix ASG name for capacity check
ThrawnCA Jun 26, 2024
ad7b089
Merge pull request #552 from qld-gov-au/QOLDEV-867-autoscaling-policy
ThrawnCA Jun 27, 2024
c508e38
[QOLDEV-902] update Archiver to ensure QA runs on uploaded files
ThrawnCA Jun 27, 2024
e9398e7
Merge pull request #553 from qld-gov-au/QOLDEV-902-uploaded-files-qa
ThrawnCA Jun 27, 2024
8f129c3
[QOLDEV-892] update cookbook to retain support Amazon Linux 2
ThrawnCA Jun 27, 2024
fef3398
Merge pull request #554 from qld-gov-au/QOLDEV-892-amazon-linux-2023
ThrawnCA Jun 28, 2024
0a38eb6
[QOLDEV-892] skip putting instances into standby if they're not in se…
ThrawnCA Jun 28, 2024
dcf8b7a
[QOLDEV-892] update Chef client to version 18.x
ThrawnCA Jul 1, 2024
f70e685
[QOLDEV-892] extract AMI ID to a documented variable
ThrawnCA Jul 1, 2024
3600a7a
Merge pull request #555 from qld-gov-au/QOLDEV-892-amazon-linux-2023
ThrawnCA Jul 1, 2024
10 changes: 5 additions & 5 deletions README.md
@@ -2,13 +2,13 @@

Author: qol.development@smartservice.qld.gov.au

**Full one click deployment of Datashared AWS OpsWorks Stack via Ansible**
**Full one click deployment of CKAN AWS Stack via Ansible**

This stands up the www.data.qld.gov.au and www.publications.qld.gov.au aws stacks using:
* SSM
* RDS
* Redis Cluster
* OpsWorks
* EC2 Autoscaling
* Cloudfront with Lambda@Edge
* and may more features

@@ -25,7 +25,7 @@ This stands up the www.data.qld.gov.au and www.publications.qld.gov.au aws stack
For the system to work with updates to the lambda function, you must;
* first add a new version to cloudfront-lambdaAtEdge.cfn.yml which references the changed lambda function (there is no need for a new lambda function)
* export the new version from said cloudformation template
* update the cloudfront.yml ansible script to load the new version property name.
* update the cloudfront.yml ansible script to load the new version property name.
* you can delete previous versions after a successful real, do note that cloudfront will hold onto a lambda function and versions until its 'replication' finishes.

**QOL 2019 update**
@@ -60,7 +60,7 @@ Common issues during set up are as follows:

It's assumed that you:

* have a pretty good working knowledge of AWS, CloudFormation, OpsWorks, and CKAN and its requirements such as Solr and Postgres. Those will be necessary to troubleshoot builds when you haven't provided the correct parameters or some other obstacle gets in your way.
* have a pretty good working knowledge of AWS, CloudFormation, EC2 Autoscaling, and CKAN and its requirements such as Solr and Postgres. Those will be necessary to troubleshoot builds when you haven't provided the correct parameters or some other obstacle gets in your way.
* have built, installed and successfully run CKAN manually on some kind of single node configuration. If not, this stack isn't designed to be something to cut your teeth on. It's been designed to be relatively foolproof, but not completely so.
* know your way around the Linux command line reasonably well and know how to deal with error logs, dependency conflicts etc.

@@ -118,7 +118,7 @@ and automated system maintenance.

Our hope and expectation is that it benefits the wider Public Data community and progresses the Open Data ideal.

Current AWS costs for 2 CKAN applications by 4 envirionments is just shy of 3k USD a month.
Current AWS costs for 2 CKAN applications by 4 environments is just shy of 3k USD a month.

## TODO ##
Make requirements-dev look up vars/shared-${app}.var.yml and test all environment plugins
2 changes: 2 additions & 0 deletions build-CKAN.sh
@@ -38,6 +38,8 @@ run-shared-resource-playbooks () {
run-deployment () {
run-playbook "chef-json"
./chef-deploy.sh datashades::ckanweb-setup,datashades::ckanweb-deploy,datashades::ckanweb-configure $INSTANCE_NAME $ENVIRONMENT web & WEB_PID=$!
# Check if the web deployment immediately failed
kill -0 $WEB_PID
PARALLEL=1 ./chef-deploy.sh datashades::ckanbatch-setup,datashades::ckanbatch-deploy,datashades::ckanbatch-configure $INSTANCE_NAME $ENVIRONMENT batch & BATCH_PID=$!
wait $WEB_PID
wait $BATCH_PID
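The `kill -0` line added above relies on a standard shell idiom: signal 0 delivers nothing and only reports whether the process can still be signalled. A minimal sketch of that idiom follows, assuming stand-in background jobs (`sleep 30` and `false`) in place of the real `chef-deploy.sh` runs; the helper name `is_running` is hypothetical and does not appear in the repository.

```shell
# Hypothetical helper: succeed if the given PID still exists.
# `kill -0 PID` sends no signal; it only checks that the process
# exists and that we are allowed to signal it.
is_running () {
  kill -0 "$1" 2>/dev/null
}

# Stand-in for a long-running deployment (assumption: the real job is chef-deploy.sh).
sleep 30 & SLOW_PID=$!

# Stand-in for a deployment that fails straight away.
false & FAST_PID=$!
FAST_STATUS=0
wait "$FAST_PID" || FAST_STATUS=$?   # reap the job and record its exit status

if is_running "$SLOW_PID"; then SLOW_STATE="running"; else SLOW_STATE="exited"; fi
if is_running "$FAST_PID"; then FAST_STATE="running"; else FAST_STATE="exited"; fi
echo "slow job: $SLOW_STATE, fast job: $FAST_STATE (status $FAST_STATUS)"

# Clean up the stand-in job.
kill "$SLOW_PID"
wait "$SLOW_PID" 2>/dev/null || true
```

Because the check in `run-deployment` runs right after the job is started, it can only catch failures that happen almost instantly; later failures are still picked up by `wait $WEB_PID`. Whether a failed `kill -0` actually stops the script depends on shell options such as `set -e`, which are not visible in this hunk.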
24 changes: 19 additions & 5 deletions chef-deploy.sh
@@ -146,18 +146,32 @@ deploy () {
# double-check that instance is still running
INSTANCE_STATE=$(aws ec2 describe-instances --filters Name=instance-id,Values=$instance --query "Reservations[].Instances[0].State.Name" --output text)
if [ "$INSTANCE_STATE" != "running" ]; then continue; fi
if [ "$ASG_NAME" != "" ] && (aws autoscaling describe-auto-scaling-groups --auto-scaling-group-name $ASG_NAME --query "AutoScalingGroups[0].Instances[?InstanceId=='$instance'].InstanceId" --output text |grep "$instance" >/dev/null); then
if [ "$ASG_NAME" != "" ] && (aws autoscaling describe-auto-scaling-groups --auto-scaling-group-name $ASG_NAME --query "AutoScalingGroups[0].Instances[?InstanceId=='$instance' && LifecycleState=='InService'].InstanceId" --output text |grep "$instance" >/dev/null); then
IN_ASG="true"
# Check if the group is already at minimum capacity
CAPACITIES=$(aws autoscaling describe-auto-scaling-groups --auto-scaling-group-name $ASG_NAME --query "AutoScalingGroups[0].{min: MinSize, desired: DesiredCapacity}" --output text)
CAPACITY_1=`echo $CAPACITIES | awk '{print $1}'`
CAPACITY_2=`echo $CAPACITIES | awk '{print $2}'`
if [ "$CAPACITY_1" = "$CAPACITY_2" ]; then
debug "Capacity is at minimum ($CAPACITY_1 = $CAPACITY_2), new instance will be started"
DECREMENT_BEHAVIOUR="--no-should-decrement-desired-capacity"
else
DECREMENT_BEHAVIOUR="--should-decrement-desired-capacity"
fi
# Instances in standby will not get traffic nor health checks, allowing us to update them without interruption
OUTPUT=$(aws autoscaling enter-standby --auto-scaling-group-name "$ASG_NAME" --should-decrement-desired-capacity --instance-ids $instance --query "Activities[].Description" --output text)
OUTPUT=$(aws autoscaling enter-standby --auto-scaling-group-name "$ASG_NAME" $DECREMENT_BEHAVIOUR --instance-ids $instance --query "Activities[].Description" --output text)
debug "$OUTPUT"
elif [ "$ELB_NAME" != "" ]; then
OUTPUT=$(aws elb deregister-instances-from-load-balancer --load-balancer-name "$ELB_NAME" --instances "$instance" --query "Instances[].InstanceId" --output text)
debug "Deregistered instance $instance from load balancer $ELB_NAME, resulting registered instances: $OUTPUT"
fi
DEPLOYMENT_ID=$(aws ssm send-command --document-name "AWS-ApplyChefRecipes" --document-version "\$DEFAULT" --instance-ids $instance --parameters '{'"$CHEF_SOURCE"',"RunList":["'"$RUN_LIST"'"],"JsonAttributesSources":[""],"JsonAttributesContent":[""],"ChefClientVersion":["14"],"ChefClientArguments":[""],"WhyRun":["False"],"ComplianceSeverity":["None"],"ComplianceType":["Custom:Chef"],"ComplianceReportBucket":[""]}' --timeout-seconds 3600 --max-concurrency "50" --max-errors "0" --output-s3-bucket-name "osssio-ckan-web-logs" --output-s3-key-prefix "run_command" --region ap-southeast-2 --query "Command.CommandId" --output text)
wait_for_deployment $DEPLOYMENT_ID
DEPLOYMENT_SUCCESS=$?
if [ "$ASG_NAME" != "" ]; then
DEPLOYMENT_SUCCESS=0
wait_for_deployment $DEPLOYMENT_ID || DEPLOYMENT_SUCCESS=$?
if [ "$IN_ASG" = "true" ]; then
# reactivate the instance if we put it into standby
# NB If it was in standby before we started, then we will deploy to it
# but leave it in standby.
OUTPUT=$(aws autoscaling exit-standby --auto-scaling-group-name "$ASG_NAME" --instance-ids $instance --query "Activities[].Description" --output text)
debug "$OUTPUT"
elif [ "$ELB_NAME" != "" ]; then
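Two pieces of logic in this hunk can be read in isolation: choosing the `enter-standby` decrement flag from the group's minimum and desired sizes, and recording a command's exit status without letting the failure abort the caller. The sketch below restates both as small functions, assuming plain integer arguments instead of live `aws autoscaling` output; the function names `decrement_flag` and `run_and_capture` are hypothetical.

```shell
# Hypothetical helper mirroring the capacity check above.
# Arguments: two capacity values (minimum size and desired capacity).
# The comparison is symmetric, so the order of the two values does not matter.
decrement_flag () {
  if [ "$1" = "$2" ]; then
    # Already at minimum: keep desired capacity so a replacement instance starts.
    echo "--no-should-decrement-desired-capacity"
  else
    echo "--should-decrement-desired-capacity"
  fi
}

# Hypothetical helper: run a command and store its exit status in STATUS.
# Defaulting to 0 and overwriting only on failure matches the
# `DEPLOYMENT_SUCCESS=0; wait_for_deployment ... || DEPLOYMENT_SUCCESS=$?` pattern.
run_and_capture () {
  STATUS=0
  "$@" || STATUS=$?
  return 0
}

decrement_flag 2 2
decrement_flag 1 2
run_and_capture sh -c 'exit 3'
echo "captured status: $STATUS"
```

Capturing the status this way lets the script continue to the clean-up step (taking the instance out of standby) even when the deployment fails, and report the failure afterwards.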
2 changes: 1 addition & 1 deletion files/instanceSetupLambda.js
@@ -113,7 +113,7 @@ exports.handler = async (event) => {
SourceType: [cookbookType],
SourceInfo: [sourceInfo],
RunList: [runList],
ChefClientVersion: ["14"],
ChefClientVersion: ["None"],
WhyRun: ["False"],
ComplianceSeverity: ["None"],
ComplianceType: ["Custom:Chef"]
7 changes: 3 additions & 4 deletions templates/3_tier_vpc.yml
@@ -1168,12 +1168,12 @@ Resources:
PrivateRouteTable:
Properties:
Tags:
- Key: Name
Value: !Sub "${VPCNamePrefix}Vpc-${Environment}-PrivateRoutes"
- Key: Name
Value: !Sub "${VPCNamePrefix}Vpc-${Environment}-PrivateRoutes"
VpcId:
Ref: VPC
Type: AWS::EC2::RouteTable
## private RouteTableB to E
## private RouteTableB to E
PrivateNATGatewayRouteB:
Condition: 2PlusAZsNatGateways
DependsOn: NATGatewayB
@@ -1584,7 +1584,6 @@ Resources:
SubnetId:
Ref: WebSubnetE
Type: AWS::EC2::SubnetRouteTableAssociation
#Allow s3 root for listing as well as get set on folders
S3Endpoint:
Type: "AWS::EC2::VPCEndpoint"
Properties:
6 changes: 1 addition & 5 deletions templates/Datashades-OpsWorks-CKAN-Extensions.cfn.yml.j2
@@ -1,10 +1,6 @@
---
AWSTemplateFormatVersion: '2010-09-09'
Description: |-
Creates OpsWorks Applications for CKAN Stack extensions.
Current extension list:
Legacy theme
Queensland Government extension
Description: Creates metadata needed to deploy CKAN Stack extensions.

Parameters:
Environment:
47 changes: 31 additions & 16 deletions templates/Datashades-OpsWorks-CKAN-Instances.cfn.yml.j2
@@ -1,6 +1,6 @@
---
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Creates instances for OpsWorks CKAN NFS Stack.'
Description: 'Creates server instances for a CKAN Stack.'

Parameters:
ApplicationName:
@@ -113,15 +113,16 @@ Parameters:
BatchImageId:
Description: The Amazon Machine Image ID to use for launching batch instances. Defaults to Amazon Linux 2.
Type: String
Default: "ami-03b836d87d294e89e"
# Customised image based on Amazon Linux 2, preinstalling some basics
Default: "ami-0d71fe73adf7a9887"
WebImageId:
Description: The Amazon Machine Image ID to use for launching web instances. Defaults to Amazon Linux 2.
Type: String
Default: "ami-03b836d87d294e89e"
Default: "ami-0d71fe73adf7a9887"
SolrImageId:
Description: The Amazon Machine Image ID to use for launching Solr instances. Defaults to Amazon Linux 2.
Type: String
Default: "ami-03b836d87d294e89e"
Default: "ami-0d71fe73adf7a9887"
DefaultEC2Key:
Description: Select an existing SSH key
Type: AWS::EC2::KeyPair::KeyName
@@ -166,24 +167,28 @@ Resources:
echo '/dev/sdi /mnt/local_data xfs defaults,nofail 0 2' >> /etc/fstab
mount -a
fi
if ! (yum install chef); then
for i in `seq 1 5`; do
yum install -y libxcrypt-compat "https://packages.chef.io/files/stable/chef/18.4.12/el/7/chef-18.4.12-1.el7.x86_64.rpm" && break
sleep 5
done
fi
REGION="--region ${AWS::Region}"
metadata_token=`curl -X PUT -H "X-aws-ec2-metadata-token-ttl-seconds: 60" http://169.254.169.254/latest/api/token` && \
INSTANCE_ID=$(curl -H "X-aws-ec2-metadata-token: $metadata_token" http://169.254.169.254/latest/meta-data/instance-id) && \
aws ec2 create-tags --region "${AWS::Region}" --resources $INSTANCE_ID --tags "Key=Name,Value=${ApplicationName}_${Environment}-{{ layer }}-$INSTANCE_ID"
FUNCTION_NAME=$(aws ssm get-parameter --region "${AWS::Region}" --name "/config/CKAN/${Environment}/app/${ApplicationId}/cookbook/setup_function_name" --query "Parameter.Value" --output text)
aws lambda invoke --region "${AWS::Region}" --function-name "$FUNCTION_NAME" --payload '{"EC2InstanceId": "'$INSTANCE_ID'", "phase": "setup"}' /var/log/instance-setup.log.`date '+%s'`

{% if layer == 'Web' %}
{% set minInstanceCount = (item.template_parameters['WebEC2Count'] | default('2') | int) - 1 %}
{% else %}
{% set minInstanceCount = 1 %}
{% endif %}
aws ec2 create-tags $REGION --resources $INSTANCE_ID --tags "Key=Name,Value=${ApplicationName}_${Environment}-{{ layer }}-$INSTANCE_ID"
FUNCTION_NAME=$(aws ssm get-parameter $REGION --name "/config/CKAN/${Environment}/app/${ApplicationId}/cookbook/setup_function_name" --query "Parameter.Value" --output text)
if (aws --version |grep -o 'aws-cli/[2-9]'); then
PAYLOAD_FORMAT="--cli-binary-format raw-in-base64-out"
fi
aws lambda invoke $REGION --function-name "$FUNCTION_NAME" $PAYLOAD_FORMAT --payload '{"EC2InstanceId": "'$INSTANCE_ID'", "phase": "setup"}' /var/log/instance-setup.log.`date '+%s'`

{{ layer }}ScalingGroup:
Type: AWS::AutoScaling::AutoScalingGroup
Properties:
AutoScalingGroupName: !Sub "${Environment}-${ApplicationName}-{{ layer }}-ASG"
DesiredCapacity: !Ref {{ layer }}EC2Count
MinSize: {{ minInstanceCount }}
MinSize: !Ref {{ layer }}EC2Count
MaxSize: 6
LaunchTemplate:
LaunchTemplateId: !Ref {{ layer }}LaunchTemplate
@@ -209,7 +214,17 @@ Resources:
Value: {{ layer|lower }}
PropagateAtLaunch: true

{% if item.tags["PowerManaged"] == "Yes" %}
{{ layer }}DynamicScalingPolicy:
Type: AWS::AutoScaling::ScalingPolicy
Properties:
AutoScalingGroupName: !Ref {{ layer }}ScalingGroup
PolicyType: TargetTrackingScaling
TargetTrackingConfiguration:
PredefinedMetricSpecification:
PredefinedMetricType: ASGAverageCPUUtilization
TargetValue: 50

{% if item.tags["PowerManaged"] == "Yes" and layer != 'Solr' %}
{{ layer }}ScalingIn:
Type: AWS::AutoScaling::ScheduledAction
Properties:
@@ -224,7 +239,7 @@ Resources:
Properties:
AutoScalingGroupName: !Ref {{ layer }}ScalingGroup
DesiredCapacity: !Ref {{ layer }}EC2Count
MinSize: {{ minInstanceCount }}
MinSize: !Ref {{ layer }}EC2Count
MaxSize: 6
Recurrence: "0 20 * * *"
{% endif %}
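The instance user data in this template uses two shell idioms: retrying the Chef client installation a fixed number of times with a pause between attempts, and adding `--cli-binary-format raw-in-base64-out` only when the AWS CLI reports version 2 or later. The sketch below shows both as standalone functions. The names `retry` and `payload_format_for`, the `flaky` stand-in command, and the sample version strings are assumptions for illustration; the template itself inlines this logic.

```shell
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between failed attempts.
# Usage: retry ATTEMPTS DELAY command [args...]
retry () {
  attempts=$1; delay=$2; shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    "$@" && return 0
    if [ "$n" -lt "$attempts" ]; then sleep "$delay"; fi
    n=$((n + 1))
  done
  return 1
}

# Hypothetical helper: given the text printed by `aws --version`, print the
# extra option needed for a raw JSON --payload on CLI v2+, or nothing for v1.
# The pattern mirrors the template's `grep -o 'aws-cli/[2-9]'` check.
payload_format_for () {
  case "$1" in
    aws-cli/[2-9]*) echo "--cli-binary-format raw-in-base64-out" ;;
    *) ;;
  esac
}

# Stand-in for an install that fails while the RPM database is busy:
# it fails twice, then succeeds on the third call.
COUNT_FILE=$(mktemp)
echo 0 > "$COUNT_FILE"
flaky () {
  c=$(( $(cat "$COUNT_FILE") + 1 ))
  echo "$c" > "$COUNT_FILE"
  [ "$c" -ge 3 ]
}

retry 5 0 flaky && echo "succeeded after $(cat "$COUNT_FILE") attempts"
echo "v2 option: $(payload_format_for 'aws-cli/2.15.30 Python/3.11.8')"
echo "v1 option: $(payload_format_for 'aws-cli/1.18.147 Python/2.7.18')"
```

Unlike the loop in the template, this `retry` returns a non-zero status when every attempt fails, so a caller could choose to stop instance setup at that point instead of continuing without Chef.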