Skip to content

How to Evaluate "EC2 Startup Failure" Alerts

Tim Ellis edited this page Sep 4, 2024 · 4 revisions

Pre-Requisites

  • Access to CMS VPN
  • Access to BFD/CMS AWS account(s)
  • Installation of AWS CLI, properly configured for access to BFD/CMS AWS account
  • Installation of jq, sed, awk

Instructions

Determine which instances have failed and whether still available within AWS EC2 environment:

  • Run the following command sequence in bash or zsh after connecting to CMS VPN

aws logs filter-log-events --log-group-name /aws/ec2/var/log/cloud-init-output.log --filter-pattern "%failed=[1-9]%" --start-time $(( $(( $(/bin/date +%s) - 3600 )) * 1000 )) | jq '[ .events[] | { logStreamName, message }]' | jq '.[].logStreamName' | /usr/bin/sed 's/"//g' | /usr/bin/awk -F'.' '{ print $1; }' | /usr/bin/sed 's/-/./g' | /usr/bin/awk -F'.' '{printf "%d.%d.%d.%d\n",$2,$3,$4,$5;}' | while read ipaddr; do echo "$ipaddr"; aws ec2 describe-instances --filters "Name=network-interface.addresses.private-ip-address,Values=$ipaddr"; done;

Note: The above command queries log group /aws/ec2/var/log/cloud-init-output.log for the past hour ((/bin/date +%s) - 3600) for specified filter pattern that matches the associated Alert pattern. That result set interpolates from the log stream name to the private IPv4 address of the EC2 instance and issues a corresponding describe-instances query. If the JSON response is nothing more than "Reservations[]" then the instance was automatically terminated.

Investigation

  1. Review the Cloudwatch log stream associated with each specified Private IPv4 address in order to determine which AMI profile was leveraged for deployment.
  2. Evaluate the server profile startup to determine if the failure is general or environment-specific.
  3. Proceed with further detailed investigation as warranted.
Clone this wiki locally