Keeping track of your instances in AWS is quite a task. This tool (zumoco, for zulily monitoring collector), run routinely from AWS Lambda, will detect your instances, add CloudWatch alerting for each, and create CloudWatch dashboards based on your preferences.
There are a few things you need to do to set this up for yourself (see details below):
- Create the AWS Simple Notification Service (SNS) topics/subscriptions for notifying teams based on alert severity.
- Create the AWS S3 bucket to store the instance history, per service.
- Customize the JSON templates in the `monitordefs` directory to match the services you want to monitor.
- Package and deploy the Lambda function to AWS.
AWS uses SNS to handle notifications of AWS CloudWatch alerts. When creating a CloudWatch alert, zumoco uses the Amazon Resource Name (ARN) of a given SNS topic (connected to a notification endpoint). Various SNS topics and subscriptions can be created as follows:
- Create a PagerDuty integration for SNS following the process here: link (the ARN is available after Step 4 of the "AWS SNS Console" section).
- Create a Slack integration for SNS by creating another AWS Lambda function that pushes SNS events to the Slack chat server. The AWS Lambda template is described here: link.
- Create a HipChat integration by creating another AWS Lambda function, as provided here: link.
- Create an email integration using the AWS process here: link.

Note: the `tests/zumoco_test.py` test file requires an SNS topic to create alarms; replace the appropriate `AlarmDestinations` values with your SNS topic in order to run the tests.
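Because zumoco references notification endpoints by SNS topic ARN, a typo in an ARN only shows up when an alarm fails to notify anyone. The sketch below is a minimal illustration, not part of zumoco: it splits a topic ARN (format `arn:partition:sns:region:account-id:topic-name`) into its fields and checks a name-to-ARN mapping shaped like a hypothetical `AlarmDestinations` entry. The helper names and the mapping shape are assumptions, and the account ID in the example is a placeholder.

```python
from typing import Dict, NamedTuple


class SnsTopicArn(NamedTuple):
    """Fields of an SNS topic ARN: arn:partition:sns:region:account-id:topic-name."""
    partition: str
    region: str
    account_id: str
    topic_name: str


def parse_sns_topic_arn(arn: str) -> SnsTopicArn:
    """Split an SNS topic ARN into its fields, raising ValueError if it is malformed.

    Subscription ARNs carry an extra ':<id>' field, so they are rejected here.
    """
    parts = arn.split(":")
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sns":
        raise ValueError(f"not an SNS topic ARN: {arn!r}")
    _, partition, _, region, account_id, topic_name = parts
    if not (region and account_id.isdigit() and topic_name):
        raise ValueError(f"incomplete SNS topic ARN: {arn!r}")
    return SnsTopicArn(partition, region, account_id, topic_name)


def check_destinations(destinations: Dict[str, str]) -> Dict[str, SnsTopicArn]:
    """Validate a name -> topic ARN mapping (hypothetical AlarmDestinations shape)."""
    return {name: parse_sns_topic_arn(arn) for name, arn in destinations.items()}
```

For example, `check_destinations({"critical": "arn:aws:sns:us-west-2:123456789012:pagerduty-critical"})` returns the parsed fields, while pasting a subscription ARN by mistake raises `ValueError`.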
The Lambda function uses an S3 bucket to store the history of instances, so that it only adds or deletes alarms for the specific instances that have changed. Create an S3 bucket whose name starts with the prefix used in the `s3_access.json` file.
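The history lets each run compare the instances it sees now with those it saw last time: new IDs need alarms, vanished IDs need their alarms deleted. A minimal sketch of that comparison follows. It assumes the history is stored as a JSON list of instance IDs, which is an assumption about the format rather than zumoco's documented one, and the helper names are hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def diff_instances(previous: Iterable[str], current: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (added, deleted) instance IDs relative to the stored history, sorted."""
    prev, cur = set(previous), set(current)
    return sorted(cur - prev), sorted(prev - cur)


def dump_history(instance_ids: Iterable[str]) -> str:
    """Serialize the current IDs (assumed format: JSON list) for the next run."""
    return json.dumps(sorted(set(instance_ids)))


def load_history(body: str) -> List[str]:
    """Parse a stored history; an empty body (e.g. first run) means no known instances."""
    return json.loads(body) if body.strip() else []
```

In a real run the body would come from boto3's `s3.get_object(Bucket=..., Key=...)["Body"].read().decode()` and be written back with `s3.put_object(Bucket=..., Key=..., Body=dump_history(ids))`; the object key layout is not specified here. Sorting keeps the stored history and any added/deleted report deterministic.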
- Edit `team.json` to modify:
  - `Team`: Set to your team's name.
  - `CreateTeamDashboard`: Set to `false` if you don't want a dashboard set with all metric alarms.
  - `Bucket`: Set to your S3 bucket name.
  - `MonitorDefs` list: Change to reference only the service files you plan to alert on.
- Copy each service file you want to monitor (e.g., `ec2_TeamFoo.json`) to a new filename, and reference it in the `team.json` `MonitorDefs` list. In the new file, modify:
  - `S3Suffix`: Make this unique per service file (if you have two EC2 service files, make sure each uses a different value).
  - `ReportARN`: Set this to the SNS ARN that receives the service report, which is generated for each service on each run and gives the total, added, and deleted service instances.
  - `InstanceFilters`: If your instances have names, add `tag:Name` values, since you likely want to restrict monitoring (and dashboard generation) to a subset of instances (fewer than 1k in total, per the AWS API docs). If you do this, you should also change the `AlarmPrefix` to keep dashboards, etc., separate. Example: `"Filters=[{'Name':'tag:Name', 'Values':['hadoop*']},{'Name':'instance-state-name', 'Values':['running']}]"`
  - `AlarmDestinations`: Modify to include all SNS topic/subscription alarm destinations you created in the previous section.
  - `CreateServiceDashboard`: Set to `false` if you don't want a dashboard set with all metric alarms for the given service.
  - `AlarmPrefix`: Use this string to name all (filtered) instance alarms of the given service.
  - `Alarms` section (add, remove, or change metrics in this dictionary as necessary):
    - `AlarmAction`: Set this to the name of the appropriate `AlarmDestinations` entry.
    - `send_ok`: Set this boolean to `false` if you don't want `OKAction` messages sent to the same alarm destination as the `AlarmAction`.
    - `Period`: Adjust based on basic/detailed monitoring, etc.
    - `Threshold`: Adjust based on your preferences.
  - `Charts` section (add, remove, or change charts in this dictionary as necessary):
    - `ch_type`: `"Metric"` is currently supported for auto-generation.
    - `is_alarm`: Boolean determining whether the chart is an alarm chart (requiring the ARN of an alarm) or a metrics chart without alarm values shown.
    - `avail`: String used to select the availability zone in the instance JSON (this varies by service).
    - `metric_list`: List of metrics or alarms to be charted, following the charts definition.
    - `view`: Note that `singleValue` charts are half the width of `timeSeries` charts.
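Before packaging, it can help to sanity-check `team.json` locally. This sketch assumes only the four keys listed for `team.json` above, with the types their descriptions imply (string, boolean, string, list of file names); `load_team_config` is a hypothetical helper, and zumoco's own loader may accept or require more.

```python
import json
from typing import Any, Dict

# Keys described in this README for team.json (assumed schema; zumoco's may differ).
REQUIRED_TEAM_KEYS = {"Team": str, "CreateTeamDashboard": bool, "Bucket": str, "MonitorDefs": list}


def load_team_config(text: str) -> Dict[str, Any]:
    """Parse team.json text and check that the documented keys exist with sensible types."""
    config = json.loads(text)
    for key, expected in REQUIRED_TEAM_KEYS.items():
        if key not in config:
            raise KeyError(f"team.json is missing {key!r}")
        if not isinstance(config[key], expected):
            raise TypeError(f"team.json {key!r} should be a {expected.__name__}")
    # Each MonitorDefs entry should name a service file such as ec2_TeamFoo.json.
    if not all(isinstance(name, str) and name for name in config["MonitorDefs"]):
        raise ValueError("MonitorDefs should list non-empty service file names")
    return config
```

Running it over `team.json` before `./deploy_lambda_function.sh` surfaces a JSON syntax error or a quoted `"false"` (a string, not a boolean) on your machine instead of in the Lambda logs.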
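The `Alarms` fields map naturally onto the keyword arguments of boto3's CloudWatch `put_metric_alarm` call. The sketch below builds those arguments for one EC2 instance. It assumes that `AlarmDestinations` is a name-to-ARN dictionary, that alarm names follow a hypothetical `<AlarmPrefix>-<instance id>-<metric>` scheme, and that an alarm definition may carry `Namespace`, `Statistic`, `EvaluationPeriods`, and `ComparisonOperator` keys (with the defaults shown). Only `AlarmAction`, `send_ok`, `Period`, and `Threshold` come from this README.

```python
from typing import Any, Dict


def build_alarm_kwargs(
    alarm_prefix: str,
    instance_id: str,
    metric_name: str,
    alarm_def: Dict[str, Any],
    destinations: Dict[str, str],
) -> Dict[str, Any]:
    """Build put_metric_alarm keyword arguments for one EC2 instance (a sketch).

    Keys other than AlarmAction, send_ok, Period and Threshold are assumptions.
    """
    action_arn = destinations[alarm_def["AlarmAction"]]  # destination name -> SNS topic ARN
    return {
        # Hypothetical naming scheme: <AlarmPrefix>-<instance id>-<metric>.
        "AlarmName": f"{alarm_prefix}-{instance_id}-{metric_name}",
        "Namespace": alarm_def.get("Namespace", "AWS/EC2"),
        "MetricName": metric_name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": alarm_def.get("Statistic", "Average"),
        # Seconds: EC2 basic monitoring reports every 300 s, detailed every 60 s.
        "Period": int(alarm_def["Period"]),
        "EvaluationPeriods": int(alarm_def.get("EvaluationPeriods", 1)),
        "Threshold": float(alarm_def["Threshold"]),
        "ComparisonOperator": alarm_def.get("ComparisonOperator", "GreaterThanOrEqualToThreshold"),
        "AlarmActions": [action_arn],
        # send_ok = false suppresses OK notifications to the same destination
        # (defaulting to true here is an assumption).
        "OKActions": [action_arn] if alarm_def.get("send_ok", True) else [],
    }
```

With credentials configured, `boto3.client("cloudwatch").put_metric_alarm(**kwargs)` would create the alarm; calling it again with the same `AlarmName` overwrites the existing alarm's configuration, so rerunning on each Lambda invocation is safe.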
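The `Charts` settings end up in a CloudWatch dashboard body: a JSON document with a `widgets` list laid out on a 24-column grid. This sketch turns simplified chart entries into `metric` widgets (matching `ch_type` "Metric"). Alarm charts (`is_alarm`) reference the alarm ARN through `annotations.alarms`, metric charts pass `metric_list` as `metrics`, and widgets wrap onto a new row when the grid is full. The widths (12 for `timeSeries`, 6 for `singleValue`), the height, the `title`/`alarm_arn` keys, and deriving the region from the `avail` zone by dropping its final letter are all assumptions, not zumoco's layout rules.

```python
import json
from typing import Any, Dict, List

GRID_WIDTH = 24  # CloudWatch dashboards use a 24-column grid
# Assumed widths honouring "singleValue charts are half the width of timeSeries charts".
WIDTHS = {"timeSeries": 12, "singleValue": 6}
HEIGHT = 6


def region_from_az(az: str) -> str:
    """Derive a region from a standard AZ name like 'us-west-2a' (Local Zones differ)."""
    return az[:-1]


def build_dashboard_body(charts: List[Dict[str, Any]], region: str) -> str:
    """Lay out metric widgets left to right, wrapping to a new row at the grid width."""
    widgets: List[Dict[str, Any]] = []
    x, y = 0, 0
    for chart in charts:
        width = WIDTHS[chart["view"]]
        if x + width > GRID_WIDTH:  # not enough room left on this row
            x, y = 0, y + HEIGHT
        props: Dict[str, Any] = {"view": chart["view"], "region": region, "title": chart["title"]}
        if chart["is_alarm"]:
            # Alarm chart: CloudWatch draws the alarm's metric and threshold from its ARN.
            props["annotations"] = {"alarms": [chart["alarm_arn"]]}
        else:
            props["metrics"] = chart["metric_list"]
        widgets.append({"type": "metric", "x": x, "y": y, "width": width,
                        "height": HEIGHT, "properties": props})
        x += width
    return json.dumps({"widgets": widgets})
```

The resulting string is the kind of value boto3's `put_dashboard(DashboardName=..., DashboardBody=...)` expects.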
The `TagsKey` section makes charts and alarms easier to read by adding tags to, or substituting them for, the instance ID:
- `FriendlyName`: Set to the tag key used as the name for the given instance, or `null` to use only the instance ID.
- `EnsureUniqueName`: Set to `true` to append the instance ID to `FriendlyName` to ensure uniqueness. This is useful for EC2 when you use autoscaling groups whose instances share the same `Name` value, etc.
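The `TagsKey` rules reduce to a small naming function. This sketch takes tags in the `[{"Key": ..., "Value": ...}]` shape that EC2's `describe_instances` returns. The `-` separator and the fallback to the bare instance ID when the tag is missing are assumptions, and `friendly_name` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def friendly_name(instance_id: str, tags: List[Dict[str, str]],
                  friendly_key: Optional[str], ensure_unique: bool) -> str:
    """Name an instance for charts/alarms from its tags, falling back to the instance ID."""
    if friendly_key is None:  # FriendlyName: null -> use the instance ID only
        return instance_id
    tag_map = {t["Key"]: t["Value"] for t in tags}
    name = tag_map.get(friendly_key)
    if not name:  # assumed fallback when the tag is absent or empty
        return instance_id
    # EnsureUniqueName: autoscaling groups often give many instances the same Name tag.
    return f"{name}-{instance_id}" if ensure_unique else name
```

For two autoscaled instances both tagged `Name=hadoop-worker`, `ensure_unique=True` yields `hadoop-worker-i-0abc` and `hadoop-worker-i-0def`, so their alarms and charts no longer collide.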
The packaging step copies everything in the `monitordefs` directory into the Lambda zip file, so you may wish to remove templates and files you don't use.
- Edit `vars.sh` to set how often the Lambda function runs inside your VPC.
- Run `./deploy_lambda_function.sh`. This will:
  - Package up the script with its dependencies into the zip format that AWS Lambda expects (as defined in `package.sh`).
  - Interact with the AWS API to set up the Lambda function with the things it needs (as defined in `deployscripts/setup_lambda.py`):
    - Create an IAM role for the Lambda function to use. Review the JSON files in the `deployscripts` directory to see the permissions required.
    - Upload the zip file from the previous step to create a Lambda function (possibly publishing a new version if the function already exists).
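AWS schedules recurring Lambda runs with rate expressions such as `rate(15 minutes)`. Assuming the rate set in `vars.sh` is passed through as such an expression (the variable is not named here, so check the script), a quick local check catches a common mistake: AWS requires a singular unit when the value is 1 and a plural unit otherwise. `rate_to_seconds` is a hypothetical helper, not part of the deploy scripts.

```python
import re

# rate(<positive integer> <unit>), as accepted by CloudWatch Events / EventBridge schedules.
_RATE = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")
_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}


def rate_to_seconds(expr: str) -> int:
    """Validate a schedule rate expression and return its interval in seconds."""
    match = _RATE.match(expr)
    if not match:
        raise ValueError(f"not a rate expression: {expr!r}")
    value, unit = int(match.group(1)), match.group(2)
    if value < 1:
        raise ValueError("rate value must be a positive integer")
    # AWS requires the singular unit for a value of 1 and the plural otherwise.
    if (value == 1) != (not unit.endswith("s")):
        raise ValueError(f"unit {unit!r} does not agree with value {value}")
    return value * _SECONDS[unit.rstrip("s")]
```

For instance, `rate_to_seconds("rate(15 minutes)")` returns 900, while `rate(1 minutes)` is rejected before it ever reaches AWS.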