This repository has been archived by the owner on Jun 23, 2021. It is now read-only.

Serverless Operations

Lu Hong edited this page Oct 11, 2019 · 35 revisions

Operations play a vital role in a production service. This project captures how we at AWS implement operational best practices such as alarms, dashboards, and CI/CD pipelines. When you deploy the project by following the Quickstart guide, the operations component is deployed automatically into your AWS account. It uses Amazon CloudWatch for alarms and dashboards, AWS CodeBuild for CI, and AWS CodePipeline for CD. Follow the steps in this walkthrough to learn more.

Alarms

To view the alarms set up by the project deployment:

  1. Log in to the AWS account that you deployed the project to. If you have not deployed the project, check Quick Start.
  2. Go to the CloudWatch console and click "Alarms".
  3. Search for "realworld-serverless-application-ops-Alarm" and you will see four alarms: Api4xxErrors, ApiAvailability, ApiLatencyP90 and ApiLatencyP50. The alarms monitor the most critical operational problems: error rate, availability and latency.
  4. Click on one of the alarms to open its detail page.
    1. Api4xxErrors: alarms when the 4xx error rate is above 0.3 for 5 minutes. Some 4xx responses are expected during normal operation, but a large share of them can also indicate a bad deployment.
    2. ApiAvailability: alarms when the 5xx error rate is above 0.1 for 5 minutes. 5xx responses are not expected and indicate server errors.
    3. ApiLatencyP90: alarms when P90 latency is above 2000 for 5 minutes. It indicates a latency spike and notifies you when customers are experiencing unusually high latency.
    4. ApiLatencyP50: alarms when P50 latency is above 200 for 5 minutes. P50 is the median latency: half of all requests take longer than this value, so it reflects the general experience of customers.
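The "above a threshold for 5 minutes" rule shared by the four alarms can be sketched in a few lines of Python. This is only an illustration, not the project's alarm definition: it assumes each datapoint covers one minute and that an alarm fires when the five most recent datapoints all exceed the threshold. The `AlarmRule` class and `is_in_alarm` function are hypothetical names; the thresholds are copied from the list above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AlarmRule:
    """A threshold rule; `periods` is an assumption (five one-minute datapoints)."""
    name: str
    threshold: float
    periods: int = 5


# Thresholds copied from the alarm list; units are whatever the metric reports.
RULES = {
    "Api4xxErrors": AlarmRule("Api4xxErrors", 0.3),
    "ApiAvailability": AlarmRule("ApiAvailability", 0.1),
    "ApiLatencyP90": AlarmRule("ApiLatencyP90", 2000),
    "ApiLatencyP50": AlarmRule("ApiLatencyP50", 200),
}


def is_in_alarm(rule: AlarmRule, datapoints: Sequence[float]) -> bool:
    """Return True when the most recent `rule.periods` datapoints all exceed the threshold."""
    if len(datapoints) < rule.periods:
        return False  # not enough data to evaluate the rule
    recent = datapoints[-rule.periods:]
    return all(value > rule.threshold for value in recent)
```

With this model, a single datapoint below the threshold inside the five-minute window keeps the alarm out of the "ALARM" state, which is why short blips do not page anyone.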

Here is an example of alarms:

[Image: example alarms]

When an alarm goes into the "ALARM" state, a message is sent to an Amazon SNS topic whose name starts with "realworld-serverless-application-ops-Alarm" and ends with "AlarmsTopic". To receive notifications, subscribe to the topic:

  1. Go to the Amazon SNS console, click "Topics" and choose the topic "realworld-serverless-application-ops-Alarm-xxx-AlarmsTopic".
  2. On the detail page, click "Create subscription" and choose the desired protocol and endpoint. For example, to receive email notifications, choose "Email" under "Protocol" and enter your email address under "Endpoint".
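The same subscription can also be created from a script. The following is a minimal sketch, not part of the project: it assumes you pass in an SNS client such as the one returned by `boto3.client("sns")`, and the helper names `find_alarm_topics` and `subscribe_email` are hypothetical. The only AWS call used is the SNS `subscribe` operation with the `email` protocol.

```python
from typing import Any, Iterable, List

ALARM_STACK_PREFIX = "realworld-serverless-application-ops-Alarm"


def find_alarm_topics(topic_arns: Iterable[str]) -> List[str]:
    """Keep ARNs whose topic name starts with the stack prefix and ends with AlarmsTopic."""
    matches = []
    for arn in topic_arns:
        topic_name = arn.rsplit(":", 1)[-1]  # the topic name is the last ARN segment
        if topic_name.startswith(ALARM_STACK_PREFIX) and topic_name.endswith("AlarmsTopic"):
            matches.append(arn)
    return matches


def subscribe_email(sns_client: Any, topic_arn: str, email: str) -> str:
    """Request an email subscription and return the SubscriptionArn from the response."""
    response = sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol="email",
        Endpoint=email,
    )
    return response["SubscriptionArn"]
```

As in the console flow, an email subscription stays pending until the recipient follows the confirmation link that SNS sends to the address.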

Here is an example email notification:

[Image: example email notification]

Dashboard

  1. Go to the CloudWatch console and click "Dashboards".
  2. Click on the name "Dashboard-xxx" to open the dashboard.

The dashboard is composed of three parts: API Gateway metrics, API Lambda metrics and CloudWatch Insights queries.

Here is an example dashboard:

[Image: example dashboard]

API Gateway metrics

API Gateway metrics provide a view of API usage and health. Metrics include 5XX error count and availability, request count, 4XX error count, and latency.

API Lambda metrics

API Lambda metrics measure the usage and health of the Lambda function. Metrics include error count and success rate, invocations, and latency.
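Availability and success rate are both ratios derived from two counts. The sketch below shows one plausible way to compute them; it is an assumption about the formula, not the dashboard's actual metric math, and `success_rate_percent` is a hypothetical helper.

```python
def success_rate_percent(failures: float, total: float) -> float:
    """Percentage of requests (or invocations) that did not fail."""
    if total <= 0:
        return 100.0  # assumption: a period with no traffic counts as fully healthy
    return 100.0 * (1.0 - failures / total)


# API Gateway availability: 5XX error count against request count.
api_availability = success_rate_percent(failures=5, total=1000)

# Lambda success rate: error count against invocations.
lambda_success_rate = success_rate_percent(failures=2, total=400)
```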

CloudWatch Insights queries

CloudWatch Insights provides insight into API performance and how customers are impacted by performance issues. Queries include "Top 10 customers by Request Count", "Top 10 Customers Impacted by API 5xx", "Top 10 API 5xx Errors", "Top 10 API 4xx Errors" and "Top 10 API Latency Requests".

"Top 10 customers by Request Count" shows the customers who use the service most actively, and "Top 10 Customers Impacted by API 5xx" shows the customers most impacted by service issues.

"Top 10 API 5xx Errors" and "Top 10 API 4xx Errors" provide the error messages or error types.

"Top 10 API Latency Requests" shows which types of requests are experiencing the highest latencies.
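A query in the spirit of "Top 10 customers by Request Count" could be run from a script as sketched below. This is not the project's actual query: the field name `customerId` is hypothetical, because the log schema is not shown here, and `run_insights_query` is an illustrative helper. It assumes a CloudWatch Logs client such as `boto3.client("logs")` is passed in, and uses only the `start_query` and `get_query_results` operations.

```python
import time
from typing import Any, Dict, List

# Hypothetical field name: replace customerId with a field your logs actually contain.
TOP_CUSTOMERS_QUERY = (
    "stats count(*) as requestCount by customerId "
    "| sort requestCount desc "
    "| limit 10"
)


def run_insights_query(
    logs_client: Any,
    log_group: str,
    query: str,
    start: int,
    end: int,
    poll_seconds: float = 1.0,
    max_polls: int = 30,
) -> List[Dict[str, str]]:
    """Start an Insights query, poll until it completes, and return rows as dicts.

    `start` and `end` are epoch seconds bounding the time range to search.
    """
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=start,
        endTime=end,
        queryString=query,
    )["queryId"]

    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} cells.
            return [{cell["field"]: cell["value"] for cell in row} for row in response["results"]]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"Insights query ended with status {status}")
        time.sleep(poll_seconds)

    raise TimeoutError("Insights query did not complete within the polling budget")
```

Polling is needed because Insights queries run asynchronously: `start_query` only returns an identifier, and the results become available once the status changes to "Complete".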

CI (Continuous Integration)

The serverless application aws-sam-codebuild-ci is used to set up CI for this GitHub repository using AWS CodeBuild. You can follow the steps in the readme of aws-sam-codebuild-ci to set up CI for your own GitHub repository or your fork of the realworld-serverless-application repository.

The aws-sam-codebuild-ci application creates an AWS CodeBuild project. The project runs the build commands specified in buildspec.yaml, which compile the project and run its unit tests when a PR is created or when a commit is pushed to the master branch.

A branch rule on the master branch prevents a PR from being merged until the AWS CodeBuild project passes. This ensures the quality of the code submitted in the PR.

The README file also displays the AWS CodeBuild badge, which shows the build status of the master branch. This helps us ensure the quality of the code that is committed to the master branch.

CD (Continuous Deployment)

The serverless application aws-sam-codepipeline-cd is used to set up CD for this GitHub repository using AWS CodePipeline. See the readme of aws-sam-codepipeline-cd for details on how to use it.

In this repository, aws-sam-codepipeline-cd is used as a nested application in the CICD template of each component. Each component has its own CD pipeline, so a change to one component is deployed only through that component's pipeline and the other components do not need a deployment. This reduces the impact of a production deployment and makes it easier to deploy changes to multiple components independently.

CICD template for each component:

  • Backend
  • Website
  • Ops
  • Analytics

You can deploy the CICD template into your account to set up CD for your fork of realworld-serverless-application. Here are the steps to deploy the CICD template into your account:
