Monitoring with Stackdriver on Kubernetes Engine

Introduction

Stackdriver Kubernetes Monitoring is a new Stackdriver feature that more tightly integrates with GKE to better show you key stats about your cluster and the workloads and services running in it. Included in the new feature is functionality to import, as native Stackdriver metrics, metrics from pods with Prometheus endpoints. This allows you to use Stackdriver native alerting functionality with your Prometheus metrics without any additional workload.

This tutorial will walk you through setting up Monitoring and visualizing metrics from a Kubernetes Engine cluster. It makes use of Terraform, a declarative Infrastructure as Code tool that enables configuration files to be used to automate the deployment and evolution of infrastructure in the cloud. The logs from the Kubernetes Engine cluster will be leveraged to walk through the monitoring capabilities of Stackdriver.

Note: The setup of the Stackdriver Monitoring workspace is not automated with a script because it is currently not supported through Terraform or via the gcloud command line tool.

Architecture

The tutorial will create a Kubernetes Engine cluster that has a sample application deployed to it. Logs and metrics from the cluster are sent to Stackdriver by default. In the tutorial, a Stackdriver Monitoring account will be set up to view the captured metrics.

Monitoring Architecture

Initial Setup

Configure gcloud

All the tools for the demo are already installed. When using Cloud Shell, run the following command to set up the gcloud CLI. When prompted, set your region and zone.

gcloud init
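If you would rather set the defaults non-interactively, the region and zone can also be written with `gcloud config set`. The following is a minimal sketch, not part of this repository: the function name `set_gcloud_defaults` is hypothetical, and `us-central1` / `us-central1-a` are placeholder values.

```shell
# Set the default Compute Engine region and zone for the active gcloud configuration.
# Usage: set_gcloud_defaults REGION ZONE
# NOTE: hypothetical helper for illustration; not provided by this repository.
set_gcloud_defaults() {
  region="$1"
  zone="$2"
  if [ -z "$region" ] || [ -z "$zone" ]; then
    echo "usage: set_gcloud_defaults REGION ZONE" >&2
    return 1
  fi
  gcloud config set compute/region "$region" &&
    gcloud config set compute/zone "$zone"
}

# Example (placeholder values):
# set_gcloud_defaults us-central1 us-central1-a
```

Running `gcloud config list` afterwards should show the values that were written.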

Tools

  1. Terraform >= 0.11.7
  2. Google Cloud SDK version >= 204.0.0
  3. kubectl matching the latest GKE version
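
To confirm the tools above are present before starting, a small check along these lines may help. It is a sketch under assumptions: the helper names `check_tools` and `version_ge` are hypothetical, and `version_ge` relies on `sort -V`, which is available in GNU coreutils but may be missing on other systems.

```shell
# Report which of the named tools are on PATH.
# Returns non-zero if any tool is missing.
# NOTE: hypothetical helpers for illustration; not provided by this repository.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# version_ge A B succeeds when dotted version A is >= dotted version B.
# Assumes a sort that supports -V (version sort), e.g. GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Example:
# check_tools terraform gcloud kubectl
# version_ge "<installed terraform version>" 0.11.7 || echo "Terraform is too old"
```

The installed versions themselves can be read from `terraform version`, `gcloud version`, and `kubectl version --client`.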

You can obtain a free trial of GCP if you need one.

Install Cloud SDK

The Google Cloud SDK is used to interact with your GCP resources. Installation instructions for multiple platforms are available online.

Install kubectl CLI

The kubectl CLI is used to interact with both Kubernetes Engine and Kubernetes in general. Installation instructions for multiple platforms are available online.

Install Terraform

Terraform is used to automate the manipulation of cloud infrastructure. Its installation instructions are also available online.

Configure Authentication

The Terraform configuration will execute against your GCP environment and create a Kubernetes Engine cluster running a simple application. The configuration will use your personal account to build out these resources. To set up the default account the configuration will use, run the following command and select the appropriate account:

$ gcloud auth application-default login

Deployment

In this section we will create a Stackdriver Monitoring account so that we can explore the capabilities of the Monitoring console.

Create a new Stackdriver Account

The following steps are used to set up a Stackdriver Monitoring account.

  1. Visit the Monitoring section of the GCP Console. This will launch the process of creating a new Monitoring console if you have not created one before.
  2. On the Create your free StackDriver account page select the project you created earlier. Note: You cannot change this setting once it is created.
  3. Click on the Create Account button.
  4. On the next page, Add Google Cloud Platform projects to monitor, you can leave the selection alone. The project is already selected, so it isn't necessary to select any other projects. Note: You can add and remove projects at a later date if necessary.
  5. Click the Continue button.
  6. On the Monitor AWS accounts page you can choose to specify your AWS account information or skip this step.
  7. For this tutorial's purposes you can click the Skip AWS Setup button.
  8. On the Install the Stackdriver Agents page you are provided with a script that can be used to add the Stackdriver Monitoring and Logging agents on each of your VM instances. Note: The tracking of VMs is not automatic like it is for Kubernetes Engine. For the purposes of this tutorial this script is not needed.
  9. Click the Continue button.
  10. On the Get Reports by Email page you can simply select any of the options depending on whether you want to receive the reports. For the purposes of this demo we will not be using the reports.
  11. Click the Continue button.
  12. The actual creation of the account and underlying resources takes a few minutes. Once completed you can press the Launch monitoring button.

Deploying the cluster

The infrastructure and Stackdriver alert policy required by this project can be deployed by executing:

make create

This will:

  1. Read your project & zone configuration to generate a couple of config files:
  • ./terraform/terraform.tfvars for Terraform variables
  • ./manifests/prometheus-service-sed.yaml for the Prometheus policy to be created in Stackdriver
  2. Run terraform init to prepare Terraform to create the infrastructure
  3. Run terraform apply to actually create the infrastructure & Stackdriver alert policy

If you need to override any of the defaults in the Terraform variables file, simply replace the desired value(s) to the right of the equals sign(s). Be sure your replacement values are still double-quoted.
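
As an illustration of what a generated variables file could contain, the sketch below renders one from a project and a zone. The variable names `project` and `zone` and the function `render_tfvars` are assumptions made for this example; check the generated ./terraform/terraform.tfvars for the names the configuration actually uses.

```shell
# Print a Terraform variables file for the given project and zone.
# String values are double-quoted, as the note above requires.
# NOTE: the variable names "project" and "zone" are assumed for illustration.
render_tfvars() {
  printf 'project = "%s"\n' "$1"
  printf 'zone = "%s"\n' "$2"
}

# Example: take the values from the active gcloud configuration.
# render_tfvars "$(gcloud config get-value project)" \
#               "$(gcloud config get-value compute/zone)"
```

Printing to standard output rather than writing the file directly makes it easy to inspect the result before overwriting anything.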

If no errors are displayed then after a few minutes you should see your Kubernetes Engine cluster in the GCP Console.

How does Terraform work?

Following the principles of Infrastructure as Code and Immutable Infrastructure, Terraform supports the writing of declarative descriptions of the desired state of infrastructure. When the descriptor is applied, Terraform uses GCP APIs to provision and update resources to match. Terraform compares the desired state with the current state so incremental changes can be made without deleting everything and starting over. For instance, Terraform can build out GCP projects and compute instances, etc., even set up a Kubernetes Engine cluster and deploy applications to it. When requirements change, the descriptor can be updated and Terraform will adjust the cloud infrastructure accordingly.
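
The compare-then-change cycle described above can be observed with `terraform plan`, which prints the difference between the desired and the current state without changing anything. The following is a minimal sketch, assuming the configuration lives in ./terraform as the generated tfvars path suggests; the function name `plan_then_apply` and the plan file name `tfplan` are hypothetical.

```shell
# Initialize, show the planned changes, then apply exactly that saved plan.
# Usage: plan_then_apply [TERRAFORM_DIR]   (defaults to ./terraform, an assumption)
# NOTE: hypothetical helper for illustration; `make create` remains the supported path.
plan_then_apply() {
  dir="${1:-./terraform}"
  (
    # Run in a subshell so the caller's working directory is untouched.
    cd "$dir" || exit 1
    terraform init &&
      terraform plan -out=tfplan &&
      terraform apply tfplan
  )
}

# Example:
# plan_then_apply ./terraform
```

Running the plan step a second time after a successful apply should report that no changes are needed, which is the incremental behavior the paragraph describes.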

This example will start up a Kubernetes Engine cluster and deploy a simple sample application to it. By default, Kubernetes Engine clusters in GCP are provisioned with a pre-configured Fluentd-based collector that forwards logs to Stackdriver.

Validation

If no errors are displayed during deployment, after a few minutes you should see your Kubernetes Engine cluster in the GCP Console with the sample application deployed.

In order to validate that resources are installed and working correctly, run:

make validate
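
Independently of whatever `make validate` checks, a manual spot check of the cluster might look like the sketch below. The function name `spot_check_cluster` is hypothetical, and the cluster name and zone are placeholders you would replace with the values Terraform used.

```shell
# Fetch credentials for the cluster, then list its nodes and all pods.
# Usage: spot_check_cluster CLUSTER_NAME ZONE
# NOTE: hypothetical helper; this is not what `make validate` runs.
spot_check_cluster() {
  cluster="$1"
  zone="$2"
  gcloud container clusters get-credentials "$cluster" --zone "$zone" &&
    kubectl get nodes &&
    kubectl get pods --all-namespaces
}

# Example (placeholder values):
# spot_check_cluster my-cluster us-central1-a
```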

Using Stackdriver Kubernetes Monitoring

For a thorough guide on how to observe your cluster with the new Stackdriver Kubernetes UI, see Observing Your Kubernetes Clusters.

Native Prometheus integration

The Terraform code includes a Stackdriver alerting policy that watches a metric originally imported from a Prometheus endpoint. From the Stackdriver main page, click on Alerting then Policies Overview to show all the policies, including the alerting policy called Prometheus mem alloc. Clicking on the policy will provide much more detail.

Teardown

When you are finished with this example and want to avoid accruing charges, run the following command to remove all the resources that were created:

$ make teardown

This command uses the terraform destroy command to remove the infrastructure. Terraform tracks the resources it creates so it is able to tear them all back down.

Troubleshooting

**The install script fails with a Permission denied error when running Terraform.** The credentials that Terraform is using do not provide the necessary permissions to create resources in the selected projects. Ensure that the account listed in gcloud config list has the necessary permissions to create resources. If it does, regenerate the application default credentials using gcloud auth application-default login.
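
To see which roles an account actually holds on a project, the IAM policy can be flattened and filtered by member. This is a sketch under assumptions: the function name `list_account_roles` is hypothetical, and which roles are sufficient depends on the resources the Terraform configuration creates.

```shell
# Print the IAM roles bound to ACCOUNT in PROJECT, one per line.
# Usage: list_account_roles PROJECT_ID ACCOUNT_EMAIL
# NOTE: hypothetical helper for illustration; not provided by this repository.
list_account_roles() {
  project="$1"
  account="$2"
  gcloud projects get-iam-policy "$project" \
    --flatten="bindings[].members" \
    --filter="bindings.members:${account}" \
    --format="value(bindings.role)"
}

# Example: inspect the active account on the active project.
# list_account_roles "$(gcloud config get-value project)" \
#                    "$(gcloud config get-value account)"
```

An empty result suggests the account has no direct role bindings on the project, although roles inherited from a folder or organization would not appear here.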

**Metrics not appearing or Uptime Checks not executing.** After the scripts execute, it may take a few minutes for the Metrics or Uptime Checks to appear. Configure the items and give the system some time to generate metrics and checks, as they sometimes take time to complete.

Relevant Material

This is not an officially supported Google product.