Autoscaler tool for Cloud Spanner

Autoscaler

Set up the Autoscaler using Terraform configuration files
Home · Scaler component · Poller component · Forwarder component · Terraform configuration · Monitoring
Cloud Functions · Google Kubernetes Engine

Overview

This directory contains Terraform configuration files to quickly set up the infrastructure for your Autoscaler for a deployment to Google Kubernetes Engine (GKE).

In this deployment option, all the components of the Autoscaler reside in the same project as your Spanner instances. A future enhancement may enable the autoscaler to operate cross-project when running in GKE.

This deployment is ideal for independent teams who want to self-manage the infrastructure and configuration of their own Autoscalers on Kubernetes.

Architecture

[Architecture diagram: architecture-gke]

  1. Using a Kubernetes ConfigMap, you define which Spanner instances the autoscaler should manage. Currently these must be in the same project as the cluster that runs the autoscaler.

  2. Using a Kubernetes CronJob, the autoscaler is configured to run on a schedule. By default this is every minute, though this is configurable; a sketch of such a CronJob follows this list.

  3. When scheduled, an instance of the Poller is created as a Kubernetes Job.

  4. The Poller queries the Cloud Monitoring API to retrieve the utilization metrics for each Spanner instance.

  5. For each Spanner instance, the Poller makes a call to the Scaler via its API. The request payload contains the utilization metrics for the specific Spanner instance, and some of its corresponding configuration parameters.

  6. Using the chosen scaling method, the Scaler compares the Spanner instance metrics against the recommended thresholds, plus or minus an allowed margin, and determines whether the instance should be scaled and, if so, the number of nodes or processing units it should be scaled to.

  7. The Scaler retrieves the time when the instance was last scaled from the state data stored in Cloud Firestore (or alternatively Spanner) and compares it with the current time.

  8. If the configured cooldown period has passed, the Scaler requests the Spanner instance to scale out or in.
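
A minimal sketch of the CronJob from step 2 is shown below. The manifest name, file name, and image are illustrative placeholders, not the actual manifests shipped in this repository's autoscaler-pkg:

    cat <<'EOF' > poller-cronjob-sketch.yaml
    # Illustrative sketch only; the real manifests live in kubernetes/autoscaler-pkg.
    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: poller-cronjob            # hypothetical name
      namespace: spanner-autoscaler
    spec:
      schedule: "*/1 * * * *"         # every minute, the default mentioned above
      jobTemplate:
        spec:
          template:
            spec:
              containers:
                - name: poller
                  image: <POLLER_IMAGE> # placeholder for the Poller image path
              restartPolicy: Never
    EOF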

The GKE deployment has the following pros and cons:

Pros

  • Kubernetes-based: For teams that may not be able to use Google Cloud services such as Cloud Functions, this design enables the use of the autoscaler.
  • Configuration: The control over scheduler parameters belongs to the team that owns the Spanner instance, therefore the team has the highest degree of freedom to adapt the Autoscaler to its needs.
  • Infrastructure: This design establishes a clear boundary of responsibility and security for the Autoscaler infrastructure, because the team that owns the Spanner instances also owns the Autoscaler infrastructure.

Cons

  • Infrastructure: In contrast to the Cloud Functions design, some long-lived infrastructure and services are required.
  • Maintenance: With each team responsible for its own Autoscaler configuration and infrastructure, it may become difficult to ensure that all Autoscalers across the company follow the same update guidelines.
  • Audit: Because each team has a high level of control, a centralized audit may become more complex.

Before you begin

In this section you prepare your environment.

  1. Open the Cloud Console

  2. Activate Cloud Shell
    At the bottom of the Cloud Console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Cloud SDK already installed, including the gcloud command-line tool, and with values already set for your current project. It can take a few seconds for the session to initialize.

  3. In Cloud Shell, clone this repository:

    git clone https://github.com/cloudspannerecosystem/autoscaler.git
  4. Export variables for the working directories:

    export AUTOSCALER_ROOT="$(pwd)/autoscaler"
    export AUTOSCALER_DIR=${AUTOSCALER_ROOT}/terraform/gke

Preparing the Autoscaler Project

In this section you prepare your project for deployment.

  1. Go to the project selector page in the Cloud Console. Select or create a Cloud project.

  2. Make sure that billing is enabled for your Google Cloud project. Learn how to confirm billing is enabled for your project.

  3. In Cloud Shell, configure the environment with the ID of your autoscaler project:

    export PROJECT_ID=<INSERT_YOUR_PROJECT_ID>
    gcloud config set project ${PROJECT_ID}
  4. Set the region where the Autoscaler resources will be created:

    export REGION=us-central1
  5. Enable the required Cloud APIs:

    gcloud services enable iam.googleapis.com \
      artifactregistry.googleapis.com \
      cloudbuild.googleapis.com \
      cloudresourcemanager.googleapis.com \
      container.googleapis.com \
      spanner.googleapis.com
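
    To confirm that the APIs were enabled, you can list the enabled services and look for the ones above, for example:

    gcloud services list --enabled | grep -E 'spanner|container'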
  6. If you want to create a new Spanner instance for testing the Autoscaler, set the following variable. The Spanner instance that Terraform creates is named autoscale-test.

    export TF_VAR_terraform_spanner_test=true

    On the other hand, if you do not want to create a new Spanner instance because you already have an instance for the Autoscaler to monitor, set the name of your instance in the following variable:

    export TF_VAR_spanner_name=<INSERT_YOUR_SPANNER_INSTANCE_NAME>

    For more information on how to configure your Spanner instance to be managed by Terraform, see Importing your Spanner instances

  7. There are two options for deploying the state store for the Autoscaler:

    1. Store the state in Firestore
    2. Store the state in Spanner

    For Firestore, follow the steps in Using Firestore for Autoscaler state. For Spanner, follow the steps in Using Spanner for Autoscaler state.

Using Firestore for Autoscaler state

  1. To use Firestore for the Autoscaler state, choose the App Engine location where the Autoscaler infrastructure will be created, for example:

    export APP_ENGINE_LOCATION=us-central
  2. Enable the additional APIs:

    gcloud services enable \
      appengine.googleapis.com \
      firestore.googleapis.com
  3. Create a Google App Engine app to enable the API for Firestore:

    gcloud app create --region="${APP_ENGINE_LOCATION}"
  4. To store the state of the Autoscaler, update the database created with the Google App Engine app to use Firestore native mode:

    gcloud firestore databases update --type=firestore-native

    You will also need to make a minor modification to the Autoscaler configuration. The required steps to do this are later in these instructions.
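
    To verify the change, you can describe the database and check its type. This is a minimal check, assuming the default database; the expected output is FIRESTORE_NATIVE:

    gcloud firestore databases describe --format='value(type)'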

  5. Next, continue to Deploying the Autoscaler

Using Spanner for Autoscaler state

  1. If you want to store the state in Cloud Spanner and do not yet have a Spanner instance for that purpose, set the following variable so that Terraform creates an instance for you named autoscale-test-state:

    export TF_VAR_terraform_spanner_state=true

    It is a best practice not to store the Autoscaler state in the same instance that is being monitored by the Autoscaler.

    Optionally, you can change the name of the instance that Terraform will create:

    export TF_VAR_spanner_state_name=<INSERT_STATE_SPANNER_INSTANCE_NAME>

    If you already have a Spanner instance where the state must be stored, set only the name of your instance:

    export TF_VAR_spanner_state_name=<INSERT_YOUR_STATE_SPANNER_INSTANCE_NAME>

    If you want to manage the state of the Autoscaler in your own Cloud Spanner instance, please create the following table in advance:

    CREATE TABLE spannerAutoscaler (
       id STRING(MAX),
       lastScalingTimestamp TIMESTAMP,
       createdOn TIMESTAMP,
       updatedOn TIMESTAMP,
    ) PRIMARY KEY (id)
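
    As a sketch, you can create the database and table with gcloud. The database name spanner-autoscaler-state matches the stateDatabase stanza shown later in these instructions; <YOUR_STATE_INSTANCE> is a placeholder for your state instance name:

    gcloud spanner databases create spanner-autoscaler-state \
      --instance=<YOUR_STATE_INSTANCE>
    gcloud spanner databases ddl update spanner-autoscaler-state \
      --instance=<YOUR_STATE_INSTANCE> \
      --ddl='CREATE TABLE spannerAutoscaler (id STRING(MAX), lastScalingTimestamp TIMESTAMP, createdOn TIMESTAMP, updatedOn TIMESTAMP) PRIMARY KEY (id)'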
  2. Next, continue to Deploying the Autoscaler

Deploying the Autoscaler

  1. Set the project ID and region in the corresponding Terraform environment variables:

    export TF_VAR_project_id=${PROJECT_ID}
    export TF_VAR_region=${REGION}
  2. Change directory into the Terraform per-project directory and initialize it:

    cd ${AUTOSCALER_DIR}
    terraform init
  3. Create the Autoscaler infrastructure:

    terraform plan -out=terraform.tfplan
    terraform apply -auto-approve terraform.tfplan

If you are running this command in Cloud Shell and encounter errors of the form "Error: cannot assign requested address", this is a known issue in the Terraform Google provider. Retry the command with -parallelism=1.
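
For example, when applying a saved plan:

    terraform apply -parallelism=1 terraform.tfplan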

Next, continue to Building and Deploying the Autoscaler Services.

Importing your Spanner instances

If you have existing Spanner instances that you want to import to be managed by Terraform, follow the instructions in this section.

  1. List your Spanner instances:

    gcloud spanner instances list
  2. Set the following variable with the name of the instance to import:

    SPANNER_INSTANCE_NAME=<YOUR_SPANNER_INSTANCE_NAME>
  3. Create a Terraform config file with an empty google_spanner_instance resource:

    echo "resource \"google_spanner_instance\" \"${SPANNER_INSTANCE_NAME}\" {}" > "${SPANNER_INSTANCE_NAME}.tf"
  4. Import the Spanner instance into the Terraform state.

    terraform import "google_spanner_instance.${SPANNER_INSTANCE_NAME}" "${SPANNER_INSTANCE_NAME}"
  5. After the import succeeds, update the Terraform config file for your instance with the actual instance attributes:

    terraform state show -no-color "google_spanner_instance.${SPANNER_INSTANCE_NAME}" \
      | grep -vE "(id|num_nodes|state|timeouts).*(=|\{)" \
      > "${SPANNER_INSTANCE_NAME}.tf"

If you have additional Spanner instances to import, repeat this process.

Importing Spanner databases is also possible using the google_spanner_database resource and following a similar process.
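
A sketch of the database import follows. The variable names are placeholders, and you should check the google provider documentation for the exact import ID formats it accepts; an instance/database ID is assumed here:

    SPANNER_DATABASE_NAME=<YOUR_SPANNER_DATABASE_NAME>
    echo "resource \"google_spanner_database\" \"${SPANNER_DATABASE_NAME}\" {}" > "${SPANNER_DATABASE_NAME}.tf"
    terraform import "google_spanner_database.${SPANNER_DATABASE_NAME}" \
      "${SPANNER_INSTANCE_NAME}/${SPANNER_DATABASE_NAME}"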

Building and Deploying the Autoscaler Services

  1. To build the Autoscaler images and push them to Artifact Registry, run the following commands:

    cd ${AUTOSCALER_ROOT} && \
    gcloud builds submit poller --config=poller/cloudbuild.yaml --region=${REGION} && \
    gcloud builds submit scaler --config=scaler/cloudbuild.yaml --region=${REGION}
  2. Construct the paths to the images:

    POLLER_PATH="${REGION}-docker.pkg.dev/${PROJECT_ID}/spanner-autoscaler/poller"
    SCALER_PATH="${REGION}-docker.pkg.dev/${PROJECT_ID}/spanner-autoscaler/scaler"
  3. Retrieve the SHA256 hashes of the images:

    POLLER_SHA=$(gcloud artifacts docker images describe ${POLLER_PATH}:latest --format='value(image_summary.digest)')
    SCALER_SHA=$(gcloud artifacts docker images describe ${SCALER_PATH}:latest --format='value(image_summary.digest)')
  4. Construct the full paths to the images, including the SHA256 hashes:

    POLLER_IMAGE="${POLLER_PATH}@${POLLER_SHA}"
    SCALER_IMAGE="${SCALER_PATH}@${SCALER_SHA}"
  5. Retrieve the credentials for the cluster where the Autoscaler will be deployed:

    gcloud container clusters get-credentials spanner-autoscaler --region=${REGION}
  6. Next, to configure the Kubernetes manifests and deploy the Autoscaler to the cluster, run the following commands:

    cd ${AUTOSCALER_ROOT}/kubernetes && \
    kpt fn eval --image gcr.io/kpt-fn/apply-setters:v0.1.1 autoscaler-pkg -- poller_image=${POLLER_IMAGE} scaler_image=${SCALER_IMAGE} && \
    kubectl apply -f autoscaler-pkg/ --recursive

    The sample configuration creates two schedules to demonstrate autoscaling: a frequently running schedule that dynamically scales the Spanner instance according to its utilization, and an hourly schedule that directly scales the Spanner instance every hour.
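
    To verify that the schedules were created, you can list the CronJobs in the spanner-autoscaler namespace used by the manifests:

    kubectl get cronjobs --namespace spanner-autoscaler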

  7. To prepare to configure the Autoscaler, run the following command:

    for template in autoscaler-config/*.template ; do envsubst < "${template}" > "${template%.*}" ; done
  8. Next, to see how the Autoscaler is configured, run the following command to output the example configuration:

    cat autoscaler-config/autoscaler-config*.yaml

    These two files configure each instance of the autoscaler that you scheduled in the previous step. Notice the environment variable AUTOSCALER_CONFIG. You can use this variable to reference a configuration that will be used by that individual instance of the autoscaler. This means that you can configure multiple scaling schedules across multiple Spanner instances.

    If you do not supply this value, a default of autoscaler-config.yaml will be used.

    You can autoscale multiple Spanner instances on a single schedule by including multiple YAML stanzas in any of the scheduled configurations, as shown in the sketch below. For the schema of the configuration, see the Poller configuration section.
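
    The sketch below illustrates the shape of such a multi-stanza configuration for two instances. The field names and values are indicative only; verify them against the Poller configuration reference before use:

    cat <<'EOF' > autoscaler-config/two-instances-sketch.yaml
    # Illustrative sketch only; check field names against the Poller configuration reference.
    ---
    - projectId: my-project          # hypothetical project ID
      instanceId: spanner-instance-1 # hypothetical instance names
      units: PROCESSING_UNITS
      minSize: 100
      maxSize: 2000
    - projectId: my-project
      instanceId: spanner-instance-2
      units: PROCESSING_UNITS
      minSize: 100
      maxSize: 2000
    EOF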

  9. If you have chosen to use Firestore to hold the Autoscaler state as described above, edit the above files, and remove the following lines:

     stateDatabase:
       name: spanner
       instanceId: autoscale-test
       databaseId: spanner-autoscaler-state

    Note: If you do not remove these lines, the Autoscaler will attempt to use the above non-existent Spanner database for its state store, which will result in the Poller component failing to start. Please see the Troubleshooting section for more details.

    If you have chosen to use your own Spanner instance, please edit the above configuration files accordingly.

  10. To configure the Autoscaler and begin scaling operations, run the following command:

    kubectl apply -f autoscaler-config/
  11. Any changes made to the configuration files and applied with kubectl apply will update the Autoscaler configuration.

  12. You can view logs for the Autoscaler components via kubectl or the Cloud Logging interface in the Google Cloud console.
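
    For example, to read the logs with kubectl, list the jobs and then fetch the logs of one of them; the job name below is a placeholder:

    kubectl get jobs --namespace spanner-autoscaler
    kubectl logs job/<JOB_NAME> --namespace spanner-autoscaler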

Troubleshooting

This section contains guidance on what to do if you encounter issues when following the instructions above.

If the GKE cluster is not successfully created

  1. Check there are no Organizational Policy rules that may conflict with cluster creation.

If you do not see scaling operations as expected

  1. If you are encountering scaling issues, the first step is to check the Autoscaler logs in Cloud Logging. To retrieve the logs for the Poller and Scaler components, use the following query:

    resource.type="k8s_container"
    resource.labels.namespace_name="spanner-autoscaler"
    (resource.labels.container_name="poller" OR resource.labels.container_name="scaler")
    

    If you do not see any log entries, check that you have selected the correct time period to display in the Cloud Logging console, and that the GKE cluster nodes have the correct permissions to write logs to the Cloud Logging API (roles/logging.logWriter).
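
    One way to check which members hold that role is to flatten and filter the project IAM policy, for example:

    gcloud projects get-iam-policy ${PROJECT_ID} \
      --flatten="bindings[].members" \
      --filter="bindings.role:roles/logging.logWriter" \
      --format="value(bindings.members)"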

If the Poller fails to run successfully

  1. If you have chosen to use Firestore for Autoscaler state and you see the following error in the logs:

     Error: 5 NOT_FOUND: Database not found: projects/<YOUR_PROJECT>/instances/autoscale-test/databases/spanner-autoscaler-state

    Edit the file ${AUTOSCALER_ROOT}/autoscaler-config/autoscaler-config.yaml and remove the following stanza:

     stateDatabase:
       name: spanner
       instanceId: autoscale-test
       databaseId: spanner-autoscaler-state
  2. Check the formatting of the YAML configuration file:

    cat ${AUTOSCALER_ROOT}/autoscaler-config/autoscaler-config.yaml
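
    You can also ask the API server to validate the file without applying it, for example:

    kubectl apply --dry-run=server -f ${AUTOSCALER_ROOT}/autoscaler-config/autoscaler-config.yaml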