VOTE-2869-3058: Update documentation for devops and disaster recovery (
tt-gsa authored Jan 16, 2025
1 parent 733ae01 commit 456aa76
Showing 2 changed files with 51 additions and 51 deletions.
37 changes: 31 additions & 6 deletions docs/devops.md
@@ -25,7 +25,7 @@
1. [Scripts](#scripts)
1. [Application Scripts](#application-scripts)
1. [bootstrap.sh](#bootstrapsh)
1. [build_static](#build_static)
1. [upkeep](#upkeep)
1. [bash-exports.sh](#bash-exportssh)
1. [entrypoint](#entrypoint)
1. [post-deploy](#post-deploy)
@@ -47,6 +47,8 @@
1. [scheduled-backup.sh](#scheduled-backupsh)
1. [Miscellaneous Scripts](#miscellaneous-scripts)
1. [download_latest_backup.sh](#download_latest_backupsh)
1. [Common Usage](#common-usage)
1. [Downsync Process](#downsync-process)

## Prerequisites

@@ -71,7 +73,7 @@ These are general tools and requirements needed to interact with the various scr

### Data

Obtain a copy of the latest backup archive available. This will likely have been moved to Google Drive or some other storage.
Obtain a copy of the latest backup archive available. This data is stored in Google Drive.

### Software

@@ -309,7 +311,7 @@ Can be used to export a variety of application variables. Not used for anything,
Runs configurations during buildpack staging. This includes things such as NewRelic configuration, setting of egress
proxy variables, and the installation of `awscli`.
#### build_static
#### upkeep
Compiles the static website via the Drupal Tome module, then syncs the newly generated static files via `awscli` to S3. This script runs as a scheduled pipeline in CircleCI, where it is launched as a CloudFoundry task in cloud.gov.
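A minimal sketch of what the flow described above might look like (the Tome export options, output path, and bucket variable are assumptions for illustration, not taken from the repository):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the upkeep flow: export the site as static HTML
# with the Drupal Tome module, then sync the output to S3 with awscli.
set -euo pipefail

# Generate the static site via Tome.
drush tome:static --uri="https://vote.gov" -y

# Sync the generated files to the S3 bucket serving the static site.
aws s3 sync ./html "s3://${STATIC_BUCKET}/" --delete
```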
@@ -321,6 +323,10 @@ A simple script that is used to hold the container open with an infinite sleep lo
Used to do post-deployment housekeeping tasks. These include various `drush` commands, such as running cache rebuild, config import, and the s3fs module tasks.
#### post-deploy-upkeep
Used to do post-deployment static site generation. This is the same as [upkeep](#upkeep), but it runs inside the application instance and only as part of the deploy workflow.
#### start
Used to start the PHP and Apache processes. It will then run [entrypoint](#entrypoint).
@@ -372,11 +378,15 @@ Installs the `mysql-client` package. This is used for [downsync-backup.sh](#down
#### downsync-backup.sh
Launched by a `triggered pipeline` in CircleCI, this script connects to the database and executes the `mysqldump` command to get a current copy of the database running in an environment.
Launched by a `triggered pipeline` in CircleCI, this script connects to the database and executes the `mysqldump` command to get a current copy of the database running in an environment. Additionally, the media files and Terraform remote state are downloaded and saved in an archive S3 bucket.
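The database portion of this backup flow might look roughly like the sketch below (the host, credential, and bucket variables are assumptions for illustration):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the downsync backup flow described above.
set -euo pipefail

timestamp="$(date +%Y%m%d%H%M%S)"

# Dump the environment's database and compress it.
mysqldump --host="${DATABASE_HOST}" --user="${DATABASE_USERNAME}" \
  --password="${DATABASE_PASSWORD}" "${DATABASE_NAME}" \
  | gzip > "drupal_${timestamp}.sql.gz"

# Copy the compressed dump into the archive S3 bucket.
aws s3 cp "drupal_${timestamp}.sql.gz" "s3://${BACKUP_BUCKET}/${timestamp}/"
```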
#### prod-db-backup.sh
#### downsync-restore.sh
Launched by a `manually triggered pipeline` in CircleCI, this script connects to the prod database and executes the `mysqldump` command to get a current copy of the database. The intent of this script is to provide a mechanism to back up only the production database outside of the regularly scheduled [downsync-backup.sh](#downsync-backupsh) if a more recent backup is needed to restore to a preprod application.
Launched by a `triggered pipeline` in CircleCI, this script connects to the database and executes the `mysql` command to restore a database running in an environment.
#### downsync-preprod.sh
Launched by a `manually triggered pipeline` in CircleCI, this script connects to the database and executes the `mysql` command to restore a database running in a preprod environment. It will retrieve the latest prod database backup from S3 and restore it into the specified environment.
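A minimal sketch of this restore flow, under the assumption that backups live in a dated layout in the backups bucket (the variable names are illustrative, not taken from the repository):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the preprod restore flow described above.
set -euo pipefail

# Find the most recent prod database backup in the backups bucket.
latest="$(aws s3 ls "s3://${BACKUP_BUCKET}/" --recursive \
  | sort | tail -n 1 | awk '{print $4}')"
aws s3 cp "s3://${BACKUP_BUCKET}/${latest}" ./restore.sql.gz

# Restore it into the target preprod environment's database.
gunzip < restore.sql.gz | mysql --host="${DATABASE_HOST}" \
  --user="${DATABASE_USERNAME}" --password="${DATABASE_PASSWORD}" \
  --database="${DATABASE_NAME}"
```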
#### exports.sh
@@ -397,3 +407,18 @@ Launched by a `triggered pipeline` in CircleCI, this script gathers all S3 buck
#### download_backup.sh
Allows for easy downloading of the various backups available on the system. This includes the database backup, the Drupal user uploaded file content, and backups of the Terraform state. Help with usage is available when running the script without arguments.
## Common Usage
### Downsync Process
The Downsync Process syncs production data into a preproduction environment. It is primarily used to test application updates against production data. The [downsync-preprod.sh](#downsync-preprodsh) script is leveraged to do this within CircleCI through a `manually triggered pipeline`. Downsync may only happen to preproduction environments.
1. Navigate to the project in CircleCI.
1. Click `Trigger Pipeline`.
1. Select the source information corresponding to the preproduction environment to downsync.
1. Under `Parameters`, use the `Name` "restore".
1. Enter the `Value` of the preproduction environment to downsync. This may only be `test`, `dev`, or `stage`.
1. The pipeline will be scheduled and retrieve the latest production database backup from the S3 backups bucket.
Files are not downsynced as a part of this pipeline because preproduction environments leverage a file proxy against prod's files.
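The manual trigger steps above can also be performed against the CircleCI v2 pipeline API. The project slug and token variable below are assumptions for illustration; the `restore` parameter name matches the steps above:

```shell
# Hypothetical API equivalent of the manual "Trigger Pipeline" steps,
# restoring the latest prod backup into the stage environment.
curl -X POST "https://circleci.com/api/v2/project/gh/usagov/vote-gov/pipeline" \
  -H "Circle-Token: ${CIRCLE_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"branch": "stage", "parameters": {"restore": "stage"}}'
```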
65 changes: 20 additions & 45 deletions docs/disasterrecovery.md
@@ -50,6 +50,8 @@ This step will create the pipeline accounts and various S3 buckets needed for th

[[top]](#votegov-disaster-recovery)

**This process must happen in the context of the [Vote.gov Terraform Repository](https://github.com/usagov/vote-gov-tf)**

1. Extract the data archive: `tar xf {terraform_archive_name}.tar.gz`. This will create a new folder called `env:/`.

![Extract Backup Archive](images/disaster_recovery/tf_restore_process_step1.png)
@@ -63,11 +65,11 @@
- {cf_space}: The space where the bucket is deployed in Cloud.gov.

```
export bucket_name="{bucket_name}"
export cf_space="{cloudgov_space_name}"
source ./scripts/aws_creds.sh -t
```

Confirm `AWS` environmental variables have exported correctly:

@@ -81,61 +83,35 @@ env | sort | grep AWS
aws s3 cp --recursive env:/ s3://${bucket}/env:/
```

4. Delete the `terraform` S3 credentials by running `aws_creds.sh` again.

![Delete Terraform State S3 Bucket Credentials](images/disaster_recovery/tf_restore_process_step5.png)
4. Apply Terraform in the `prod` workspace.

```
source ./scripts/aws_creds.sh -t
terraform workspace select prod
terraform apply
```

5. Trigger the pipeline for the following branches. This can be accomplished by pushing to the respective branch.
1. `bootstrap`
1. `dmz`
1. `prod`
Repeat for the other environments (`dev`, `test`, `stage`) once prod is confirmed operational.

```
git checkout -b dr/restore-system
touch temp.txt
git add temp.txt
git commit -m "Push to restore environment."
git push
```

2. Open a PR from `dr/restore-system` to the `prod` branch, then get the proper merge approvals. The items below can be skipped until `prod` is functional again.
Each environment deployment will take 5 - 10 minutes to complete, with the database instance creation taking the longest.

3. Open a PR from `prod` to the `stage` branch, then get the proper merge approvals.

4. Open a PR from `prod` to the `dev` branch, then get the proper merge approvals.
5. Delete the `terraform` S3 credentials by running `aws_creds.sh` again.

Each environment deployment will take 5 - 10 minutes to complete, with the database instance creation taking the longest.
![Delete Terraform State S3 Bucket Credentials](images/disaster_recovery/tf_restore_process_step5.png)

### Application Restore Process

[[top]](#votegov-disaster-recovery)

1. In the terminal, change directory to the `application` repository and trigger a deployment by pushing to the branch of the environment that needs to be restored. Change a file, or add a new temporary file, and commit it to the repository.
1. Navigate to the prod branch in CircleCI.

```
git checkout -b dr/restore-system
touch temp.txt
git add temp.txt
git commit -m "Push to restore environment."
git push
```

2. Open a PR from `dr/restore-system` to the `prod` branch, then get the proper merge approvals. The items below can be skipped until `prod` is functional again.

3. Open a PR from `prod` to the `stage` branch, then get the proper merge approvals.

4. Open a PR from `prod` to the `dev` branch, then get the proper merge approvals.

2. Manually trigger a re-run of the most recent successful deploy workflow.

### Database Restore Process

[[top]](#votegov-disaster-recovery)

1. In a terminal window, change directory to the directory that has the database backup. The filename is `backup_{timestamp}.sql.gz`.
1. In a terminal window, change directory to the directory that has the database backup. The filename is `drupal_{timestamp}.sql.gz`.

2. Ensure that the environment has SSH enabled. The command `cf ssh {APP_NAME}` can be used to test if an SSH session can be created to the application. If it is disabled, as in `prod`, run the command below to enable it.
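The enable command itself is collapsed in this diff view; presumably it is the standard Cloud Foundry CLI call mirroring the `cf disable-ssh` command shown later in this section:

```shell
# Presumed command for enabling SSH on the application.
cf enable-ssh {APP_NAME}

# A restart may be required for the SSH change to take effect.
cf restart {APP_NAME}
```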

@@ -154,7 +130,7 @@ cf connect-to-service --no-client {APP_NAME} {DATABASE_SERVICE_NAME}
4. In the second window, use the credentials from the command above to import the database. Depending on the database size, this process may take some time to complete.

```
gunzip < backup_{timestamp}.sql.gz | mysql --host=127.0.0.1 --port={DATABASE_PORT} --protocol=TCP --user=${DATABASE_USERNAME} -p --database=${DATABASE_NAME}
gunzip < drupal_{timestamp}.sql.gz | mysql --host=127.0.0.1 --port={DATABASE_PORT} --protocol=TCP --user=${DATABASE_USERNAME} -p --database=${DATABASE_NAME}
```

Pressing enter will prompt for the database password.
@@ -167,7 +143,7 @@ Pressing enter will prompt for the database password.
cf disable-ssh {APP_NAME}
```

7. Use the `downsync` functionality in the pipeline to migrate the database to other environments.

### Media Restore Process

@@ -184,7 +160,7 @@ Media files are user-uploaded files that were uploaded via the CMS. These can b
```
export bucket_name="{bucket_name}"
export cf_space="{cloudgov_space_name}"
source ./scripts/aws_creds.sh -s
```

@@ -203,4 +179,3 @@ aws s3 cp --recursive cms/ s3://${bucket}/cms/
```

4. Repeat the steps above for other environments.
