[CUMULUS-3671]: Update docs for Serverless V2 #3666
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,9 +12,9 @@ core deployment code expects this database to be provided by the [AWS RDS](https | |
|
||
RDS databases are broadly divided into two types: | ||
|
||
- **Provisioned**: Databases with a fixed capacity in terms of CPU and memory capacity. You can find | ||
a list of the available database instance sizes in [this AWS documentation](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.DBInstanceClass.html). | ||
- **Serverless**: Databases that can scale their CPU and memory capacity up and down in response to database load. [Amazon Aurora](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/CHAP_AuroraOverview.html) is the service which provides serverless RDS databases. | ||
- **Provisioned**: Databases with a fixed CPU and memory capacity, e.g. the Memory optimized, Burstable, and Optimized Reads instance classes. You can find | ||
a complete list of the available database instance class types in [this AWS documentation](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.DBInstanceClass.html). | ||
- **Serverless v2**: Databases that can scale their CPU and memory capacity up and down in response to database load. [Amazon Aurora](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/CHAP_AuroraOverview.html) is the service which provides serverless RDS databases. | ||
|
||
## Provisioned vs. Serverless | ||
|
||
|
@@ -41,6 +41,12 @@ will be able to handle the spikes in your database load. | |
|
||
## General Configuration Guidelines | ||
|
||
### Aurora Serverless V2 Capacity Range | ||
|
||
Serverless V2 allows users to configure the minimum and maximum number of ACUs the cluster should use. | ||
|
||
[The Aurora Serverless V2 Docs](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2-administration.html#aurora-serverless-v2-setting-acus) provide guidance for determining your capacity requirements and modifying the capacity range of your RDS cluster. | ||
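
As a rough illustration, assuming the cluster is deployed with the `cumulus-rds-tf` module, the capacity range could be set through the module's `min_capacity` and `max_capacity` variables in `terraform.tfvars`. The ACU values below are placeholders chosen for the example, not recommendations.

```hcl
# terraform.tfvars (illustrative values only)

# Lowest number of ACUs the cluster will scale down to.
min_capacity = 2

# Highest number of ACUs the cluster is allowed to use.
# Weigh cost against performance when choosing this value.
max_capacity = 4
```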
|
||
### Cumulus Core Login Configuration | ||
|
||
Cumulus Core uses an `admin_db_login_secret_arn` and (optionally) a `user_credentials_secret_arn` as inputs that allow various Cumulus components to act as a database administrator and/or read/write user. Those secrets should conform to the following format: | ||
|
@@ -77,39 +83,10 @@ Current security policy/best practices require use of a SSL enabled configuratio | |
|
||
Cumulus can accommodate a self-signed/unrecognized cert by setting `rejectUnauthorized` to `false` in the connection secret. This will result in Core allowing the use of certs without a valid CA. | ||
|
||
## Recommended Scaling Configuration for Aurora Serverless | ||
This isn't applicable in V2. The most relevant configuration options here are the min and max capacity settings. We could provide some guidance there but I don't believe we have that ready to go. Totally open to thoughts here as this was the most contentious part of these doc updates in my mind. |
||
|
||
If you are going to use an Aurora Serverless RDS database, we recommend the following scaling recommendations: | ||
|
||
- Set the autoscaling timeout to 1 minute (currently the lowest allowed value) | ||
- Set the database to force capacity change if the autoscaling timeout is reached | ||
|
||
The reason for these recommendations requires an understanding of Aurora Serverless scaling. | ||
Aurora Serverless scaling works as described in [the Amazon Aurora documentation](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.how-it-works.html): | ||
|
||
> When it does need to perform a scaling operation, Aurora Serverless v1 first tries to identify a scaling point, a moment when no queries are being processed. | ||
|
||
However, during periods of heavy ingest, Cumulus will be continuously writing granules and other | ||
records to the database, so a "scaling point" will never be reached. This is where the | ||
"autoscaling timeout" setting becomes important. The "autoscaling timeout" is the amount of time | ||
that Aurora will wait to find a "scaling point" before giving up. | ||
|
||
So with the above recommended settings, we are telling Aurora to only wait for a "scaling point" | ||
for 1 minute and that if a "scaling point" cannot be found in that time, then we should | ||
**force the database to scale anyway**. These settings effectively make the Aurora Serverless database scale as quickly as possible in response to increased database load. | ||
|
||
With forced scaling on databases, there is a consequence that some running queries or transactions | ||
may be dropped. However, Cumulus write operations are written with automatic retry logic, so any | ||
write operations that failed due to database scaling should be retried successfully. | ||
|
||
### Cumulus Serverless RDS Cluster Module | ||
|
||
Cumulus provides a Terraform module that will deploy an Aurora Serverless RDS cluster. If you are | ||
using this module to create your RDS cluster, you can configure the autoscaling timeout action, | ||
the cluster minimum and maximum capacity, and more as seen in the [supported variables for the module](https://github.com/nasa/cumulus/blob/6f104a89457be453809825ac2b4ac46985239365/tf-modules/cumulus-rds-tf/variables.tf). | ||
|
||
Unfortunately, Terraform currently doesn't allow specifying the autoscaling timeout itself, so | ||
that value will have to be manually configured in the AWS console or CLI. | ||
using this module to create your RDS cluster, you can configure the cluster minimum and maximum capacity, and more as seen in the [supported variables for the module](https://github.com/nasa/cumulus/blob/6f104a89457be453809825ac2b4ac46985239365/tf-modules/cumulus-rds-tf/variables.tf). | ||
|
||
## Optional: Manage RDS Database with pgAdmin | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,7 +16,7 @@ configured [Aurora Serverless](https://aws.amazon.com/rds/aurora/serverless/) cl | |
|
||
To that end, Cumulus provides a terraform module | ||
[`cumulus-rds-tf`](https://github.com/nasa/cumulus/tree/master/tf-modules/cumulus-rds-tf) | ||
that will deploy an AWS RDS Aurora Serverless PostgreSQL 13 compatible [database cluster](https://aws.amazon.com/rds/aurora/postgresql-features/), and optionally provision a single deployment database with credentialed secrets for use with Cumulus. | ||
that will deploy an AWS RDS Aurora Serverless V2 PostgreSQL 13 compatible [database cluster](https://aws.amazon.com/rds/aurora/postgresql-features/), and optionally provision a single deployment database with credentialed secrets for use with Cumulus. | ||
|
||
We have provided an example terraform deployment using this module in the [Cumulus template-deploy repository](https://github.com/nasa/cumulus-template-deploy/tree/master/rds-cluster-tf) on GitHub. | ||
|
||
|
@@ -59,7 +59,7 @@ For Cumulus specific instructions on installation of Terraform, refer to the mai | |
|
||
#### Aurora/RDS | ||
|
||
This document also assumes some basic familiarity with PostgreSQL databases and Amazon Aurora/RDS. If you're unfamiliar consider perusing the [AWS docs](https://aws.amazon.com/rds/aurora/) and the [Aurora Serverless V1 docs](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.html). | ||
This document also assumes some basic familiarity with PostgreSQL databases and Amazon Aurora/RDS. If you're unfamiliar consider perusing the [AWS docs](https://aws.amazon.com/rds/aurora/) and the [Aurora Serverless V2 docs](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.html). | ||
|
||
## Prepare Deployment Repository | ||
|
||
|
@@ -145,6 +145,7 @@ Fill in the appropriate values in `terraform.tfvars`. See the [rds-cluster-tf mo | |
- `max_capacity` -- the max ACUs the cluster is allowed to use. Carefully consider cost/performance concerns when setting this value. | ||
- `min_capacity` -- the minimum ACUs the cluster will scale to | ||
- `provision_user_database` -- Optional flag to allow module to provision a user database in addition to creating the cluster. Described in the [next section](#provision-user-and-user-database). | ||
- `cluster_instance_count` -- Number of RDS instances in the cluster. Defaults to 1. | ||
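
A minimal sketch of overriding the instance count in `terraform.tfvars`, assuming a deployment that wants more than the default single instance. The value shown is purely illustrative.

```hcl
# terraform.tfvars (illustrative value only)

# Number of RDS instances in the cluster; the module default is 1.
cluster_instance_count = 2
```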
|
||
#### Provision User and User Database | ||
|
||
|
@@ -227,6 +228,7 @@ Terraform will perform the following actions: | |
+ backup_retention_period = 1 | ||
+ cluster_identifier = "xxxxxxxxx" | ||
+ cluster_identifier_prefix = (known after apply) | ||
+ cluster_instance_count = 1 | ||
+ cluster_members = (known after apply) | ||
+ cluster_resource_id = (known after apply) | ||
+ copy_tags_to_snapshot = false | ||
|
@@ -237,7 +239,7 @@ Terraform will perform the following actions: | |
+ enable_http_endpoint = true | ||
+ endpoint = (known after apply) | ||
+ engine = "aurora-postgresql" | ||
+ engine_mode = "serverless" | ||
+ engine_mode = "provisioned" | ||
This is just example output but this is what it would look like now. |
||
+ engine_version = "10.12" | ||
+ final_snapshot_identifier = "xxxxxxxxx" | ||
+ hosted_zone_id = (known after apply) | ||
|
@@ -250,7 +252,7 @@ Terraform will perform the following actions: | |
+ preferred_maintenance_window = (known after apply) | ||
+ reader_endpoint = (known after apply) | ||
+ skip_final_snapshot = false | ||
+ storage_encrypted = (known after apply) | ||
+ storage_encrypted = true | ||
+ tags = { | ||
+ "Deployment" = "xxxxxxxxx" | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
--- | ||
These instructions were hammered out in https://bugs.earthdata.nasa.gov/browse/CUMULUS-3670 |
||
id: serverless-v2-upgrade | ||
title: Upgrading from Aurora Serverless V1 to V2 | ||
hide_title: false | ||
--- | ||
|
||
## There are 2 approaches to this migration | ||
|
||
### Option 1: Snapshot and Restore (simple process, more downtime) | ||
|
||
1. Take a manual snapshot of the v1 instance, including the current YYYY-MM-DD in the name of the snapshot. Wait for successful completion showing Status = Available. If this is an initial snapshot of the instance, it may take a substantial amount of time to complete. | ||
2. Ensure deletion protection is turned off on the v1 instance, as it will be deleted and replaced with a v2 cluster and instance. Deletion protection can be toggled in the AWS Console under RDS > Databases > select database > Modify. | ||
3. Run "terraform show" to view the current state for module.rds_cluster.aws_rds_cluster.cumulus. | ||
Ensure final_snapshot_identifier is set in resource "aws_rds_cluster" "cumulus". Copy the value. If a snapshot exists with that name, delete that snapshot. | ||
Ensure skip_final_snapshot is false in resource "aws_rds_cluster" "cumulus". | ||
4. Update /example/rds-cluster-tf/terraform.tfvars (or custom .tfvars filename) to: | ||
remove: enable_upgrade | ||
add: snapshot_identifier = "final_snapshot_identifier" (Paste value from prior step) | ||
5. Stop ingest. | ||
6. Run "terraform apply" to create a new v2 cluster and instance(s) based on the v1 final snapshot, using the updated tfvars file (or custom .tfvars filename). Wait for completion. | ||
terraform apply -var-file=terraform.tfvars (or custom .tfvars filename) | ||
7. Resume ingest. | ||
8. The end result is the new v2 cluster is created containing the existing v1 data. | ||
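
The tfvars change in step 4 might look roughly like the sketch below. The snapshot name is a placeholder; substitute the `final_snapshot_identifier` value copied from the `terraform show` output in step 3.

```hcl
# terraform.tfvars (or your custom .tfvars file)

# Removed for the migration: the enable_upgrade setting is deleted
# from this file entirely.

# Restore the new v2 cluster from the v1 final snapshot.
# "example-final-snapshot" is a hypothetical name used for illustration.
snapshot_identifier = "example-final-snapshot"
```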
|
||
### Option 2: Blue/Green Cutover (complex process, less downtime) | ||
|
||
AWS instructions for setting up a blue/green deployment: https://aws.amazon.com/blogs/database/upgrade-from-amazon-aurora-serverless-v1-to-v2-with-minimal-downtime/ |
See below comment. We could at some point make recommendations here but I don't think we have them now.