Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RDS Restore Option #2728

Merged
merged 15 commits into from
Apr 25, 2018
Merged

Add RDS Restore Option #2728

merged 15 commits into from
Apr 25, 2018

Conversation

jasonmcintosh
Copy link
Contributor

Adds support for creating a DB Instance from an S3 xtrabackup snapshot
#2410
First attempt to do go, so certain points are not entirely ideal. Also would love to rewrite the restore and create so many of the required options are set in a function vs. duplicated, but that can be refactored later

@jen20 jen20 added the enhancement Requests to existing resources that expand the functionality or scope. label Dec 21, 2017
@bflad bflad added the service/rds Issues and PRs that pertain to the rds service. label Jan 28, 2018
@jasonmcintosh
Copy link
Contributor Author

Been a month... any updates?

@ghost ghost added the size/L Managed by automation to categorize the size of a PR. label Feb 15, 2018
@bflad
Copy link
Contributor

bflad commented Feb 15, 2018

Hi @jasonmcintosh sorry for the maintainer delay in getting this reviewed for you. I don't have time to give this a fully exhaustive review right now but skimming the acceptance test, is it possible to actually test and have this succeed currently? It looks like it would be trying to pull the S3 backup from a random bucket and not succeed. Does AWS have a sample S3 backup that might work with this? Otherwise, we might need to get real creative.

@bflad bflad added the waiting-response Maintainers are waiting on response from community or contributor. label Feb 15, 2018
@jasonmcintosh
Copy link
Contributor Author

XtraBackup database - we can create and share a sample file. When I was testing in the provider, we only did unit tests vs. integration tests. Then, we built the provider and did manual integration tests. If you want an automatic integration test I can look at adding that support. It'll be trickier but doable - just have to create a random bucket, a bucket item from the sample file, and then run the integration tests and tear down. We'd need to store the xtrabackup file in the tests but thankfully it's a tiny binary file.

@ghost ghost added the size/L Managed by automation to categorize the size of a PR. label Feb 15, 2018
@jasonmcintosh
Copy link
Contributor Author

Bringing in master changes (which look like really one file) to validate/test again.

@jasonmcintosh
Copy link
Contributor Author

@bflad Note, this COULD be a misunderstanding on our part - we're pretty new to Go (some of us are). Were these acceptance tests and we just missed something on this?

@bflad
Copy link
Contributor

bflad commented Feb 15, 2018

Sorry our terminology for "acceptance" testing is equivalent to "integration" testing. It expects to fully test a Terraform configuration against the AWS APIs and have it succeed end-to-end. In order to accomplish that, it looks like we'll need to have the test perform quite a few steps according to the documentation including setting up an IAM role that RDS can assume and having a S3 location ready to go with a valid backup file in an automated fashion.

As for handling a valid backup file in S3, I can think of a few automated options:

  • Create all the infrastructure each test run to create a valid backup file in S3 such as creating an EC2 instance running MySQL that runs Percona XtraBackup and uploads to S3 each time (very brittle and time consuming)
  • Find and use a publicly accessible backup file for testing (potentially brittle)
  • Create a backup file for testing, provide to a maintainer here, we upload it to a known and managed location in S3 that is publicly accessible, and the acceptance test is updated to point there
  • Create a backup file for testing, add this file to the repository, then update the acceptance test to create a bucket and upload a backup file each time

But similar to what you said previously, we're probably going to want to go with one of the last two options, but I think it really depends on the size of the backup file. We probably don't want to include something in the repository over a few MB.

So short story here is that at the very least, the acceptance test for this feature will need to handle the IAM role creation, then we need to figure out the S3 backup file handling.

If you don't have time or interest in doing this, that's perfectly okay too 👍 . It just means this PR will be delayed as this functionality is not currently a priority (that I know about) for any of the Hashicorp employed maintainers.

@jasonmcintosh
Copy link
Contributor Author

jasonmcintosh commented Feb 15, 2018

We're happy to add this support :) Looking at options 3 & 4 right now - we're thinking option 4 should be viable (creating a test file now, getting it as small as we can). On role... looks like there's another test that creates a monitoring role. Can see if I can't re-use some of that. Good feedback/information - thanks! It may be early next week before we can get to this though. We can run the provider locally for now to get us moving while we add support.

@ghost ghost added the size/L Managed by automation to categorize the size of a PR. label Mar 21, 2018
@jasonmcintosh
Copy link
Contributor Author

@bflad We have a 1.3MB tiny backup with a test DB. Do you want it in git (it's a .tar.gz file) or an email so you can create a public S3 location? Working on the tests shortly :) Note, we'll be looking to add the same functionality to aurora - do you want those changes in the same pull request or separate?

@bflad
Copy link
Contributor

bflad commented Mar 21, 2018

@jasonmcintosh for now can you email it to my Github username at hashicorp dot com? I'll discuss how we want to handle this internally.

Separate pull request would be preferred. Thanks!

@ghost ghost added the size/L Managed by automation to categorize the size of a PR. label Apr 24, 2018
@jasonmcintosh
Copy link
Contributor Author

@bflad We've got acceptance tests quasi-working. It creates the database from an s3 restore. However, we've got a problem where IAM role creation is required for this to work, and the IAM role doesn't seem to be fully propagated before it's used. Do you have any thoughts on solutions around this? e.g. how to add a sleep into the code to make sure the IAM role is there first? We tried a depends_on block similar to the way the enhanced monitoring RDS role works, but that doesn't seem to work in the case of a restore. As we understand it, Step blocks in acceptance tests are destroyed before the next steps, so we can't do a Step "Pre-setup resources", Step "Sleep", Step "Create S3"?

@bflad
Copy link
Contributor

bflad commented Apr 24, 2018

Inside the resource.Retry() that wraps conn.RestoreDBInstanceFromS3(), add a conditional that allows it to retry based on the error coming back. You didn't provide exactly what you're seeing but something like this should work:

		err = resource.Retry(5*time.Minute, func() *resource.RetryError {
			_, err = conn.RestoreDBInstanceFromS3(&opts)
			if err != nil {
				if isAWSErr(err, "InvalidParameterValue", "... IAM role message here ...") {
					return resource.RetryableError(err)
				}
				if isAWSErr(err, "InvalidParameterValue", "ENHANCED_MONITORING") {
					return resource.RetryableError(err)
				}
				return resource.NonRetryableError(err)
			}
			return nil
		})

@ghost ghost added the size/L Managed by automation to categorize the size of a PR. label Apr 24, 2018
@jasonmcintosh
Copy link
Contributor Author

Went ahead and put this in a files/ folder - happy to change the config if you don't like this... didn't seem like there were many good options. Hadn't heard if you'd put this in a public S3 bucket someplace for test purposes. We're starting on the aurora piece next

@jasonmcintosh
Copy link
Contributor Author

Turns out terraform creates S3 buckets and IAM roles so fast, had to retry on both bucket and IAM role. But acceptance tests passed locally. We added VPC creation code as we delete default vpcs to make sure that everything needed for the test to pass would work regardless of environment.

Copy link
Contributor

@bflad bflad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working through this @jasonmcintosh -- I left an initial review down below. Can you check the items and let us know if you have any questions or do not have time to implement the feedback?

@@ -220,6 +220,49 @@ func resourceAwsDbInstance() *schema.Resource {
},
},

"s3_import": {
Type: schema.TypeSet,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simple nested attributes with MaxItems: 1 we prefer schema.TypeList 👍

"bucket_name": {
Type: schema.TypeString,
Required: true,
Optional: false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Required: true is defined, so Optional: false is extraneous

},
"bucket_prefix": {
Type: schema.TypeString,
Required: false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: true is defined, so Required: false is extraneous

Type: schema.TypeString,
Required: false,
Optional: true,
Default: "",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The zero value for schema.TypeString is "" so Default: "" is extraneous

@@ -220,6 +220,49 @@ func resourceAwsDbInstance() *schema.Resource {
},
},

"s3_import": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add ForceNew: true to all child attributes of s3_import as there is no "update" available for it. 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this, though I could see long term an interesting debate. if it's use as a spring-board to create a database, should be able to remove the backup file after the database is created. SO in theory, deleting the block shouldn't cause a rebuild/recreate. But I guess that can be done with ignore attributes block instead...

err = resource.Retry(5*time.Minute, func() *resource.RetryError {
_, err = conn.RestoreDBInstanceFromS3(&opts)
if err != nil {
if awsErr, ok := err.(awserr.Error); ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified with:

					if isAWSErr(err, "InvalidParameterValue", "ENHANCED_MONITORING") {
						return resource.RetryableError(awsErr)
					}
					if isAWSErr(err, "InvalidParameterValue", "S3_SNAPSHOT_INGESTION") {
						return resource.RetryableError(err)
					}
					if isAWSErr(err, "InvalidParameterValue", "S3 bucket cannot be found") {
						return resource.RetryableError(err)
					}

resource "aws_s3_bucket_object" "xtrabackup_db" {
bucket = "${aws_s3_bucket.xtrabackup.id}"
key = "%s/sample.tar.gz"
source = "../files/2018-02-26_19-31-02.tar.gz"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please rename the backup file to something meaningful? e.g. mysql-5-6-xtrabackup.tar.gz


~> **NOTE:** Removing the `replicate_source_db` attribute from an existing RDS
Replicate database managed by Terraform will promote the database to a fully
standalone database.

### S3 Import Options
Full details on the core parameters and impacts are in the API Docs: [RestoreDBInstanceFromS3](http://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_RestoreDBInstanceFromS3.html). Sample
```hcl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this requires an extra newline above or the formatting will not work -- it can be verified using make website (available very recently in master)

}

resource "aws_iam_policy" "test" {
name = "tf-s3-rds-access-policy"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be randomized with -%s in case of test failures/concurrency. 👍

}

resource "aws_db_subnet_group" "foo" {
name = "foo"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be randomized with -%s in case of test failures/concurrency. 👍

@bflad bflad removed the waiting-response Maintainers are waiting on response from community or contributor. label Apr 24, 2018
Copy link
Contributor

@bflad bflad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on this @jasonmcintosh! 🚀

21 tests passed (all tests)
=== RUN   TestAccAWSDBInstance_ec2Classic
--- PASS: TestAccAWSDBInstance_ec2Classic (389.91s)
=== RUN   TestAccAWSDBInstance_importBasic
--- PASS: TestAccAWSDBInstance_importBasic (434.16s)
=== RUN   TestAccAWSDBInstance_kmsKey
--- PASS: TestAccAWSDBInstance_kmsKey (463.16s)
=== RUN   TestAccAWSDBInstance_basic
--- PASS: TestAccAWSDBInstance_basic (483.81s)
=== RUN   TestAccAWSDBInstance_MinorVersion
--- PASS: TestAccAWSDBInstance_MinorVersion (493.82s)
=== RUN   TestAccAWSDBInstance_diffSuppressInitialState
--- PASS: TestAccAWSDBInstance_diffSuppressInitialState (504.25s)
=== RUN   TestAccAWSDBInstance_iamAuth
--- PASS: TestAccAWSDBInstance_iamAuth (544.28s)
=== RUN   TestAccAWSDBInstance_portUpdate
--- PASS: TestAccAWSDBInstance_portUpdate (579.35s)
=== RUN   TestAccAWSDBInstance_optionGroup
--- PASS: TestAccAWSDBInstance_optionGroup (604.39s)
=== RUN   TestAccAWSDBInstance_separate_iops_update
--- PASS: TestAccAWSDBInstance_separate_iops_update (730.22s)
=== RUN   TestAccAWSDBInstance_s3
--- PASS: TestAccAWSDBInstance_s3 (789.36s)
=== RUN   TestAccAWSDBInstance_enhancedMonitoring
--- PASS: TestAccAWSDBInstance_enhancedMonitoring (860.33s)
=== RUN   TestAccAWSDBInstance_noSnapshot
--- PASS: TestAccAWSDBInstance_noSnapshot (866.56s)
=== RUN   TestAccAWSDBInstance_cloudwatchLogsExportConfigurationUpdate
--- PASS: TestAccAWSDBInstance_cloudwatchLogsExportConfigurationUpdate (696.78s)
=== RUN   TestAccAWSDBInstance_generatedName
--- PASS: TestAccAWSDBInstance_generatedName (1349.43s)
=== RUN   TestAccAWSDBInstance_cloudwatchLogsExportConfiguration
--- PASS: TestAccAWSDBInstance_cloudwatchLogsExportConfiguration (1409.68s)
=== RUN   TestAccAWSDBInstance_namePrefix
--- PASS: TestAccAWSDBInstance_namePrefix (1435.82s)
=== RUN   TestAccAWSDBInstance_replica
--- PASS: TestAccAWSDBInstance_replica (1686.79s)
=== RUN   TestAccAWSDBInstance_subnetGroup
--- PASS: TestAccAWSDBInstance_subnetGroup (1769.29s)
=== RUN   TestAccAWSDBInstance_snapshot
--- PASS: TestAccAWSDBInstance_snapshot (1993.40s)
=== RUN   TestAccAWSDBInstance_MSSQL_TZ
--- PASS: TestAccAWSDBInstance_MSSQL_TZ (2444.66s)

@bflad bflad merged commit 9ce9734 into hashicorp:master Apr 25, 2018
@bflad bflad added this to the v1.16.0 milestone Apr 25, 2018
bflad added a commit that referenced this pull request Apr 25, 2018
@jasonmcintosh
Copy link
Contributor Author

Thanks a ton @bflad will have one for Aurora soon. I'm HOPING in some spare time I can work on refactoring some of that codebase. There's a lot of duplication there that I think could be cleaned up.

@bflad
Copy link
Contributor

bflad commented Apr 25, 2018

This has been released in version 1.16.0 of the AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

@ghost
Copy link

ghost commented Apr 6, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Apr 6, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement Requests to existing resources that expand the functionality or scope. service/rds Issues and PRs that pertain to the rds service. size/L Managed by automation to categorize the size of a PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants