
RDS Global Cluster: Error with In-place Engine Version Upgrade #21879

Closed
YakDriver opened this issue Nov 23, 2021 · 5 comments · Fixed by #23560
Labels
service/rds Issues and PRs that pertain to the rds service.
Comments


YakDriver commented Nov 23, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

Terraform v1.0.11
AWS v3.66.0

Affected Resource(s)

  • aws_rds_global_cluster
  • aws_rds_cluster
  • aws_rds_cluster_instance

Terraform Configuration Files

Please include all Terraform configurations required to reproduce the bug. Bug reports without a functional reproduction may be closed without investigation.

resource "aws_rds_global_cluster" "example" {
  global_cluster_identifier = "global-test"
  engine                    = "aurora"
  engine_version            = "5.6.mysql_aurora.1.22.2"
  database_name             = "example_db"
}

resource "aws_rds_cluster" "primary" {
  provider                  = aws.primary
  engine                    = aws_rds_global_cluster.example.engine
  engine_version            = aws_rds_global_cluster.example.engine_version
  cluster_identifier        = "test-primary-cluster"
  master_username           = "username"
  master_password           = "somepass123"
  database_name             = "example_db"
  global_cluster_identifier = aws_rds_global_cluster.example.id
  db_subnet_group_name      = "default"
}

resource "aws_rds_cluster_instance" "primary" {
  provider             = aws.primary
  engine               = aws_rds_global_cluster.example.engine
  engine_version       = aws_rds_global_cluster.example.engine_version
  identifier           = "test-primary-cluster-instance"
  cluster_identifier   = aws_rds_cluster.primary.id
  instance_class       = "db.r4.large"
  db_subnet_group_name = "default"
}

resource "aws_rds_cluster" "secondary" {
  provider                  = aws.secondary
  engine                    = aws_rds_global_cluster.example.engine
  engine_version            = aws_rds_global_cluster.example.engine_version
  cluster_identifier        = "test-secondary-cluster"
  global_cluster_identifier = aws_rds_global_cluster.example.id
  db_subnet_group_name      = "default"
}

resource "aws_rds_cluster_instance" "secondary" {
  provider             = aws.secondary
  engine               = aws_rds_global_cluster.example.engine
  engine_version       = aws_rds_global_cluster.example.engine_version
  identifier           = "test-secondary-cluster-instance"
  cluster_identifier   = aws_rds_cluster.secondary.id
  instance_class       = "db.r4.large"
  db_subnet_group_name = "default"

  depends_on = [
    aws_rds_cluster_instance.primary
  ]
}
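The configuration above references `aws.primary` and `aws.secondary` provider aliases that are not shown. A minimal sketch of what those blocks might look like, assuming `us-east-1` as the primary region and `us-west-2` as the secondary, consistent with the regions that appear in the error message below:

```hcl
# Hypothetical provider aliases assumed by the configuration above.
# The region values are illustrative; adjust to your deployment.
provider "aws" {
  alias  = "primary"
  region = "us-east-1"
}

provider "aws" {
  alias  = "secondary"
  region = "us-west-2"
}
```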

Expected Behavior

All cluster versions are upgraded in place.

Actual Behavior

Produces error:

Error: Failed to update engine_version on global cluster member (arn:aws:rds:us-west-2:12345678901:cluster:test-secondary-cluster-instance): InvalidParameterValue: The provided ARN (arn:aws:rds:us-west-2:12345678901:cluster:test-secondary-cluster-instance) is invalid for this parameter (DBClusterIdentifier). Expected region = us-east-1, actual region = us-west-2
status code: 400, request id: 9135cc6c-cf06-42f0-bac6-266ada9c5d20

Steps to Reproduce

  1. terraform apply
  2. Change the version for a minor upgrade (e.g., to 5.7)
  3. terraform apply
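Step 2 amounts to a one-line change to the shared `engine_version` on the global cluster (the target version string here is illustrative, not taken from the report; the exact string depends on the versions available in your account and region):

```hcl
resource "aws_rds_global_cluster" "example" {
  global_cluster_identifier = "global-test"
  engine                    = "aurora"
  # Illustrative upgrade target; originally "5.6.mysql_aurora.1.22.2".
  engine_version            = "5.7.mysql_aurora.2.07.5"
  database_name             = "example_db"
}
```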

References

@github-actions github-actions bot added the service/rds Issues and PRs that pertain to the rds service. label Nov 23, 2021
@dsagar799

I am also getting the same error trying to upgrade the minor version from 11.7 to 11.9 for Aurora PostgreSQL. Please advise on a solution.

@YakDriver YakDriver removed their assignment Jan 18, 2022
@gpriya-adh

I was also getting the same error when I tried to upgrade from 11.7 to 11.19.
When I ran terraform apply it appeared to succeed, but the console still showed the same version as before.
The solution described here worked: https://medium.com/hashicorp-engineering/upgrading-aurora-rds-using-terraform-3836a62757f.
Still, it's a lot of effort if you have multiple DBs.


YakDriver commented Mar 14, 2022

There were several issues causing this. I'll describe some of them here for future travelers and my future self.

  • I'm not sure about the original upgrade-version design. I'm carrying on with the same design but fixing the bugs. Pros/cons of sticking with it:
    • [con] aws_rds_global_cluster makes changes to aws_rds_cluster that require the use of lifecycle and ignore_changes on aws_rds_cluster.
    • [pro] One consistent way to upgrade versions, whether major or minor: major upgrades go through aws_rds_global_cluster and minor upgrades through aws_rds_cluster. Easier practitioner experience.
    • [pro] Avoids breaking changes by pushing minor upgrades to aws_rds_cluster.
  • One problem involved aws_rds_clusters that were in different regions but associated with one aws_rds_global_cluster. Through the main region's connection, you would get "not found" errors; you need a connection for each cluster's region.
  • In the loop over the clusters in a global cluster, empty cluster IDs would appear, so the provider would time out waiting on clusters that didn't exist. This was again a region problem: looking up a cluster ID by ARN through the API returned an empty ID instead of an error when the ARN wasn't found in that region.
  • AWS documents that the DB cluster identifier in the ModifyDBCluster call can be either an ARN or an ID. Maybe that used to work, but now only the ID is accepted, so the provider now parses the ID out of the ARN itself instead of going back to the AWS API.
  • After upgrading all the clusters in a global cluster, the global cluster's version would sometimes update to the new version and sometimes not, even after waiting 30 minutes. However, if the version didn't update, retrying the cluster upgrades (quick, since the clusters were already upgraded) would cause the global cluster version to update.
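The lifecycle/ignore_changes arrangement mentioned in the first [con] bullet looks roughly like this: a sketch, not the provider's documented pattern, with unrelated arguments elided. The global cluster drives the version, and the member cluster is told to ignore the resulting drift in engine_version:

```hcl
resource "aws_rds_cluster" "primary" {
  # ... other arguments as in the configuration above ...
  engine         = aws_rds_global_cluster.example.engine
  engine_version = aws_rds_global_cluster.example.engine_version

  # Let aws_rds_global_cluster perform (major) version upgrades without
  # this resource trying to revert the change on the next plan.
  lifecycle {
    ignore_changes = [engine_version]
  }
}
```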

Phew.

@github-actions github-actions bot added this to the v4.6.0 milestone Mar 14, 2022
@github-actions

This functionality has been released in v4.6.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

@github-actions

github-actions bot commented May 7, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 7, 2022