[Bug]: Incorrectly recreating DynamoDB replicas #35629

Closed
szesch opened this issue Feb 3, 2024 · 4 comments · Fixed by #35630
Labels
bug Addresses a defect in current functionality. service/dynamodb Issues and PRs that pertain to the dynamodb service.
Milestone

Comments

@szesch
Contributor

szesch commented Feb 3, 2024

Terraform Core Version

1.4.4

AWS Provider Version

5.34.0

Affected Resource(s)

  • aws_dynamodb_table

Expected Behavior

Running a plan with no changes to the DynamoDB replicas should be a no-op.

Actual Behavior

The DynamoDB API does not consistently return the kms_key_arn for every replica. When it is missing from the response, the provider concludes that the existing replica has no kms_key_arn, even though it does. As a result, the provider plans to recreate the replica with the configured kms_key_arn.
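
To make the failure mode concrete, here is a minimal sketch of how an incomplete response can look like a configuration change. It is a simplified model, not the provider's actual diff logic: the function name `replicas_flagged_for_recreation` is hypothetical, and it assumes the replica key appears as `KMSMasterKeyId` in each entry of the `Replicas` list of a DescribeTable response.

```python
from typing import Dict, List, Optional


def replicas_flagged_for_recreation(
    configured: Dict[str, str],
    described_replicas: List[dict],
) -> List[str]:
    """Return regions whose observed KMS key differs from the configured one.

    configured maps region name -> configured KMS key ARN.
    described_replicas is the Replicas list from one DescribeTable response.
    A replica whose key was omitted from the response is indistinguishable
    here from a replica that has no key, so it is flagged as changed.
    """
    observed: Dict[str, Optional[str]] = {
        r["RegionName"]: r.get("KMSMasterKeyId") for r in described_replicas
    }
    return sorted(
        region
        for region, key_arn in configured.items()
        if observed.get(region) != key_arn
    )
```

In this model, a replica that is correctly configured but whose key was left out of the response is flagged exactly like a replica whose key really changed.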

Relevant Error/Panic Output Snippet

No response

Terraform Configuration Files

terraform {
  required_providers {
    aws = {
      version = "5.34"
    }
  }
}

provider "aws" {
  region = "us-east-2"
}

provider "aws" {
  region = "us-west-2"
  alias  = "uswest2"
}

provider "aws" {
  region = "eu-west-2"
  alias  = "euwest2"
}

provider "aws" {
  region = "ap-southeast-2"
  alias  = "apsoutheast2"
}

provider "aws" {
  region = "eu-central-1"
  alias  = "eucentral1"
}

provider "aws" {
  region = "sa-east-1"
  alias  = "saeast1"
}

resource "aws_kms_key" "us_east_2" {
  description = "CMK for us-east-2"
}

resource "aws_kms_key" "us_west_2" {
  description = "CMK for us-west-2"
  provider    = aws.uswest2
}

resource "aws_kms_key" "eu_west_2" {
  description = "CMK for eu-west-2"
  provider    = aws.euwest2
}

resource "aws_kms_key" "ap_southeast_2" {
  description = "CMK for ap-southeast-2"
  provider    = aws.apsoutheast2
}

resource "aws_kms_key" "eu_central_1" {
  description = "CMK for eu-central-1"
  provider    = aws.eucentral1
}

resource "aws_kms_key" "sa_east_1" {
  description = "CMK for sa-east-1"
  provider    = aws.saeast1
}

module "global" {
  source  = "terraform-aws-modules/dynamodb-table/aws"
  version = "3.2.0"

  name                               = "global"
  hash_key                           = "PK"
  range_key                          = "SK"
  billing_mode                       = "PAY_PER_REQUEST"
  stream_enabled                     = true
  stream_view_type                   = "NEW_AND_OLD_IMAGES"
  point_in_time_recovery_enabled     = true
  deletion_protection_enabled        = false
  server_side_encryption_enabled     = true
  server_side_encryption_kms_key_arn = aws_kms_key.us_east_2.arn

  attributes = [
    {
      name = "PK"
      type = "S"
    },
    {
      name = "SK"
      type = "S"
    }
  ]

  replica_regions = [
    {
      region_name            = "us-west-2"
      point_in_time_recovery = true
      kms_key_arn            = aws_kms_key.us_west_2.arn
    },
    {
      region_name            = "eu-west-2"
      point_in_time_recovery = true
      kms_key_arn            = aws_kms_key.eu_west_2.arn
    },
    {
      region_name            = "ap-southeast-2"
      point_in_time_recovery = true
      kms_key_arn            = aws_kms_key.ap_southeast_2.arn
    },
    {
      region_name            = "eu-central-1"
      point_in_time_recovery = true
      kms_key_arn            = aws_kms_key.eu_central_1.arn
    },
    {
      region_name            = "sa-east-1"
      point_in_time_recovery = true
      kms_key_arn            = aws_kms_key.sa_east_1.arn
    },
  ]
}

Steps to Reproduce

Running terraform plan against an existing table with 5 or more replicas usually reproduces this. Because the root cause is the DynamoDB API intermittently omitting replica details, the issue is not always reproducible. See the "Important Factoids" section for more details.

Debug Output

No response

Panic Output

No response

Important Factoids

The root cause of the issue is that the DynamoDB API takes only a best-effort approach to returning replica information when describing a table. A small table with a few replicas may never experience this issue; I did not see it until a table started using 5 replicas. Some regions also fail to return the replica details more often than others.

For an AWS support ticket I was opening, I wrote a script to measure how often DescribeTable responses include the replica kms_key_arn settings. Here's my comment from the support ticket:

I created a test script that called DescribeTable 100 times in each region we have a replica in.
The script waited 2 seconds between each call. The script recorded the number of times kms_key_arn 
was returned for the replicas and the number of times it was not.

Here are the results. This is the percentage of requests out of 100 that DID contain 
kms_key_arn in the replicas.

us-east-2: 21%
us-west-2: 100%
eu-west-1: 15%
eu-central-1: 24%
ap-southeast-2: 94%
ap-southeast-4: 46%

In this sample, DescribeTable calls made against us-west-2 returned the replica kms_key_arn every time. In other regions, such as us-east-2, the kms_key_arn is frequently missing from the response. When that happens, the provider attempts to recreate a replica with the configured kms_key_arn even though the replica is already correctly configured.
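
The original test script is not included in this issue. The following is a rough sketch of how such a measurement could be made, under a few assumptions: the helper names are hypothetical, a call counts as a success only when every expected replica region appears in the response with a `KMSMasterKeyId`, and the real AWS call goes through boto3's `describe_table`, which is imported lazily so the counting logic can be exercised without AWS access.

```python
import time
from typing import Callable, Iterable, Set


def replicas_with_kms_key(response: dict) -> Set[str]:
    """Regions of the replicas that carry a KMSMasterKeyId in one response."""
    replicas = response.get("Table", {}).get("Replicas", [])
    return {r["RegionName"] for r in replicas if r.get("KMSMasterKeyId")}


def measure_success_rate(
    describe: Callable[[], dict],
    expected_regions: Iterable[str],
    attempts: int = 100,
    delay_seconds: float = 2.0,
) -> float:
    """Percentage of calls that returned a KMS key for every expected replica."""
    expected = set(expected_regions)
    hits = 0
    for attempt in range(attempts):
        if expected <= replicas_with_kms_key(describe()):
            hits += 1
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # pause between calls, as in the ticket
    return 100.0 * hits / attempts


def boto3_describer(table_name: str, region: str) -> Callable[[], dict]:
    """Build a describe() callable bound to one regional endpoint (needs boto3)."""
    import boto3  # imported lazily; only required for real AWS calls

    client = boto3.client("dynamodb", region_name=region)
    return lambda: client.describe_table(TableName=table_name)
```

For example, `measure_success_rate(boto3_describer("global", "us-east-2"), ["us-west-2", "eu-west-2"])` would report the rate observed from the us-east-2 endpoint for those two replicas; the table name and region list are placeholders.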

The support ticket was escalated to the DynamoDB service team, and I eventually received the following response:

The service team have informed me of the following:

'When a customer calls DescribeTable on a DynamoDB global table, DynamoDB performs one cross region
access per global table replica to retrieve settings of each replica. However, this cross region access is best
effort and is not guaranteed to always return replica description of all global table replicas (in event of
cross-region network partitions). In this case, because the customer's global table replicas are geographically
spread out, a subset of cross region calls timed out and hence are not returned in the TableDescription output.

For applications that rely on information about each global table replica, we recommend to do a DescribeTable
call against each Global Table replica region endpoint.'

Essentially as the cross-region access is best effort, it is not guaranteed. For any use case that relies on the
information about each global table, 'DescribeTable' call should be called against the replicas regions endpoint.

The solution is to make one DescribeTable call per replica, against that replica's regional endpoint, instead of relying on the replica details in a single call's response, which may or may not be populated.
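
As an illustration of the per-region approach (not the actual fix in #35630, which lives in the provider's Go code), a sketch might look like the following. The helper names are hypothetical. It assumes that a DescribeTable call against a replica's own regional endpoint reports that replica's key under `Table.SSEDescription.KMSMasterKeyArn`, and it uses boto3 only in the optional helper at the end.

```python
from typing import Callable, Dict, Iterable, Optional


def kms_key_arn_from_description(response: dict) -> Optional[str]:
    """Read the table's own KMS key ARN from a DescribeTable response."""
    return (
        response.get("Table", {})
        .get("SSEDescription", {})
        .get("KMSMasterKeyArn")
    )


def describe_replica_keys(
    table_name: str,
    regions: Iterable[str],
    describe_in_region: Callable[[str, str], dict],
) -> Dict[str, Optional[str]]:
    """Call DescribeTable once per replica region and collect each key ARN.

    describe_in_region(region, table_name) must return a DescribeTable
    response obtained from that region's endpoint.
    """
    return {
        region: kms_key_arn_from_description(describe_in_region(region, table_name))
        for region in regions
    }


def boto3_describe_in_region(region: str, table_name: str) -> dict:
    """Real implementation of describe_in_region (needs boto3 and credentials)."""
    import boto3  # imported lazily; only required for real AWS calls

    client = boto3.client("dynamodb", region_name=region)
    return client.describe_table(TableName=table_name)
```

Each region is asked only about its own copy of the table, so the result no longer depends on the best-effort cross-region lookups described by the service team, at the cost of one API call per replica.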

References

No response

Would you like to implement a fix?

Yes

@szesch szesch added the bug Addresses a defect in current functionality. label Feb 3, 2024

github-actions bot commented Feb 3, 2024

Community Note

Voting for Prioritization

  • Please vote on this issue by adding a 👍 reaction to the original post to help the community and maintainers prioritize this request.
  • Please see our prioritization guide for information on how we prioritize.
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

Volunteering to Work on This Issue

  • If you are interested in working on this issue, please leave a comment.
  • If this would be your first contribution, please review the contribution guide.

@github-actions github-actions bot added the service/kms Issues and PRs that pertain to the kms service. label Feb 3, 2024
@terraform-aws-provider terraform-aws-provider bot added the needs-triage Waiting for first response or review from a maintainer. label Feb 3, 2024
@szesch
Contributor Author

szesch commented Feb 3, 2024

This is addressed in #35630

@ewbankkit ewbankkit added service/dynamodb Issues and PRs that pertain to the dynamodb service. and removed service/kms Issues and PRs that pertain to the kms service. needs-triage Waiting for first response or review from a maintainer. labels Feb 5, 2024
@github-actions github-actions bot added this to the v5.36.0 milestone Feb 6, 2024

github-actions bot commented Feb 8, 2024

This functionality has been released in v5.36.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!


I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 10, 2024