[Bug]: DMS aws_dms_replication_config resource cannot be updated from a running or stopped state and must almost always be recreated #35650

Open
cdavis2951 opened this issue Feb 6, 2024 · 5 comments
Labels
bug Addresses a defect in current functionality. service/dms Issues and PRs that pertain to the dms service.

Comments

@cdavis2951

Terraform Core Version

1.5.5

AWS Provider Version

5.31.0, 5.32.1

Affected Resource(s)

  • aws_dms_replication_config

Expected Behavior

  • When the Serverless replication is running, it should be stopped, modified, and then restarted (depending on the start_replication flag).
  • When the Serverless replication is stopped, it should be modified and then restarted (depending on the start_replication flag).

Actual Behavior

The update rarely succeeds without error. Most of the time, the provider appears to attempt to start the replication before the modification is complete. This happens even when start_replication is set to false, and whether the replication is running or stopped. Only a replication config in a failed state appears to update without error.

Relevant Error/Panic Output Snippet

Error: starting DMS Serverless Replication (arn:aws:dms:us-west-2:xxxxxxxx:replication-config:xxxxxxxx): InvalidResourceStateFault: Replication for Replication Config: xxxxxxxx is in MODIFYING state and cannot start

Terraform Configuration Files

resource "aws_dms_replication_subnet_group" "this" {
  replication_subnet_group_description = "Replication subnet group for ${local.full_name_kebab}"
  replication_subnet_group_id          = local.full_name_kebab
  subnet_ids                           = var.vpc_subnet_ids
  tags                                 = var.tags
}

resource "random_integer" "id_seed" {
  min = 1
  max = 50000
}

resource "aws_dms_replication_config" "this" {
  replication_config_identifier = local.full_name_kebab
  replication_type              = "full-load-and-cdc"
  source_endpoint_arn           = aws_dms_endpoint.source_db_endpoint.endpoint_arn
  target_endpoint_arn           = aws_dms_s3_endpoint.datalake_target_endpoint.endpoint_arn
  start_replication             = true
  tags                          = var.tags

  replication_settings = templatefile("${path.module}/replication-settings.json.tftpl", {
    target_table_prep_mode = var.target_table_prep_mode
    support_lobs           = var.lob_configuration.lob_support
    lob_chunk_size         = var.lob_configuration.lob_chunk_size
    lob_max_size           = var.lob_configuration.lob_max_size_kb
    full_lob_mode          = var.lob_configuration.is_full_lob_mode
    limited_size_lob_mode  = !var.lob_configuration.is_full_lob_mode
  })

  table_mappings = jsonencode({
    rules = flatten(
      [
        for table_key, table in var.table_mappings :
        [
          {
            rule-type      = "selection"
            rule-id        = (random_integer.id_seed.result + index(keys(var.table_mappings), table_key))
            rule-name      = "${table_key}-selection"
            rule-action    = "include"
            object-locator = {
              schema-name = var.source_schema_name != null ? var.source_schema_name : var.source_database_name
              table-name  = table_key
            }
          },
          {
            rule-type      = "transformation"
            rule-id        = (random_integer.id_seed.result + index(keys(var.table_mappings), table_key) + 10000)
            rule-name      = "${table_key}-cdc-increment"
            rule-target    = "column"
            object-locator = {
              schema-name = var.source_schema_name != null ? var.source_schema_name : var.source_database_name
              table-name  = table_key
            },
            rule-action = "add-column"
            value       = "cdc_increment"
            expression  = "substr($AR_H_CHANGE_SEQ,17,19)"
            data-type   = {
              type   = "string"
              length = 50
            }
          },
          [
            for transformation_key, transformation in table.transformations != null ? table.transformations : {} :
            [
              {
                rule-type      = transformation.rule_type
                rule-id        = (random_integer.id_seed.result + index(keys(var.table_mappings), table_key) + 100000)
                rule-name      = "${table_key}-${transformation_key}"
                rule-target    = transformation.rule_target
                object-locator = {
                  schema-name = var.source_schema_name != null ? var.source_schema_name : var.source_database_name
                  table-name  = table_key
                  column-name = transformation.object_locator.column_name
                  data-type   = transformation.object_locator.data_type
                }
                rule-action = transformation.rule_action
                value       = transformation.transformation_value
                expression  = transformation.expression
                data-type   = transformation.data_type
              }
            ]
          ]
        ]
      ]
    )
  })


  compute_config {
    replication_subnet_group_id  = aws_dms_replication_subnet_group.this.id
    max_capacity_units           = var.dms_config_max_capacity_units
    min_capacity_units           = var.dms_config_min_capacity_units
    preferred_maintenance_window = var.maintenance_window
    vpc_security_group_ids       = var.vpc_security_group_ids
    multi_az                     = var.multi_az
  }

  lifecycle {
    replace_triggered_by = [
      aws_dms_endpoint.source_db_endpoint,
      aws_dms_s3_endpoint.datalake_target_endpoint,
    ]
  }
}

replication-settings.json.tftpl:

{
  "Logging": {
    "EnableLogging": true,
    "EnableLogContext": false,
    "LogComponents": [
      {
        "Severity": "LOGGER_SEVERITY_DEFAULT",
        "Id": "TRANSFORMATION"
      },
      {
        "Severity": "LOGGER_SEVERITY_DEFAULT",
        "Id": "SOURCE_UNLOAD"
      },
      {
        "Severity": "LOGGER_SEVERITY_DEFAULT",
        "Id": "IO"
      },
      {
        "Severity": "LOGGER_SEVERITY_DEFAULT",
        "Id": "TARGET_LOAD"
      },
      {
        "Severity": "LOGGER_SEVERITY_DEFAULT",
        "Id": "PERFORMANCE"
      },
      {
        "Severity": "LOGGER_SEVERITY_DEFAULT",
        "Id": "SOURCE_CAPTURE"
      },
      {
        "Severity": "LOGGER_SEVERITY_DEFAULT",
        "Id": "SORTER"
      },
      {
        "Severity": "LOGGER_SEVERITY_DEFAULT",
        "Id": "REST_SERVER"
      },
      {
        "Severity": "LOGGER_SEVERITY_DEFAULT",
        "Id": "VALIDATOR_EXT"
      },
      {
        "Severity": "LOGGER_SEVERITY_DEFAULT",
        "Id": "TARGET_APPLY"
      },
      {
        "Severity": "LOGGER_SEVERITY_DEFAULT",
        "Id": "TASK_MANAGER"
      },
      {
        "Severity": "LOGGER_SEVERITY_DEFAULT",
        "Id": "TABLES_MANAGER"
      },
      {
        "Severity": "LOGGER_SEVERITY_DEFAULT",
        "Id": "METADATA_MANAGER"
      },
      {
        "Severity": "LOGGER_SEVERITY_DEFAULT",
        "Id": "FILE_FACTORY"
      },
      {
        "Severity": "LOGGER_SEVERITY_DEFAULT",
        "Id": "COMMON"
      },
      {
        "Severity": "LOGGER_SEVERITY_DEFAULT",
        "Id": "ADDONS"
      },
      {
        "Severity": "LOGGER_SEVERITY_DEFAULT",
        "Id": "DATA_STRUCTURE"
      },
      {
        "Severity": "LOGGER_SEVERITY_DEFAULT",
        "Id": "COMMUNICATION"
      },
      {
        "Severity": "LOGGER_SEVERITY_DEFAULT",
        "Id": "FILE_TRANSFER"
      }
    ]
  },
  "StreamBufferSettings": {
    "StreamBufferCount": 3,
    "CtrlStreamBufferSizeInMB": 5,
    "StreamBufferSizeInMB": 8
  },
  "ErrorBehavior": {
    "FailOnNoTablesCaptured": true,
    "ApplyErrorUpdatePolicy": "LOG_ERROR",
    "FailOnTransactionConsistencyBreached": false,
    "RecoverableErrorThrottlingMax": 1800,
    "DataErrorEscalationPolicy": "SUSPEND_TABLE",
    "ApplyErrorEscalationCount": 0,
    "RecoverableErrorStopRetryAfterThrottlingMax": true,
    "RecoverableErrorThrottling": true,
    "ApplyErrorFailOnTruncationDdl": false,
    "DataTruncationErrorPolicy": "LOG_ERROR",
    "ApplyErrorInsertPolicy": "LOG_ERROR",
    "EventErrorPolicy": "IGNORE",
    "ApplyErrorEscalationPolicy": "LOG_ERROR",
    "RecoverableErrorCount": -1,
    "DataErrorEscalationCount": 0,
    "TableErrorEscalationPolicy": "STOP_TASK",
    "RecoverableErrorInterval": 5,
    "ApplyErrorDeletePolicy": "IGNORE_RECORD",
    "TableErrorEscalationCount": 0,
    "FullLoadIgnoreConflicts": true,
    "DataErrorPolicy": "LOG_ERROR",
    "TableErrorPolicy": "SUSPEND_TABLE"
  },
  "TTSettings": {
    "TTS3Settings": null,
    "TTRecordSettings": null,
    "FailTaskOnTTFailure": false,
    "EnableTT": false
  },
  "FullLoadSettings": {
    "CommitRate": 10000,
    "StopTaskCachedChangesApplied": false,
    "StopTaskCachedChangesNotApplied": false,
    "MaxFullLoadSubTasks": 8,
    "TransactionConsistencyTimeout": 600,
    "CreatePkAfterFullLoad": false,
    "TargetTablePrepMode": "${target_table_prep_mode}"
  },
  "TargetMetadata": {
    "ParallelApplyBufferSize": 0,
    "ParallelApplyQueuesPerThread": 0,
    "ParallelApplyThreads": 0,
    "TargetSchema": "",
    "InlineLobMaxSize": 0,
    "ParallelLoadQueuesPerThread": 0,
    "SupportLobs": ${support_lobs},
    "LobChunkSize": ${lob_chunk_size},
    "TaskRecoveryTableEnabled": false,
    "ParallelLoadThreads": 0,
    "LobMaxSize": ${lob_max_size},
    "BatchApplyEnabled": false,
    "FullLobMode": ${full_lob_mode},
    "LimitedSizeLobMode": ${limited_size_lob_mode},
    "LoadMaxFileSize": 0,
    "ParallelLoadBufferSize": 0
  },
  "BeforeImageSettings": null,
  "ControlTablesSettings": {
    "historyTimeslotInMinutes": 5,
    "CommitPositionTableEnabled": false,
    "HistoryTimeslotInMinutes": 5,
    "StatusTableEnabled": false,
    "SuspendedTablesTableEnabled": false,
    "HistoryTableEnabled": false,
    "ControlSchema": "",
    "FullLoadExceptionTableEnabled": false
  },
  "LoopbackPreventionSettings": null,
  "CharacterSetSettings": null,
  "FailTaskWhenCleanTaskResourceFailed": false,
  "ChangeProcessingTuning": {
    "StatementCacheSize": 50,
    "CommitTimeout": 1,
    "BatchApplyPreserveTransaction": true,
    "BatchApplyTimeoutMin": 1,
    "BatchSplitSize": 0,
    "BatchApplyTimeoutMax": 30,
    "MinTransactionSize": 1000,
    "MemoryKeepTime": 60,
    "BatchApplyMemoryLimit": 500,
    "MemoryLimitTotal": 1024
  },
  "ChangeProcessingDdlHandlingPolicy": {
    "HandleSourceTableDropped": true,
    "HandleSourceTableTruncated": true,
    "HandleSourceTableAltered": true
  },
  "PostProcessingRules": null
}

Steps to Reproduce

I expect the issue occurs with any source/target endpoint types, but we have seen it with both MySQL and PostgreSQL source endpoints and an S3 target endpoint using Parquet 2.0.

Deploy the replication config, change any argument that triggers an in-place update rather than a replacement, and run terraform apply. The apply should fail.
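
For illustration, one change that should take the in-place update path with the configuration above is editing an input to the replication_settings template. The fragment below is a minimal sketch, assuming that replication_settings is modified in place rather than forcing a replacement; the file name and both values are hypothetical and not taken from the original report.

```hcl
# terraform.tfvars (hypothetical file and values, for illustration only)

# First apply: deploy the replication config with this value.
# target_table_prep_mode = "DROP_AND_CREATE"

# Second apply: change the value so that the rendered replication_settings
# JSON differs, then run `terraform apply` again to trigger the update.
target_table_prep_mode = "TRUNCATE_BEFORE_LOAD"
```

Any other variable that feeds replication_settings or table_mappings should serve the same purpose, under the same assumption.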

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

None

@cdavis2951 cdavis2951 added the bug Addresses a defect in current functionality. label Feb 6, 2024

github-actions bot commented Feb 6, 2024

Community Note

Voting for Prioritization

  • Please vote on this issue by adding a 👍 reaction to the original post to help the community and maintainers prioritize this request.
  • Please see our prioritization guide for information on how we prioritize.
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

Volunteering to Work on This Issue

  • If you are interested in working on this issue, please leave a comment.
  • If this would be your first contribution, please review the contribution guide.

@github-actions github-actions bot added the service/dms Issues and PRs that pertain to the dms service. label Feb 6, 2024
@terraform-aws-provider terraform-aws-provider bot added the needs-triage Waiting for first response or review from a maintainer. label Feb 6, 2024
@cdavis2951
Author

Something else I've noticed, which may or may not be the same bug: if I deploy the replication config with start_replication set to false from the start, the plan shows a diff because of the CloudWatch log group configuration, but the apply fails with an error saying no modifications were requested.
Screenshot 2024-02-06 at 2 01 09 AM
Screenshot 2024-02-06 at 2 01 39 AM

@justinretzolk justinretzolk removed the needs-triage Waiting for first response or review from a maintainer. label Feb 15, 2024
@justinretzolk
Member

Hey @cdavis2951 👋 Thanks for taking the time to report this! The thing you mentioned in your most recent comment looks a lot like #35573. Just a heads up in case you want to follow along with that one as well 🙂

@Quixotical

@cdavis2951 any chance you found a solution to this issue?

@cdavis2951
Author

> @cdavis2951 any chance you found a solution to this issue?

I think we found that if you set start_replication to false and make sure the replication is running when you run terraform, the act of stopping the replication is enough for it to count as having "modified" something.
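
A minimal sketch of that workaround, assuming the flag in question is the resource's start_replication argument as used in the configuration in this issue. It reflects only what is reported in this thread and has not been confirmed by the maintainers.

```hcl
resource "aws_dms_replication_config" "this" {
  # ... all other arguments unchanged from the configuration in this issue ...

  # Reported workaround: keep the start flag off. Per the comment above,
  # the replication must already be running when `terraform apply` is run.
  start_replication = false
}
```

With the flag off, the replication would presumably need to be started again outside of Terraform once the apply finishes.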
