RDS InvalidDBInstanceState: Instance cannot currently reboot due to an in-progress management operation #11905

kbaldyga · 2020-02-05T16:04:34Z

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

Terraform v0.12.20
provider.aws v2.46.0

Affected Resource(s)

aws_db_instance
aws_db_parameter_group

Terraform Configuration Files

terraform {
  required_providers {
    aws = "= 2.46.0"
  }
}
provider "aws" { region = "us-west-1" }

data "aws_vpc" "wailupes-main" {
  filter {
    name   = "tag:Name"
    values = ["wailupes-main"]
  }
}
data "aws_iam_role" "enhanced_monitoring" {
  name = "staging-enhanced-monitoring"
}

resource "aws_db_instance" "rds" {
  identifier                   = "test-rds"
  allocated_storage            = 100
  engine                       = "postgres"
  engine_version               = "11.1"
  instance_class               = "db.m4.large"
  name                         = "testdb"
  username                     = "testuser"
  password                     = "testpassword"
  db_subnet_group_name         = "wailupes-rds"
  parameter_group_name         = "postgres-11-tuned-staging"
  multi_az                     = false
  storage_type                 = "gp2"
  storage_encrypted            = false
  auto_minor_version_upgrade   = false
  apply_immediately            = true
  deletion_protection          = false
  kms_key_id                   = ""
  performance_insights_enabled = false
  backup_retention_period      = 1
  ca_cert_identifier           = "rds-ca-2019"
  monitoring_interval          = 30
  monitoring_role_arn          = data.aws_iam_role.enhanced_monitoring.arn
  skip_final_snapshot          = true

  timeouts {
    update = "120m"
  }
}

resource "aws_db_instance" "rds-read" {
  identifier                 = "test-rds-read-0"
  allocated_storage          = 100
  engine                     = "postgres"
  engine_version             = "11.1"
  instance_class             = "db.m4.large"
  username                   = "testuser"
  parameter_group_name       = "postgres-11-tuned-staging"
  storage_type               = "gp2"
  storage_encrypted          = false
  replicate_source_db        = aws_db_instance.rds.id
  auto_minor_version_upgrade = false
  apply_immediately          = true

  monitoring_interval          = 30
  monitoring_role_arn          = data.aws_iam_role.enhanced_monitoring.arn
  kms_key_id                   = ""
  performance_insights_enabled = false
  skip_final_snapshot          = true
  ca_cert_identifier           = "rds-ca-2019"
}

Debug Output

Shortened debug output here: https://gist.github.com/kbaldyga/825f0239776463a69969b847f35d53bd

Expected Behavior

Adding a read replica to an existing RDS instance with a custom DB parameter group, enhanced monitoring, and ca_cert_identifier should succeed. Instead, Terraform intermittently fails with "Instance cannot currently reboot due to an in-progress management operation". The read replica is eventually created correctly, but the resource is marked as tainted and Terraform exits with an error code.

Actual Behavior

When adding a read-replica to an existing RDS instance, terraform aws provider performs multiple steps:

It creates a read replica (rds/CreateDBInstanceReadReplica appears in the log file), then waits (rds/DescribeDBInstances) for the instance to become available.
Next it calls ModifyDBInstance (see the attached log file), which again calls rds/DescribeDBInstances multiple times and waits for the instance to become available.
Once the instance is available, Terraform calls rds/RebootDBInstance. In the meantime, however, AWS starts applying changes to the instance, so the call to rds/RebootDBInstance fails.
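
The sequence above can be illustrated with a toy simulation. This is only a sketch in Python, not provider code: FakeInstance, its lag and duration values, and modify_then_reboot are all hypothetical, and the assumption that a modification stays invisible in the status for a while is taken from the description in this thread. The 32 and 65 second delays mirror the timeline from AWS Support quoted further down.

```python
from dataclasses import dataclass


class InvalidDBInstanceState(Exception):
    """Stand-in for the RDS error of the same name (not the real SDK type)."""


@dataclass
class FakeInstance:
    """Toy model: a modification is accepted at once but shows up in the status later."""
    now: float = 0.0               # simulated clock, in seconds
    modify_starts_at: float = -1.0
    modify_ends_at: float = -1.0

    def modify(self, lag: float, duration: float) -> None:
        # Assumption: the status keeps reporting "available" for `lag` seconds.
        self.modify_starts_at = self.now + lag
        self.modify_ends_at = self.modify_starts_at + duration

    def status(self) -> str:
        if self.modify_starts_at <= self.now < self.modify_ends_at:
            return "modifying"
        return "available"

    def reboot(self) -> None:
        # Assumption: a pending or running modification blocks the reboot.
        if self.now < self.modify_ends_at:
            raise InvalidDBInstanceState(
                "Instance cannot currently reboot due to an in-progress management operation"
            )


def modify_then_reboot(instance: FakeInstance, seconds_until_reboot: float) -> str:
    instance.modify(lag=20.0, duration=30.0)  # hypothetical timings
    # A naive waiter sees "available" right away and moves on.
    if instance.status() != "available":
        raise RuntimeError("modification became visible immediately")
    instance.now += seconds_until_reboot
    try:
        instance.reboot()
    except InvalidDBInstanceState:
        return "failed"
    return "rebooted"


print(modify_then_reboot(FakeInstance(), 32))
print(modify_then_reboot(FakeInstance(), 65))
```

In this model a single "available" reading right after ModifyDBInstance says nothing about whether the modification has finished, which is why the outcome depends on how long the caller happens to wait.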

Because this all depends on timing, the issue is difficult to reproduce consistently. After spending some time with various configurations, I am fairly confident that the combination of all three (enhanced monitoring, ca_cert_identifier, and a custom parameter group) in resource "aws_db_instance" "rds-read" causes the issue.
As a workaround, we removed ca_cert_identifier from our Terraform configuration for now, since "rds-ca-2019" is the new default anyway.

roscoecairney · 2020-03-02T13:36:26Z

I think the ca_cert_identifier is the cause of this. In GovCloud, this value needs to be set to "rds-ca-2017". Each time the provider attempts to create an aws_rds_cluster_instance in GovCloud, the apply fails with:

Error: error rebooting DB Instance (xxxx-gov-dev-3): InvalidDBInstanceState: Instance cannot currently reboot due to an in-progress management operation.
	status code: 400, request id: 597cafa8-c9dd-4ee9-9678-9e3d6e11efd5

It is possible to get past this by untainting the resource and running the apply again.

Reproduced on provider version 2.51.0

nijave · 2020-04-27T18:46:44Z

Seeing this as well in us-east-1 (the OP is from us-west-1). It looks like Terraform should probably just retry when it hits these errors.
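
A minimal sketch of what "just retry" could look like, written in Python for illustration only. It is not the provider's implementation and not the content of any commit referenced in this thread; retry_on_invalid_state, its parameters, and the InvalidDBInstanceState class are hypothetical stand-ins.

```python
import time


class InvalidDBInstanceState(Exception):
    """Stand-in for the RDS error of the same name (not the real SDK type)."""


def retry_on_invalid_state(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `operation`, retrying with exponential backoff on InvalidDBInstanceState.

    Any other exception propagates immediately. The last InvalidDBInstanceState
    is re-raised once `attempts` calls have failed.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except InvalidDBInstanceState:
            if attempt == attempts - 1:
                raise
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. In real use, `operation` would wrap the reboot call and the delays would need to be tuned to how long RDS modifications actually take.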

rgonzales5-chwy · 2020-11-24T17:06:15Z

I have hit this error every time I try to create new replica instances; it has not succeeded even once. Every time, I notice that the replica instance is created successfully in the console, but the Terraform state still marks it as a tainted resource. This is a very nasty bug that prevents the creation of new RDS replicas on AWS, and it could also break your service configuration when RDS replica creation runs alongside other configuration changes. Until this is fixed, I would not suggest applying any other configuration changes together with the creation of RDS replicas. My workaround was to import the tainted RDS replica resources into my state, since the resources were created successfully in the console. Terraform, please fix this!!

nijave · 2020-11-24T17:10:38Z

You're seeing this because, on Amazon's side, some of the configuration is done as separate operations, but on the Terraform side it's all represented as a single object. I know enhanced monitoring works like this, so I imagine people with more config spread across more API calls hit this more often.

rgonzales5-chwy · 2020-11-24T17:14:43Z

No, really: this happened to me when I tried to create new replicas by themselves, with no other resources, every time.

rgonzales5-chwy · 2020-11-24T17:16:55Z

@roscoecairney What is the explanation for ca_cert_identifier being the issue, and how did you identify this? Please provide more details.

adv4000 · 2020-12-17T06:06:40Z

Strange behavior. For me too, it worked fine after removing ca_cert_identifier.

mgtrrz · 2021-02-23T18:55:36Z

Can confirm this issue exists even with the latest version of the provider (v3.29.1 at the time of this comment). It only occurs when creating new read replicas with enhanced monitoring enabled. Like the others, we didn't need to specify ca_cert_identifier, so removing it fixed it for us.

AlonAvrahami · 2021-03-17T12:59:27Z

Any news on this issue? I'm facing the same behavior.
I'm not using ca_cert_identifier or enhanced monitoring, but I still face this problem.

When running apply, I get this error:
Error: error rebooting DB Instance (RDS_NAME): InvalidDBInstanceState: Instance cannot currently reboot due to an in-progress management operation.

And when running the plan again, it wants to replace the resource:
# module.replica[0].module.db_instance.aws_db_instance.this[0] is tainted, so must be replaced

EDIT:
After updating the module to version 2.31.0, the problem was solved (including removal of ca_cert_identifier and enhanced monitoring).

Please advise.

blongman-snapdocs · 2021-03-18T17:42:06Z

Still running into this. 0.12.28 for me. It would be nice if the provider had backoff/retry logic on this.

justinretzolk · 2021-11-18T19:49:17Z

Hey y'all 👋 Thank you for taking the time to file this issue, and for the ongoing discussion. Given that there's been a number of AWS provider releases since the last update, can anyone confirm whether you're still experiencing this behavior?

danhooper · 2021-11-19T20:06:57Z

We saw this error on version 3.64.1 just the other day.

bpazdziur · 2021-11-29T10:25:18Z

I have the same issue with provider versions 3.67.0 and 3.37.0.

I opened a request about this issue with the AWS Support Center and got the following response:

Please refer to the below timeline with regard to your test of the instance named *** according to the attached log file.
2021-11-19 14:01:37 UTC DeleteDBInstance
2021-11-19 14:00:38 UTC RebootDBInstance >>Here, reboot command launched 32 seconds after the ModifyInstance with error 'cannot currently reboot due to an in-progress management operation.'
2021-11-19 14:00:06 UTC ModifyDBInstance
2021-11-19 13:53:26 UTC CreateDBInstance
2021-11-19 12:19:34 UTC DeleteDBInstance
2021-11-19 12:17:29 UTC RebootDBInstance >>>Here, reboot command launched 65 seconds after the Modify
2021-11-19 12:16:24 UTC ModifyDBInstance
2021-11-19 12:09:58 UTC CreateDBInstance
2021-11-19 11:29:25 UTC DeleteDBInstance
2021-11-19 11:27:13 UTC RebootDBInstance

As you can see, your first reboot was successful. The command was launched 65 seconds after the Modify.
The second reboot failed. The command was launched 32 seconds after the Modify.

Possible reason --
If you modify an instance from the AWS console, you may notice the same phenomenon.
After you submit a modification and refresh the page to watch the instance status,
you can see that the status is still Available and changes to Modifying only after a few seconds.
This is how it works by design.

Suggestions to you --
The error message and the API log clearly show that the instance was not in a state that accepts a reboot.
I believe this is the reason without a doubt.
You can try to improve your code, if possible, with the two suggestions below:

Simply add a 60-second sleep between your ModifyDBInstance and RebootDBInstance calls.
I guess your waitUntilDBInstanceAvailableAfterUpdate function has a loop that checks the instance status every N seconds. Consider moving to the next step only after N consecutive successful checks (N > 3).
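
The second suggestion can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the provider's waitUntilDBInstanceAvailableAfterUpdate: wait_for_stable_status, get_status, and every default value here are hypothetical, with the default of 4 checks chosen only to satisfy N > 3.

```python
import time


def wait_for_stable_status(get_status, required=4, interval=10.0,
                           max_checks=60, sleep=time.sleep):
    """Return True once `get_status()` reports "available" `required` times in a row.

    Any other status resets the streak. Returns False if `max_checks` polls
    pass without reaching a long enough streak.
    """
    streak = 0
    for _ in range(max_checks):
        if get_status() == "available":
            streak += 1
            if streak >= required:
                return True
        else:
            # The instance left "available" (for example "modifying"): start over.
            streak = 0
        sleep(interval)
    return False
```

Compared with a fixed 60-second sleep, this waits longer only when the status actually changes, but it still relies on the modification becoming visible within `required * interval` seconds, which is an assumption rather than a documented guarantee.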

github-actions · 2022-02-10T18:16:24Z

This functionality has been released in v4.0.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

github-actions · 2022-05-14T02:41:15Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

ghost added the service/rds label Feb 5, 2020

github-actions bot added the needs-triage label Feb 5, 2020

ghost added the service/ec2 and service/iam labels Feb 5, 2020

nijave added a commit to nijave/terraform-provider-aws that referenced this issue Apr 27, 2020

Retry RDS reboot-in-progress errors

6606d99

Fixes hashicorp#11905

nijave mentioned this issue Apr 27, 2020

WIP Retry RDS reboot-in-progress errors #13042

Closed

nijave added a commit to Root-App/terraform-provider-aws that referenced this issue May 19, 2020

Retry RDS reboot-in-progress errors

0cedb8e

Fixes hashicorp#11905

poornima-krishnasamy mentioned this issue Jun 16, 2020

Adding replicate_source_db variable to support read_replica ministryofjustice/cloud-platform-terraform-rds-instance#62

Merged

nijave added a commit to Root-App/terraform-provider-aws that referenced this issue Aug 14, 2020

Retry RDS reboot-in-progress errors

e1a4f5d

Fixes hashicorp#11905

justinretzolk added the waiting-response label and removed the needs-triage label Nov 18, 2021

github-actions bot removed the waiting-response label Nov 19, 2021

justinretzolk added the bug label Nov 19, 2021

gdavison self-assigned this Nov 30, 2021

gdavison mentioned this issue Dec 11, 2021

resource/aws_db_instance: Correctly handles update and reboot for replica instances #22178

Merged

gdavison closed this as completed in #22178 Feb 3, 2022

github-actions bot added this to the v4.0.0 milestone Feb 3, 2022

github-actions bot locked as resolved and limited conversation to collaborators May 14, 2022
