(rds): secret rotation application times out before rotation completes #17265

Closed
asterikx opened this issue Nov 1, 2021 · 13 comments · Fixed by #17363
Labels
  • @aws-cdk/aws-rds: Related to Amazon Relational Database
  • bug: This issue is a bug.
  • effort/medium: Medium work item – several days of effort
  • p2

Comments

@asterikx
Contributor

asterikx commented Nov 1, 2021

What is the problem?

The secret rotation application times out before rotation completes and rotation fails.

Reproduction Steps

// create DB with generated master DB secret
const db = new rds.DatabaseInstance(this, 'Database', {
  engine: rds.DatabaseInstanceEngine.postgres({
    version: rds.PostgresEngineVersion.VER_13_4,
  }),
  instanceType: ec2.InstanceType.of(ec2.InstanceClass.BURSTABLE4_GRAVITON, ec2.InstanceSize.MICRO),
  vpc,
});
// add rotation to DB master secret
db.addRotationSingleUser({
  automaticallyAfter: cdk.Duration.days(30),
});

// create secondary DB secret (used by apps to connect to DB)
const dbUserSecret = new rds.DatabaseSecret(this, 'DbUserSecret', {
  username: 'app_user',
  masterSecret: dbAdminUserSecret,
});
// attach secondary DB secret
const dbUserSecretAttached = dbUserSecret.attach(db);
// add rotation to secondary DB secret
new secretsmanager.SecretRotation(this, 'DatabaseUserSecretRotation', {
  application: secretsmanager.SecretRotationApplication.POSTGRES_ROTATION_SINGLE_USER,
  secret: db.secret!,
  target: db,
  vpc,
  automaticallyAfter: cdk.Duration.days(30),
});

I log into the DB using the master user postgres and the generated password and create the secondary user app_user with password secret_passwd.
I'm now able to log into the DB using the user app_user with password secret_passwd.

Next, I trigger secret rotation for the secondary DB secret:

aws secretsmanager rotate-secret --secret-id <SECONDARY_DB_SECRET_ID>

Alternatively: under AWS Console > AWS Secrets Manager > Secrets > DbUserSecretXXXXXXXX-yyyyyyyyyyyy, press Rotate secret immediately

What did you expect to happen?

After triggering secret rotation, I can log into the DB using the newly generated password and the old password no longer works.

What actually happened?

Secret rotation is triggered successfully; however, I'm not able to log in with the newly generated password. The old password secret_passwd still works.

Looking at the CloudWatch Logs of the secret rotation application, I can see that the function repeatedly times out:

2021-11-01T17:29:04.505+01:00	START RequestId: 428afaa9-e114-43b2-8054-b71239dfb8b5 Version: $LATEST
2021-11-01T17:29:04.770+01:00	[INFO] 2021-11-01T16:29:04.769Z 428afaa9-e114-43b2-8054-b71239dfb8b5 Found credentials in environment variables.
2021-11-01T17:34:04.611+01:00	2021-11-01T16:34:04.611Z 428afaa9-e114-43b2-8054-b71239dfb8b5 Task timed out after 300.10 seconds
2021-11-01T17:34:04.611+01:00	END RequestId: 428afaa9-e114-43b2-8054-b71239dfb8b5
2021-11-01T17:34:04.611+01:00	REPORT RequestId: 428afaa9-e114-43b2-8054-b71239dfb8b5 Duration: 300100.73 ms Billed Duration: 300000 ms Memory Size: 128 MB Max Memory Used: 69 MB Init Duration: 364.56 ms
2021-11-01T17:35:45.040+01:00	START RequestId: 32583674-0f79-4c74-8971-07f594f8bb65 Version: $LATEST
2021-11-01T17:35:45.274+01:00	[INFO] 2021-11-01T16:35:45.274Z 32583674-0f79-4c74-8971-07f594f8bb65 Found credentials in environment variables.
2021-11-01T17:40:45.147+01:00	2021-11-01T16:40:45.147Z 32583674-0f79-4c74-8971-07f594f8bb65 Task timed out after 300.10 seconds
2021-11-01T17:40:45.147+01:00	END RequestId: 32583674-0f79-4c74-8971-07f594f8bb65
2021-11-01T17:40:45.147+01:00	REPORT RequestId: 32583674-0f79-4c74-8971-07f594f8bb65 Duration: 300101.37 ms Billed Duration: 300000 ms Memory Size: 128 MB Max Memory Used: 31 MB

Triggering rotation again gives the following error:

An error occurred (InvalidRequestException) when calling the RotateSecret operation: A previous rotation isn't complete. That rotation will be reattempted.

CDK CLI Version

1.130.0

Framework Version

1.130.0

Node.js Version

v14.17.6

OS

macOS 12.0.1

Language

TypeScript

Language Version

4.4.4

Other information

No response

@asterikx asterikx added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Nov 1, 2021
@github-actions github-actions bot added the @aws-cdk/aws-rds Related to Amazon Relational Database label Nov 1, 2021
@asterikx
Contributor Author

asterikx commented Nov 1, 2021

Update: rotation still times out after canceling the previous rotation and setting the timeout of the Lambda function to 15 minutes:

2021-11-01T19:07:22.579+01:00	START RequestId: 5ca39262-6c13-4da4-8b43-d11588a99b87 Version: $LATEST
2021-11-01T19:07:22.861+01:00	[INFO] 2021-11-01T18:07:22.861Z 5ca39262-6c13-4da4-8b43-d11588a99b87 Found credentials in environment variables.
2021-11-01T19:22:22.686+01:00	END RequestId: 5ca39262-6c13-4da4-8b43-d11588a99b87
2021-11-01T19:22:22.686+01:00	REPORT RequestId: 5ca39262-6c13-4da4-8b43-d11588a99b87 Duration: 900100.65 ms Billed Duration: 900000 ms Memory Size: 128 MB Max Memory Used: 69 MB Init Duration: 370.81 ms
2021-11-01T19:22:22.686+01:00	2021-11-01T18:22:22.686Z 5ca39262-6c13-4da4-8b43-d11588a99b87 Task timed out after 900.10 seconds

@skinny85
Contributor

skinny85 commented Nov 1, 2021

Hey @asterikx,

I was under the impression that the applications took care of the rotation, and that you shouldn't really trigger them yourself. Is that not your experience - has this worked before?

Thanks,
Adam

@skinny85 skinny85 added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Nov 1, 2021
@asterikx
Contributor Author

asterikx commented Nov 1, 2021

Thanks Adam!

I've been using Aurora Serverless V1 (MySQL-compatible) for over a year with a single master user and weekly secret rotation. I just had a look at the CloudWatch logs: rotation failed every single time (5 attempts each time, with each attempt timing out after 30 seconds). In fact, I also never had to change the password to connect to the DB during development ...

I'm currently migrating to RDS PostgreSQL. I created a secondary user/role, following this blog post on best practices. Since I created this role with the very insecure password secret_passwd, I wanted to change the password through secret rotation.

I remember reading somewhere that secret rotation is triggered once immediately after deploying the secret rotation application. This matches what I've found; however, the initial rotation also timed out, so the insecure password secret_passwd is still in place.

I agree with you that I shouldn't really trigger the secret rotation myself, since the secret rotation application is supposed to do that for me ;)

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Nov 1, 2021
@jogold
Contributor

jogold commented Nov 2, 2021

Hi @asterikx,

You should be able to trigger the rotation manually. The "application" deploys a Lambda function that can be triggered manually from the Secrets Manager console.

From your code I see that the "secondary secret rotation" is acting on the master secret (db.secret!) and not on dbUserSecretAttached? This would mean that you have two rotation applications acting on the same secret (both using the single user scheme). Also, it's not clear from your code what dbAdminUserSecret is (used in dbUserSecret).

Now for your second secret, you should follow the multi user rotation scheme; see the passage in the docs that starts with "It's also possible to create user credentials together with the instance/cluster and add rotation":

// create secondary DB secret (used by apps to connect to DB)
const dbUserSecret = new rds.DatabaseSecret(this, 'DbUserSecret', {
  username: 'app_user',
  masterSecret: db.secret, // the master secret of your DB
});
// attach secondary DB secret
const dbUserSecretAttached = dbUserSecret.attach(db);
db.addRotationMultiUser('MyUser', { // Add rotation using the multi user scheme
  secret: dbUserSecretAttached,
});

@asterikx
Contributor Author

asterikx commented Nov 2, 2021

Hi @jogold.

Sorry, I messed up when putting together the code for this issue. I used dbUserSecretAttached in my code.

Here's the complete code:

const dbAdminUserSecret = new rds.DatabaseSecret(this, 'DbAdminUserSecret', {
  username: 'postgres',
});
const dbUserSecret = new rds.DatabaseSecret(this, 'DbUserSecret', {
  username: 'app_user',
  masterSecret: dbAdminUserSecret,
});

const db = new rds.DatabaseInstance(this, 'Database', {
  engine: rds.DatabaseInstanceEngine.postgres({
    version: rds.PostgresEngineVersion.VER_13_4,
  }),
  instanceType: ec2.InstanceType.of(ec2.InstanceClass.BURSTABLE4_GRAVITON, ec2.InstanceSize.MICRO),
  credentials: rds.Credentials.fromSecret(dbAdminUserSecret),
  vpc,
  vpcSubnets: {
    subnetType: ec2.SubnetType.PUBLIC,
  },
});
db.addRotationSingleUser({
  automaticallyAfter: cdk.Duration.days(30),
});

const dbUserSecretAttached = dbUserSecret.attach(db);
new secretsmanager.SecretRotation(this, 'DatabaseUserSecretRotation', {
  application: secretsmanager.SecretRotationApplication.POSTGRES_ROTATION_SINGLE_USER,
  secret: dbUserSecretAttached,
  target: db,
  vpc,
  vpcSubnets: {
    subnetType: ec2.SubnetType.PUBLIC,
  },
  automaticallyAfter: cdk.Duration.days(30),
});

As for the multi user scheme, I'm curious why I should use it over the single user scheme. Is it purely to prevent auth failures during secret rotation?

Either way, single user rotation on the master secret of my Serverless cluster has been failing for over a year, and it also fails on the master secret of the RDS instance in this issue.

I'm happy to help, let me know how I can assist.

@jogold
Contributor

jogold commented Nov 2, 2021

OK, I suspect the timeout comes from connectivity issues between your Lambda and your DB, or between your Lambda and the internet.

Can you check the security groups for both the Lambda and the DB (those are normally created and configured for you by the CDK)? Can you also confirm that your Lambda has internet connectivity in the subnet where it is deployed? It needs to access the Secrets Manager API.

@asterikx
Contributor Author

asterikx commented Nov 2, 2021

@jogold Thanks! The rotation Lambda indeed did not have internet connectivity!

From my findings, the rotation Lambda is put in the same subnet as the DB. Secret rotation only works if the target DB is located in a private subnet (ec2.SubnetType.PRIVATE_WITH_NAT).
It fails if the DB is located in an isolated subnet (ec2.SubnetType.PRIVATE_ISOLATED) or a public subnet (ec2.SubnetType.PUBLIC).

For isolated subnets, internet connectivity is blocked by design. For public subnets, internet connectivity is not possible as the ENI attached to a Lambda function does not have a public IP (packets are dropped at the IGW).

I think the methods addRotationSingleUser and addRotationMultiUser should do one of the following: raise an error at synthesis time if the target DB is not placed in a private subnet, put the rotation Lambda in a private subnet (if any), or let the user specify/override the VPC subnet placement.

What do you think?
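
The subnet findings above can be sketched as a plain TypeScript check. This is only an illustration: `SubnetKind`, `rotationSubnetProblem` and `assertRotationSubnet` are hypothetical names, not CDK API, and the sketch assumes the rotation Lambda has to reach the public Secrets Manager API over the internet.

```typescript
// Hypothetical helper, not part of the CDK. It models which subnet types let a
// VPC-attached rotation Lambda reach the public Secrets Manager API.
type SubnetKind = 'PUBLIC' | 'PRIVATE_WITH_NAT' | 'PRIVATE_ISOLATED';

// Returns a description of the connectivity problem, or undefined if there is none.
function rotationSubnetProblem(kind: SubnetKind): string | undefined {
  switch (kind) {
    case 'PRIVATE_WITH_NAT':
      return undefined; // outbound traffic is routed through the NAT
    case 'PUBLIC':
      return 'the Lambda ENI has no public IP, so packets are dropped at the internet gateway';
    case 'PRIVATE_ISOLATED':
      return 'isolated subnets have no route to the internet';
  }
}

// Fails fast, similar in spirit to an error raised at synthesis time.
function assertRotationSubnet(kind: SubnetKind): void {
  const problem = rotationSubnetProblem(kind);
  if (problem !== undefined) {
    throw new Error(`Rotation Lambda cannot be placed in a ${kind} subnet: ${problem}`);
  }
}
```

Calling `assertRotationSubnet('PUBLIC')` throws, while `assertRotationSubnet('PRIVATE_WITH_NAT')` returns normally.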

@asterikx
Contributor Author

asterikx commented Nov 3, 2021

I got secret rotation to work by explicitly creating the secret rotation applications and placing them into the private subnet of my VPC:

const dbAdminUserSecret = new rds.DatabaseSecret(this, 'DbAdminUserSecret', {
  username: 'postgres',
});
const dbUserSecret = new rds.DatabaseSecret(this, 'DbUserSecret', {
  username: 'app_user',
  masterSecret: dbAdminUserSecret,
});

const db = new rds.DatabaseInstance(this, 'Database', {
  engine: rds.DatabaseInstanceEngine.postgres({
    version: rds.PostgresEngineVersion.VER_13_4,
  }),
  credentials: rds.Credentials.fromSecret(dbAdminUserSecret),
  vpc,
  vpcSubnets: {
    subnetType: ec2.SubnetType.PUBLIC, // <-- database is put into PUBLIC subnet
  },
});

new secretsmanager.SecretRotation(this, 'DbAdminUserSecretRotation', {
  application: secretsmanager.SecretRotationApplication.POSTGRES_ROTATION_SINGLE_USER,
  // eslint-disable-next-line @typescript-eslint/no-non-null-assertion
  secret: db.secret!,
  target: db,
  vpc,
  vpcSubnets: {
    subnetType: ec2.SubnetType.PRIVATE_WITH_NAT, // <-- rotation function is put into PRIVATE subnet
  },
});

const dbUserSecretAttached = dbUserSecret.attach(db);
new secretsmanager.SecretRotation(this, 'DbUserSecretRotation', {
  application: secretsmanager.SecretRotationApplication.POSTGRES_ROTATION_MULTI_USER,
  secret: dbUserSecretAttached,
  masterSecret: db.secret,
  target: db,
  vpc,
  vpcSubnets: {
    subnetType: ec2.SubnetType.PRIVATE_WITH_NAT, // <-- rotation function is put into PRIVATE subnet
  },
});

Also: for rotation to succeed for the secondary secret, you have to create the secondary user in the DB with its password set to the password that Secrets Manager generated for the secondary secret (otherwise the rotation function will not be able to log in to the DB).
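
A minimal sketch of that last step: building the SQL that creates the secondary user from the secret's JSON value. The helper names (`quoteIdent`, `quoteLiteral`, `createRoleStatement`) are hypothetical. The sketch assumes the secret string contains `username` and `password` fields, and that PostgreSQL runs with its default `standard_conforming_strings = on`, so doubling single quotes is enough to escape the password literal. Fetching the secret and running the statement are left out.

```typescript
// Shape of the fields this sketch reads from the secret's JSON value (assumed).
interface DbSecretValue {
  username: string;
  password: string;
}

// Quote a PostgreSQL identifier by doubling embedded double quotes.
function quoteIdent(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

// Quote a PostgreSQL string literal by doubling embedded single quotes.
function quoteLiteral(value: string): string {
  return "'" + value.replace(/'/g, "''") + "'";
}

// Build the statement a DB admin would run once, before the first rotation.
function createRoleStatement(secretString: string): string {
  const secret = JSON.parse(secretString) as Partial<DbSecretValue>;
  if (!secret.username || !secret.password) {
    throw new Error('secret must contain "username" and "password"');
  }
  return `CREATE ROLE ${quoteIdent(secret.username)} WITH LOGIN PASSWORD ${quoteLiteral(secret.password)};`;
}
```

The resulting statement would be run as the master user, so that the password stored in the secondary secret matches the database before the rotation function first tries to log in.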

@jogold
Contributor

jogold commented Nov 3, 2021

From my findings, the rotation Lambda is put in the same subnet as the DB. Secret rotation only works if the target DB is located in a private subnet (ec2.SubnetType.PRIVATE_WITH_NAT).
It fails if the DB is located in an isolated subnet (ec2.SubnetType.PRIVATE_ISOLATED) or a public subnet (ec2.SubnetType.PUBLIC).

@asterikx Actually you can deploy the rotation Lambda anywhere if you use a VPC endpoint for Secrets Manager.

To close this issue I suggest the following:

  • Expose vpcSubnets in addRotationSingleUser() and addRotationMultiUser()
  • Improve doc to clearly explain that the rotation Lambda needs either internet connectivity or a VPC endpoint for Secrets Manager

@skinny85 wdyt? I can work on the PR for this.
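
For readers landing here later, the two points above might look roughly like this in CDK v1 code. This is a sketch: it assumes a CDK version that includes the fix for this issue, and the option names `vpcSubnets` and `endpoint` are my understanding of what that fix exposes, so check them against the version you use. The function names and the construct ID `SecretsManagerEndpoint` are made up for illustration.

```typescript
import * as ec2 from '@aws-cdk/aws-ec2';
import * as rds from '@aws-cdk/aws-rds';

// Option A: leave the database where it is, but run the rotation Lambda in
// subnets that have a NAT route to the internet.
export function rotateViaNat(db: rds.DatabaseInstance): void {
  db.addRotationSingleUser({
    vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_NAT },
  });
}

// Option B: no internet access at all; the rotation Lambda reaches
// Secrets Manager through an interface VPC endpoint instead.
export function rotateViaEndpoint(vpc: ec2.Vpc, db: rds.DatabaseInstance): void {
  const endpoint = vpc.addInterfaceEndpoint('SecretsManagerEndpoint', {
    service: ec2.InterfaceVpcEndpointAwsService.SECRETS_MANAGER,
  });
  db.addRotationSingleUser({ endpoint });
}
```

Use one option or the other for a given secret, since each call adds a rotation application. Option B carries the cost of an interface endpoint, which is worth weighing against the cost of a NAT.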

@asterikx
Contributor Author

asterikx commented Nov 3, 2021

@jogold your proposal sounds good to me 👍🏻

Regarding the VPC endpoint for Secrets Manager, it might be useful to mention pricing in the docs. Pricing was ultimately the reason why I decided to use a NAT instance (EC2 t4g instance).

@skinny85 skinny85 added effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Nov 3, 2021
@skinny85
Contributor

skinny85 commented Nov 3, 2021

@jogold sure, that would be amazing 🙂.

@skinny85 skinny85 removed their assignment Nov 3, 2021
jogold added a commit to jogold/aws-cdk that referenced this issue Nov 5, 2021
…et rotation

Add options to configure vpc subnet placement and Secrets Manager API
endpoint for the rotation Lambda function.

This is required in some VPC configurations where the database is placed
in subnets without internet connectivity.

Closes aws#17265
@mergify mergify bot closed this as completed in #17363 Nov 5, 2021
mergify bot pushed a commit that referenced this issue Nov 5, 2021

iliapolo pushed a commit that referenced this issue Nov 7, 2021
TikiTDO pushed a commit to TikiTDO/aws-cdk that referenced this issue Feb 21, 2022
@arockett

arockett commented Mar 4, 2022

There's a bug in VPC selection for addRotationMultiUser.

Even if you pass in props for the subnet type to place the rotation lambda, it will always get placed in the same subnets as the cluster. See https://github.com/aws/aws-cdk/blob/master/packages/%40aws-cdk/aws-rds/lib/cluster.ts line 611 for the addRotationMultiUser function. Notice how the '...options' line gets overridden by the cluster subnets.

The addRotationSingleUser function doesn't have this problem because the options are injected after the defaults. See the same file as above.
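
The ordering problem can be reproduced with plain object spreads. This is a simplified, hypothetical model, not the actual cluster.ts code: `RotationOptions` and `clusterDefaults` are stand-ins, and subnets are reduced to strings.

```typescript
// Stand-in for the rotation options; the real CDK types are richer.
interface RotationOptions {
  vpcSubnets?: string;
  automaticallyAfterDays?: number;
}

// Stand-in for the values derived from the cluster itself.
const clusterDefaults = { vpcSubnets: 'cluster-subnets' };

// Options spread first: the later defaults overwrite what the caller passed.
function optionsThenDefaults(options: RotationOptions): RotationOptions {
  return { ...options, ...clusterDefaults };
}

// Defaults spread first: the caller's options win.
function defaultsThenOptions(options: RotationOptions): RotationOptions {
  return { ...clusterDefaults, ...options };
}

const requested: RotationOptions = { vpcSubnets: 'private-subnets' };
const overridden = optionsThenDefaults(requested); // vpcSubnets: 'cluster-subnets'
const respected = defaultsThenOptions(requested);  // vpcSubnets: 'private-subnets'
```

In an object literal, later properties overwrite earlier ones with the same key, which is why the position of the `...options` spread decides whose subnets are used.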

Created new issue to track: #19233
