Commit daca70d

fix(toolkit): CLI tool fails on CloudFormation Throttling
The CDK CLI (in particular `cdk deploy`) could crash when throttled by CloudFormation once the 6 retries configured by default were exhausted. This changes the retry configuration of the CloudFormation client (and only that one) to use a custom backoff function. Throttling errors are now retried up to `100` times before failing; other errors still stop after the default 6 retries, and errors explicitly marked as not retryable are not retried at all. The wait between two attempts backs off exponentially, capped at 1 minute. This should let heavily parallel deployments in the same account and region survive throttling, at the cost of a less responsive progress UI. Fixes #5637
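To make the backoff described above concrete, here is a minimal sketch of the delay arithmetic. It assumes the base delay of 300 ms and the 60-second cap that appear in the diff for this commit; `backoffMs` and its injectable `random` parameter are hypothetical names introduced only so the schedule can be printed without jitter.

```typescript
// Assumed values: base delay from retryOptions and the cap from the diff.
const BASE_MS = 300;
const CAP_MS = 60_000;

// Delay before the next attempt, given how many retries have already happened.
// `random` is injectable (hypothetical, for illustration) so the jitter can be pinned.
function backoffMs(retryCount: number, random: () => number = Math.random): number {
  const exponential = Math.pow(2, retryCount) * BASE_MS;
  const jitter = random() * BASE_MS;
  return Math.min(exponential + jitter, CAP_MS);
}

// Schedule for the first ten retries with jitter forced to zero.
const schedule = Array.from({ length: 10 }, (_, n) => backoffMs(n, () => 0));
console.log(schedule);
// [300, 600, 1200, 2400, 4800, 9600, 19200, 38400, 60000, 60000]
```

Under these assumptions the cap is reached at retry count 8 (2^8 × 300 ms = 76,800 ms, clamped to 60,000 ms), so every later attempt waits the full minute.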
1 parent 6166a70 commit daca70d

1 file changed, +39 -10 lines changed
packages/aws-cdk/lib/api/aws-auth/sdk.ts

@@ -42,14 +42,7 @@ export class SDK implements ISDK {
   private readonly config: ConfigurationOptions;
 
   /**
-   * Default retry options for SDK clients
-   *
-   * Biggest bottleneck is CloudFormation, with a 1tps call rate. We want to be
-   * a little more tenacious than the defaults, and with a little more breathing
-   * room between calls (defaults are {retries=3, base=100}).
-   *
-   * I've left this running in a tight loop for an hour and the throttle errors
-   * haven't escaped the retry mechanism.
+   * Default retry options for SDK clients.
    */
   private readonly retryOptions = { maxRetries: 6, retryDelayOptions: { base: 300 }};
 
@@ -64,7 +57,43 @@ export class SDK implements ISDK {
   }
 
   public cloudFormation(): AWS.CloudFormation {
-    return wrapServiceErrorHandling(new AWS.CloudFormation(this.config));
+    const defaultMaxRetries = this.retryOptions.maxRetries;
+    const defaultRetryBase = this.retryOptions.retryDelayOptions.base;
+
+    return wrapServiceErrorHandling(new AWS.CloudFormation({
+      ...this.config,
+      maxRetries: 100,
+      retryDelayOptions: {
+        customBackoff: (retryCount: number, err?: Error): number => {
+          const { code, retryable } = (err as AWS.AWSError) ?? {};
+
+          // Note: "retryable" will be "undefined" if it wasn't an AWS.AWSError
+          // and we still want to retry in this case. Hence the "===false"
+          if (retryable === false) {
+            return -1;
+          }
+
+          const exponential = Math.pow(2, retryCount) * defaultRetryBase;
+          const jitter = Math.random() * defaultRetryBase;
+
+          // We're capping this at 60 seconds.
+          const backoff = Math.min(exponential + jitter, 60_000);
+
+          if (retryCount < defaultMaxRetries || code === 'Throttling') {
+            if (code === 'Throttling' && backoff > 5_000) {
+              // Drop a debug line if we'll be waiting more than 5 seconds before the next attempt
+              debug(`CloudFormation Throttled: ${retryCount} retries so far, will retry in ${backoff} milliseconds`);
+            }
+
+            // Exponential back-off per the default configuration
+            return backoff;
+          }
+
+          // No more retrying
+          return -1;
+        },
+      },
+    }));
   }
 
   public ec2(): AWS.EC2 {
@@ -212,4 +241,4 @@ function allChainedExceptionMessages(e: Error | undefined) {
     e = (e as any).originalError;
   }
   return ret.join(': ');
-}
+}
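
The retry decision inside `customBackoff` can be hard to read through the diff markers, so the following is a minimal standalone sketch of just that decision. It is not the CDK code: `ErrorLike`, `shouldRetry`, and `retriesGranted` are hypothetical names, `ErrorLike` stands in for the two `AWS.AWSError` fields the callback reads, and the sketch assumes the client gives up when the callback returns a negative delay, as the "No more retrying" comment in the diff implies.

```typescript
// Hypothetical stand-in for the AWS.AWSError fields read by the callback.
interface ErrorLike {
  code?: string;
  retryable?: boolean;
}

const DEFAULT_MAX_RETRIES = 6;   // retryOptions.maxRetries in the diff
const CLIENT_MAX_RETRIES = 100;  // maxRetries passed to the CloudFormation client

// Mirrors the branches of the callback: true where it returns a delay,
// false where it returns -1.
function shouldRetry(retryCount: number, err?: ErrorLike): boolean {
  const code = err?.code;
  const retryable = err?.retryable;
  // Only an explicit `false` stops retrying; `undefined` (non-AWS errors) does not.
  if (retryable === false) {
    return false;
  }
  return retryCount < DEFAULT_MAX_RETRIES || code === 'Throttling';
}

// How many retries a persistently failing call would get for a given error.
function retriesGranted(err?: ErrorLike): number {
  let retryCount = 0;
  while (retryCount < CLIENT_MAX_RETRIES && shouldRetry(retryCount, err)) {
    retryCount++;
  }
  return retryCount;
}

console.log(retriesGranted({ code: 'Throttling', retryable: true }));       // 100
console.log(retriesGranted({ code: 'NetworkingError', retryable: true }));  // 6
console.log(retriesGranted({ code: 'ValidationError', retryable: false })); // 0
console.log(retriesGranted(undefined));                                     // 6
```

The design choice worth noting is that only the `'Throttling'` code extends the budget past the default 6 retries; every other error keeps the previous behaviour.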
